refactor: extract unified graph model subsystem (ROADMAP 3.11) by carlos-alm · Pull Request #435 · optave/codegraph

carlos-alm · 2026-03-13T21:01:53Z

Summary

Extract shared src/graph/ subsystem that centralizes duplicated graph construction across cycles.js, communities.js, viewer.js, structure.js, and triage.js
CodeGraph class with nodes, edges, adjacency, filtering, subgraph, and graphology conversion
Algorithms: Tarjan SCC, Louvain communities, BFS, shortest path, fan-in/fan-out centrality
Classifiers: role classification (pure logic from structure.js), risk scoring (pure logic from triage.js)
Builders: buildDependencyGraph, buildStructureGraph, buildTemporalGraph (DB → CodeGraph)
All 5 consumer modules refactored to delegate to graph subsystem; all public API signatures unchanged

Structure

src/graph/
  model.js              # CodeGraph class
  index.js              # Barrel export
  algorithms/
    tarjan.js           # SCC cycle detection
    louvain.js          # Community detection
    bfs.js              # Breadth-first traversal
    shortest-path.js    # BFS-based path finding
    centrality.js       # Fan-in/fan-out
  classifiers/
    roles.js            # Node role classification
    risk.js             # Risk scoring
  builders/
    dependency.js       # DB → dependency graph
    structure.js        # DB → containment graph
    temporal.js         # DB → co-change graph

Test plan

66 new unit tests across model, algorithms, classifiers, builders
All 1670 existing tests pass (96 test files, 0 failures)
Lint clean (biome check — 0 errors)
CLI commands verified: build, cycles, communities, triage, structure

Extract shared src/graph/ subsystem from duplicated graph construction across cycles.js, communities.js, viewer.js, structure.js, and triage.js. - CodeGraph class: nodes, edges, adjacency, filtering, graphology conversion - Algorithms: Tarjan SCC, Louvain communities, BFS, shortest path, centrality - Classifiers: role classification (from structure.js), risk scoring (from triage.js) - Builders: dependency graph, structure graph, temporal graph (from DB) - All consumers refactored to use graph subsystem; public APIs unchanged - 66 new unit tests, all 1670 existing tests still pass

claude · 2026-03-13T21:02:06Z

Claude finished @carlos-alm's task —— View job

Reviewing PR: Unified Graph Model Subsystem

Conducted comprehensive review of this large refactoring PR (32 files, 1613+ additions).

Root Cause Analysis

Problem Being Solved: Eliminating code duplication across 5 modules (cycles.js, communities.js, viewer.js, structure.js, triage.js) that were each implementing their own graph construction and algorithms. This is a legitimate architectural improvement addressing technical debt from ROADMAP 3.11.

Why This Approach: Creates a centralized CodeGraph class with shared algorithms (Tarjan SCC, Louvain, BFS, etc.) and builders that convert DB data to graph structures. The delegation approach maintains API compatibility while consolidating logic.

Risk Assessment: Low risk - this is pure refactoring with no behavior changes claimed and extensive test coverage.

Backlog Compliance

Zero-dep: ✓ — Uses existing graphology dependency, no new runtime dependencies added
Foundation-aligned: ✓ — Pure computational refactoring, maintains embeddable-first design, no cloud dependencies
Problem-fit: 4 — Improves maintainability and reduces code duplication, indirectly helps agents by consolidating graph logic
Breaking: No — All public API signatures preserved, purely additive changes
Tier: 1

Critical Concerns

None identified. This is a well-executed refactoring with:

✅ Clean Architecture: Logical subsystem organization with proper separation of concerns
✅ Comprehensive Testing: 66 new unit tests covering all components
✅ API Compatibility: All existing public interfaces preserved
✅ Code Quality: No debugging artifacts, proper documentation, consistent patterns
✅ Zero Configuration Changes: No linting rules relaxed or quality standards lowered

Minor Observations

Excellent test coverage across the new subsystem
Clean delegation pattern in consumer modules preserves existing behavior
Proper resource management in builders with database connections
Documentation is thorough with JSDoc annotations

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: This is a high-quality refactoring that eliminates technical debt while maintaining all existing functionality. The extensive test suite and clean architecture demonstrate careful execution. The consolidation will make future graph-related features easier to implement and maintain.

greptile-apps · 2026-03-13T21:07:40Z

Greptile Summary

This PR extracts a well-designed src/graph/ subsystem that centralises previously duplicated graph-construction logic spread across five consumer modules (cycles.js, communities.js, viewer.js, structure.js, triage.js). The CodeGraph class, algorithm and classifier utilities, and three DB builders are all sound, and all three previously-flagged issues (7-kind regression in getCallableNodes, edgeCount double-counting for undirected graphs, and duplicate toGraphology branches) have been correctly addressed.

Key observations:

The CORE_SYMBOL_KINDS-based SQL generation in graph-read.js is a solid future-proofing fix — adding a new kind to kinds.js will automatically expand the callable-nodes query.
All five consumer modules preserve their public API contracts exactly; the refactor is transparent to callers.
66 new unit tests cover the graph model, all algorithms, all classifiers, and all builders at good depth.
Minor: the JSDoc block in src/graph/algorithms/louvain.js is placed before the import statement instead of directly above export function louvainCommunities, which means documentation tools and editor hover features will not associate the comment with the function (see inline comment).
Minor: in src/viewer.js, the fnGraph construction appears under the "Fan-in / fan-out via graph subsystem" heading, but the actual fan-in/fan-out is computed via direct DB queries immediately after; fnGraph is only consumed by the Louvain section ~20 lines below (see inline comment).

Confidence Score: 4/5

This PR is safe to merge; all public API signatures are preserved and all 1670 existing tests pass alongside 66 new ones.
Large, well-structured refactor with thorough test coverage, all three prior review issues resolved, and no runtime-breaking changes identified. Score is 4 rather than 5 solely because of two minor code-clarity issues (JSDoc misplacement in louvain.js and misleading section comment in viewer.js) that have no functional impact but could mislead future contributors.
src/graph/algorithms/louvain.js (JSDoc placement) and src/viewer.js (fnGraph section comment)

Important Files Changed

Filename	Overview
src/graph/model.js	Core `CodeGraph` class: directed/undirected support, correct `edgeCount` fix (÷2 for undirected), clean adjacency maps, proper deduplication in `edges()` iterator, and sound `toGraphology` conversion. Previously flagged issues resolved.
src/graph/algorithms/louvain.js	Correctly wraps graphology-louvain with a typed `Map`-based return. Minor issue: the module JSDoc is placed before the `import` statement rather than directly before the `export function`, causing documentation tools and editor hover to miss the JSDoc.
src/graph/builders/dependency.js	Centralises file-level and function-level graph construction. The `CORE_SYMBOL_KINDS` fix in `getCallableNodes` (now all 10 symbol kinds) correctly addresses the prior regression for `buildFunctionLevelGraph`.
src/viewer.js	Successfully migrates Louvain community detection to the new graph subsystem. Minor code-organisation issue: `fnGraph` is constructed under the "Fan-in / fan-out" comment but is only consumed 20 lines later in the Louvain section.
src/communities.js	Clean refactor to `buildDependencyGraph` + `louvainCommunities`. Correctly guards empty-graph case before calling Louvain. The `assignments.get(key) == null` guard handles any isolated nodes not included in Louvain output.
src/cycles.js	Cleanly delegates to `buildDependencyGraph` + `tarjan`. The label-map strategy correctly preserves human-readable IDs for native Rust interop. `findCyclesJS` retained as a backward-compatible public export.
src/db/repository/graph-read.js	Fixes the `getCallableNodes` query to use all 10 `CORE_SYMBOL_KINDS` (generated from `kinds.js`) instead of the hardcoded 3, resolving the prior regression for function-level cycle detection.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph consumers["Consumer Modules"]
        CY["cycles.js"]
        CM["communities.js"]
        VI["viewer.js"]
        ST["structure.js"]
        TR["triage.js"]
    end

    subgraph graph["src/graph/ subsystem"]
        subgraph builders["builders/"]
            BD["dependency.js\nbuildDependencyGraph"]
            BS["structure.js\nbuildStructureGraph"]
            BT["temporal.js\nbuildTemporalGraph"]
        end

        subgraph model["model.js"]
            CG["CodeGraph\n(directed / undirected)"]
        end

        subgraph algorithms["algorithms/"]
            TA["tarjan.js"]
            LO["louvain.js"]
            BF["bfs.js"]
            SP["shortest-path.js"]
            CE["centrality.js"]
        end

        subgraph classifiers["classifiers/"]
            RO["roles.js\nclassifyRoles"]
            RI["risk.js\nscoreRisk"]
        end
    end

    subgraph db["SQLite DB"]
        DB[("graph.db\nnodes / edges\nco_changes")]
    end

    CY -->|buildDependencyGraph| BD
    CM -->|buildDependencyGraph| BD
    VI -->|CodeGraph| CG
    VI -->|louvainCommunities| LO
    ST -->|classifyRoles| RO
    TR -->|scoreRisk| RI

    BD --> CG
    BS --> CG
    BT --> CG

    CY -->|tarjan| TA
    CM -->|louvainCommunities| LO

    BD --> DB
    BS --> DB
    BT --> DB

    TA --> CG
    LO --> CG
    BF --> CG
    SP --> CG
    CE --> CG

_{Last reviewed commit: 5d3a71b}

greptile-apps · 2026-03-13T21:07:43Z

src/cycles.js

-        const targetFile = e.target.split('|').pop();
-        return !isTestFile(sourceFile) && !isTestFile(targetFile);
-      });
+  const graph = buildDependencyGraph(db, { fileLevel, noTests });


Function-level cycle detection silently drops 7 node kinds

The old findCycles function-level query explicitly filtered for 10 node kinds:

WHERE n1.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module')

The new code delegates to buildDependencyGraph(..., { fileLevel: false }), which calls buildFunctionLevelGraph, which in turn calls getCallableNodes(db). That function only returns nodes matching kind IN ('function','method','class') — omitting interface, type, struct, enum, trait, record, and module.

As a result, any cycle that passes through one of those 7 kinds will now be silently missed by the JS fallback path. The native Rust path receives the same narrowed edge list, so it is affected equally.

This is a behavioral regression for the function-level (--functions) cycle detection command on codebases that use TypeScript interfaces, enums, Go structs, Rust traits, etc. involved in circular calls.

Fixed in 5d3a71b. getCallableNodes now queries all 10 CORE_SYMBOL_KINDS instead of the hardcoded 3. The SQL is generated from the CORE_SYMBOL_KINDS array in kinds.js, so it stays in sync automatically if kinds are added in the future.

greptile-apps · 2026-03-13T21:07:44Z

src/graph/model.js

+  get edgeCount() {
+    let count = 0;
+    for (const targets of this._successors.values()) count += targets.size;
+    return count;
+  }


edgeCount double-counts edges for undirected graphs

For an undirected CodeGraph, addEdge(a, b) stores the edge in both _successors (a→b and b→a). The edgeCount getter sums all entries across _successors, so it returns 2× the logical edge count for undirected graphs. The test for this case confirms it:

expect(g.edgeCount).toBe(4); // 2 directed entries per undirected edge

This is semantically inconsistent with edges(), which correctly deduplicates and returns the actual edge count. Any caller that uses edgeCount to reason about the size of an undirected graph (e.g. buildTemporalGraph produces an undirected graph; passing it to louvainCommunities skips Louvain if edgeCount === 0, but the empty-check happens inside toGraphology via gy.size, so no present-day crash occurs) will get a value 2× too large.

Consider delegating edgeCount to [...this.edges()].length (O(edges), but correct), or tracking a separate _edgeCount counter that is incremented only once per logical edge:

get edgeCount() { let count = 0; for (const _ of this.edges()) count++; return count; }

Fixed in 5d3a71b. edgeCount now divides by 2 for undirected graphs so it returns the logical edge count, consistent with edges() iterator. Test updated accordingly.

greptile-apps · 2026-03-13T21:07:45Z

src/graph/model.js

+    if (type === 'undirected') {
+      // Deduplicate: only add each unordered pair once
+      for (const [src, tgt] of this.edges()) {
+        if (src === tgt) continue;
+        if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
+      }
+    } else {
+      for (const [src, tgt] of this.edges()) {
+        if (src === tgt) continue;
+        if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
+      }
+    }


Duplicate if/else branches in toGraphology

Both the if (type === 'undirected') branch and the else branch contain identical code. The conditional adds no value and the deduplication comment (// Deduplicate: only add each unordered pair once) is misleading because both branches behave identically. The two blocks can be collapsed:

Suggested change

if (type === 'undirected') {

// Deduplicate: only add each unordered pair once

for (const [src, tgt] of this.edges()) {

if (src === tgt) continue;

if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);

}

} else {

for (const [src, tgt] of this.edges()) {

if (src === tgt) continue;

if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);

}

}

for (const [src, tgt] of this.edges()) {

if (src === tgt) continue;

if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);

}

Fixed in 5d3a71b. Collapsed the duplicate branches into a single loop as suggested.

- getCallableNodes now queries all 10 CORE_SYMBOL_KINDS, not just 3 (fixes function-level cycle detection regression for interfaces, structs, enums, traits, records, modules, types) - edgeCount returns logical edge count for undirected graphs (was 2x) - Remove duplicate branches in toGraphology

carlos-alm · 2026-03-13T21:22:05Z

@greptileai

greptile-apps bot reviewed Mar 13, 2026

View reviewed changes

carlos-alm merged commit 04bce00 into main Mar 13, 2026
26 checks passed

carlos-alm deleted the feat/unified-graph-model branch March 13, 2026 21:42

github-actions bot locked and limited conversation to collaborators Mar 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: extract unified graph model subsystem (ROADMAP 3.11)#435

refactor: extract unified graph model subsystem (ROADMAP 3.11)#435
carlos-alm merged 2 commits intomainfrom
feat/unified-graph-model

carlos-alm commented Mar 13, 2026

Uh oh!

claude bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Mar 13, 2026

Uh oh!

carlos-alm Mar 13, 2026

Uh oh!

greptile-apps bot Mar 13, 2026

Uh oh!

carlos-alm Mar 13, 2026

Uh oh!

greptile-apps bot Mar 13, 2026

Uh oh!

carlos-alm Mar 13, 2026

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Mar 13, 2026

Summary

Structure

Test plan

Uh oh!

claude bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewing PR: Unified Graph Model Subsystem

Root Cause Analysis

Backlog Compliance

Critical Concerns

Minor Observations

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 13, 2026 •

edited

Loading

greptile-apps bot commented Mar 13, 2026 •

edited

Loading