Skip to content

refactor: extract unified graph model subsystem (ROADMAP 3.11)#435

Merged
carlos-alm merged 2 commits intomainfrom
feat/unified-graph-model
Mar 13, 2026
Merged

refactor: extract unified graph model subsystem (ROADMAP 3.11)#435
carlos-alm merged 2 commits intomainfrom
feat/unified-graph-model

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Extract shared src/graph/ subsystem that centralizes duplicated graph construction across cycles.js, communities.js, viewer.js, structure.js, and triage.js
  • CodeGraph class with nodes, edges, adjacency, filtering, subgraph, and graphology conversion
  • Algorithms: Tarjan SCC, Louvain communities, BFS, shortest path, fan-in/fan-out centrality
  • Classifiers: role classification (pure logic from structure.js), risk scoring (pure logic from triage.js)
  • Builders: buildDependencyGraph, buildStructureGraph, buildTemporalGraph (DB → CodeGraph)
  • All 5 consumer modules refactored to delegate to graph subsystem; all public API signatures unchanged

Structure

src/graph/
  model.js              # CodeGraph class
  index.js              # Barrel export
  algorithms/
    tarjan.js           # SCC cycle detection
    louvain.js          # Community detection
    bfs.js              # Breadth-first traversal
    shortest-path.js    # BFS-based path finding
    centrality.js       # Fan-in/fan-out
  classifiers/
    roles.js            # Node role classification
    risk.js             # Risk scoring
  builders/
    dependency.js       # DB → dependency graph
    structure.js        # DB → containment graph
    temporal.js         # DB → co-change graph

Test plan

  • 66 new unit tests across model, algorithms, classifiers, builders
  • All 1670 existing tests pass (96 test files, 0 failures)
  • Lint clean (biome check — 0 errors)
  • CLI commands verified: build, cycles, communities, triage, structure

Extract shared src/graph/ subsystem from duplicated graph construction
across cycles.js, communities.js, viewer.js, structure.js, and triage.js.

- CodeGraph class: nodes, edges, adjacency, filtering, graphology conversion
- Algorithms: Tarjan SCC, Louvain communities, BFS, shortest path, centrality
- Classifiers: role classification (from structure.js), risk scoring (from triage.js)
- Builders: dependency graph, structure graph, temporal graph (from DB)
- All consumers refactored to use graph subsystem; public APIs unchanged
- 66 new unit tests, all 1670 existing tests still pass
@claude
Copy link

claude bot commented Mar 13, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR: Unified Graph Model Subsystem

Conducted comprehensive review of this large refactoring PR (32 files, 1613+ additions).

Root Cause Analysis

Problem Being Solved: Eliminating code duplication across 5 modules (cycles.js, communities.js, viewer.js, structure.js, triage.js) that were each implementing their own graph construction and algorithms. This is a legitimate architectural improvement addressing technical debt from ROADMAP 3.11.

Why This Approach: Creates a centralized CodeGraph class with shared algorithms (Tarjan SCC, Louvain, BFS, etc.) and builders that convert DB data to graph structures. The delegation approach maintains API compatibility while consolidating logic.

Risk Assessment: Low risk - this is pure refactoring with no behavior changes claimed and extensive test coverage.

Backlog Compliance

  • Zero-dep: ✓ — Uses existing graphology dependency, no new runtime dependencies added
  • Foundation-aligned: ✓ — Pure computational refactoring, maintains embeddable-first design, no cloud dependencies
  • Problem-fit: 4 — Improves maintainability and reduces code duplication, indirectly helps agents by consolidating graph logic
  • Breaking: No — All public API signatures preserved, purely additive changes
  • Tier: 1

Critical Concerns

None identified. This is a well-executed refactoring with:

✅ Clean Architecture: Logical subsystem organization with proper separation of concerns
✅ Comprehensive Testing: 66 new unit tests covering all components
✅ API Compatibility: All existing public interfaces preserved
✅ Code Quality: No debugging artifacts, proper documentation, consistent patterns
✅ Zero Configuration Changes: No linting rules relaxed or quality standards lowered

Minor Observations

  • Excellent test coverage across the new subsystem
  • Clean delegation pattern in consumer modules preserves existing behavior
  • Proper resource management in builders with database connections
  • Documentation is thorough with JSDoc annotations

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a high-quality refactoring that eliminates technical debt while maintaining all existing functionality. The extensive test suite and clean architecture demonstrate careful execution. The consolidation will make future graph-related features easier to implement and maintain.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR extracts a well-designed src/graph/ subsystem that centralises previously duplicated graph-construction logic spread across five consumer modules (cycles.js, communities.js, viewer.js, structure.js, triage.js). The CodeGraph class, algorithm and classifier utilities, and three DB builders are all sound, and all three previously-flagged issues (7-kind regression in getCallableNodes, edgeCount double-counting for undirected graphs, and duplicate toGraphology branches) have been correctly addressed.

Key observations:

  • The CORE_SYMBOL_KINDS-based SQL generation in graph-read.js is a solid future-proofing fix — adding a new kind to kinds.js will automatically expand the callable-nodes query.
  • All five consumer modules preserve their public API contracts exactly; the refactor is transparent to callers.
  • 66 new unit tests cover the graph model, all algorithms, all classifiers, and all builders at good depth.
  • Minor: the JSDoc block in src/graph/algorithms/louvain.js is placed before the import statement instead of directly above export function louvainCommunities, which means documentation tools and editor hover features will not associate the comment with the function (see inline comment).
  • Minor: in src/viewer.js, the fnGraph construction appears under the "Fan-in / fan-out via graph subsystem" heading, but the actual fan-in/fan-out is computed via direct DB queries immediately after; fnGraph is only consumed by the Louvain section ~20 lines below (see inline comment).

Confidence Score: 4/5

  • This PR is safe to merge; all public API signatures are preserved and all 1670 existing tests pass alongside 66 new ones.
  • Large, well-structured refactor with thorough test coverage, all three prior review issues resolved, and no runtime-breaking changes identified. Score is 4 rather than 5 solely because of two minor code-clarity issues (JSDoc misplacement in louvain.js and misleading section comment in viewer.js) that have no functional impact but could mislead future contributors.
  • src/graph/algorithms/louvain.js (JSDoc placement) and src/viewer.js (fnGraph section comment)

Important Files Changed

Filename Overview
src/graph/model.js Core CodeGraph class: directed/undirected support, correct edgeCount fix (÷2 for undirected), clean adjacency maps, proper deduplication in edges() iterator, and sound toGraphology conversion. Previously flagged issues resolved.
src/graph/algorithms/louvain.js Correctly wraps graphology-louvain with a typed Map-based return. Minor issue: the module JSDoc is placed before the import statement rather than directly before the export function, causing documentation tools and editor hover to miss the JSDoc.
src/graph/builders/dependency.js Centralises file-level and function-level graph construction. The CORE_SYMBOL_KINDS fix in getCallableNodes (now all 10 symbol kinds) correctly addresses the prior regression for buildFunctionLevelGraph.
src/viewer.js Successfully migrates Louvain community detection to the new graph subsystem. Minor code-organisation issue: fnGraph is constructed under the "Fan-in / fan-out" comment but is only consumed 20 lines later in the Louvain section.
src/communities.js Clean refactor to buildDependencyGraph + louvainCommunities. Correctly guards empty-graph case before calling Louvain. The assignments.get(key) == null guard handles any isolated nodes not included in Louvain output.
src/cycles.js Cleanly delegates to buildDependencyGraph + tarjan. The label-map strategy correctly preserves human-readable IDs for native Rust interop. findCyclesJS retained as a backward-compatible public export.
src/db/repository/graph-read.js Fixes the getCallableNodes query to use all 10 CORE_SYMBOL_KINDS (generated from kinds.js) instead of the hardcoded 3, resolving the prior regression for function-level cycle detection.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph consumers["Consumer Modules"]
        CY["cycles.js"]
        CM["communities.js"]
        VI["viewer.js"]
        ST["structure.js"]
        TR["triage.js"]
    end

    subgraph graph["src/graph/ subsystem"]
        subgraph builders["builders/"]
            BD["dependency.js\nbuildDependencyGraph"]
            BS["structure.js\nbuildStructureGraph"]
            BT["temporal.js\nbuildTemporalGraph"]
        end

        subgraph model["model.js"]
            CG["CodeGraph\n(directed / undirected)"]
        end

        subgraph algorithms["algorithms/"]
            TA["tarjan.js"]
            LO["louvain.js"]
            BF["bfs.js"]
            SP["shortest-path.js"]
            CE["centrality.js"]
        end

        subgraph classifiers["classifiers/"]
            RO["roles.js\nclassifyRoles"]
            RI["risk.js\nscoreRisk"]
        end
    end

    subgraph db["SQLite DB"]
        DB[("graph.db\nnodes / edges\nco_changes")]
    end

    CY -->|buildDependencyGraph| BD
    CM -->|buildDependencyGraph| BD
    VI -->|CodeGraph| CG
    VI -->|louvainCommunities| LO
    ST -->|classifyRoles| RO
    TR -->|scoreRisk| RI

    BD --> CG
    BS --> CG
    BT --> CG

    CY -->|tarjan| TA
    CM -->|louvainCommunities| LO

    BD --> DB
    BS --> DB
    BT --> DB

    TA --> CG
    LO --> CG
    BF --> CG
    SP --> CG
    CE --> CG
Loading

Last reviewed commit: 5d3a71b

const targetFile = e.target.split('|').pop();
return !isTestFile(sourceFile) && !isTestFile(targetFile);
});
const graph = buildDependencyGraph(db, { fileLevel, noTests });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function-level cycle detection silently drops 7 node kinds

The old findCycles function-level query explicitly filtered for 10 node kinds:

WHERE n1.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module')

The new code delegates to buildDependencyGraph(..., { fileLevel: false }), which calls buildFunctionLevelGraph, which in turn calls getCallableNodes(db). That function only returns nodes matching kind IN ('function','method','class') — omitting interface, type, struct, enum, trait, record, and module.

As a result, any cycle that passes through one of those 7 kinds will now be silently missed by the JS fallback path. The native Rust path receives the same narrowed edge list, so it is affected equally.

This is a behavioral regression for the function-level (--functions) cycle detection command on codebases that use TypeScript interfaces, enums, Go structs, Rust traits, etc. involved in circular calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5d3a71b. getCallableNodes now queries all 10 CORE_SYMBOL_KINDS instead of the hardcoded 3. The SQL is generated from the CORE_SYMBOL_KINDS array in kinds.js, so it stays in sync automatically if kinds are added in the future.

Comment on lines +32 to +36
get edgeCount() {
let count = 0;
for (const targets of this._successors.values()) count += targets.size;
return count;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edgeCount double-counts edges for undirected graphs

For an undirected CodeGraph, addEdge(a, b) stores the edge in both _successors (a→b and b→a). The edgeCount getter sums all entries across _successors, so it returns the logical edge count for undirected graphs. The test for this case confirms it:

expect(g.edgeCount).toBe(4); // 2 directed entries per undirected edge

This is semantically inconsistent with edges(), which correctly deduplicates and returns the actual edge count. Any caller that uses edgeCount to reason about the size of an undirected graph (e.g. buildTemporalGraph produces an undirected graph; passing it to louvainCommunities skips Louvain if edgeCount === 0, but the empty-check happens inside toGraphology via gy.size, so no present-day crash occurs) will get a value 2× too large.

Consider delegating edgeCount to [...this.edges()].length (O(edges), but correct), or tracking a separate _edgeCount counter that is incremented only once per logical edge:

get edgeCount() {
  let count = 0;
  for (const _ of this.edges()) count++;
  return count;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5d3a71b. edgeCount now divides by 2 for undirected graphs so it returns the logical edge count, consistent with edges() iterator. Test updated accordingly.

Comment on lines +198 to +209
if (type === 'undirected') {
// Deduplicate: only add each unordered pair once
for (const [src, tgt] of this.edges()) {
if (src === tgt) continue;
if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
}
} else {
for (const [src, tgt] of this.edges()) {
if (src === tgt) continue;
if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate if/else branches in toGraphology

Both the if (type === 'undirected') branch and the else branch contain identical code. The conditional adds no value and the deduplication comment (// Deduplicate: only add each unordered pair once) is misleading because both branches behave identically. The two blocks can be collapsed:

Suggested change
if (type === 'undirected') {
// Deduplicate: only add each unordered pair once
for (const [src, tgt] of this.edges()) {
if (src === tgt) continue;
if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
}
} else {
for (const [src, tgt] of this.edges()) {
if (src === tgt) continue;
if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
}
}
for (const [src, tgt] of this.edges()) {
if (src === tgt) continue;
if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5d3a71b. Collapsed the duplicate branches into a single loop as suggested.

- getCallableNodes now queries all 10 CORE_SYMBOL_KINDS, not just 3
  (fixes function-level cycle detection regression for interfaces,
  structs, enums, traits, records, modules, types)
- edgeCount returns logical edge count for undirected graphs (was 2x)
- Remove duplicate branches in toGraphology
@carlos-alm
Copy link
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 04bce00 into main Mar 13, 2026
26 checks passed
@carlos-alm carlos-alm deleted the feat/unified-graph-model branch March 13, 2026 21:42
@github-actions github-actions bot locked and limited conversation to collaborators Mar 13, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant