refactor: extract unified graph model subsystem (ROADMAP 3.11)#435
refactor: extract unified graph model subsystem (ROADMAP 3.11)#435carlos-alm merged 2 commits intomainfrom
Conversation
Extract shared src/graph/ subsystem from duplicated graph construction across cycles.js, communities.js, viewer.js, structure.js, and triage.js. - CodeGraph class: nodes, edges, adjacency, filtering, graphology conversion - Algorithms: Tarjan SCC, Louvain communities, BFS, shortest path, centrality - Classifiers: role classification (from structure.js), risk scoring (from triage.js) - Builders: dependency graph, structure graph, temporal graph (from DB) - All consumers refactored to use graph subsystem; public APIs unchanged - 66 new unit tests, all 1670 existing tests still pass
|
Claude finished @carlos-alm's task —— View job Reviewing PR: Unified Graph Model SubsystemConducted comprehensive review of this large refactoring PR (32 files, 1613+ additions). Root Cause AnalysisProblem Being Solved: Eliminating code duplication across 5 modules ( Why This Approach: Creates a centralized Risk Assessment: Low risk - this is pure refactoring with no behavior changes claimed and extensive test coverage. Backlog Compliance
Critical ConcernsNone identified. This is a well-executed refactoring with: ✅ Clean Architecture: Logical subsystem organization with proper separation of concerns Minor Observations
Final Recommendation
|
Greptile SummaryThis PR extracts a well-designed Key observations:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph consumers["Consumer Modules"]
CY["cycles.js"]
CM["communities.js"]
VI["viewer.js"]
ST["structure.js"]
TR["triage.js"]
end
subgraph graph["src/graph/ subsystem"]
subgraph builders["builders/"]
BD["dependency.js\nbuildDependencyGraph"]
BS["structure.js\nbuildStructureGraph"]
BT["temporal.js\nbuildTemporalGraph"]
end
subgraph model["model.js"]
CG["CodeGraph\n(directed / undirected)"]
end
subgraph algorithms["algorithms/"]
TA["tarjan.js"]
LO["louvain.js"]
BF["bfs.js"]
SP["shortest-path.js"]
CE["centrality.js"]
end
subgraph classifiers["classifiers/"]
RO["roles.js\nclassifyRoles"]
RI["risk.js\nscoreRisk"]
end
end
subgraph db["SQLite DB"]
DB[("graph.db\nnodes / edges\nco_changes")]
end
CY -->|buildDependencyGraph| BD
CM -->|buildDependencyGraph| BD
VI -->|CodeGraph| CG
VI -->|louvainCommunities| LO
ST -->|classifyRoles| RO
TR -->|scoreRisk| RI
BD --> CG
BS --> CG
BT --> CG
CY -->|tarjan| TA
CM -->|louvainCommunities| LO
BD --> DB
BS --> DB
BT --> DB
TA --> CG
LO --> CG
BF --> CG
SP --> CG
CE --> CG
Last reviewed commit: 5d3a71b |
| const targetFile = e.target.split('|').pop(); | ||
| return !isTestFile(sourceFile) && !isTestFile(targetFile); | ||
| }); | ||
| const graph = buildDependencyGraph(db, { fileLevel, noTests }); |
There was a problem hiding this comment.
Function-level cycle detection silently drops 7 node kinds
The old findCycles function-level query explicitly filtered for 10 node kinds:
WHERE n1.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module')The new code delegates to buildDependencyGraph(..., { fileLevel: false }), which calls buildFunctionLevelGraph, which in turn calls getCallableNodes(db). That function only returns nodes matching kind IN ('function','method','class') — omitting interface, type, struct, enum, trait, record, and module.
As a result, any cycle that passes through one of those 7 kinds will now be silently missed by the JS fallback path. The native Rust path receives the same narrowed edge list, so it is affected equally.
This is a behavioral regression for the function-level (--functions) cycle detection command on codebases that use TypeScript interfaces, enums, Go structs, Rust traits, etc. involved in circular calls.
There was a problem hiding this comment.
Fixed in 5d3a71b. getCallableNodes now queries all 10 CORE_SYMBOL_KINDS instead of the hardcoded 3. The SQL is generated from the CORE_SYMBOL_KINDS array in kinds.js, so it stays in sync automatically if kinds are added in the future.
| get edgeCount() { | ||
| let count = 0; | ||
| for (const targets of this._successors.values()) count += targets.size; | ||
| return count; | ||
| } |
There was a problem hiding this comment.
edgeCount double-counts edges for undirected graphs
For an undirected CodeGraph, addEdge(a, b) stores the edge in both _successors (a→b and b→a). The edgeCount getter sums all entries across _successors, so it returns 2× the logical edge count for undirected graphs. The test for this case confirms it:
expect(g.edgeCount).toBe(4); // 2 directed entries per undirected edgeThis is semantically inconsistent with edges(), which correctly deduplicates and returns the actual edge count. Any caller that uses edgeCount to reason about the size of an undirected graph (e.g. buildTemporalGraph produces an undirected graph; passing it to louvainCommunities skips Louvain if edgeCount === 0, but the empty-check happens inside toGraphology via gy.size, so no present-day crash occurs) will get a value 2× too large.
Consider delegating edgeCount to [...this.edges()].length (O(edges), but correct), or tracking a separate _edgeCount counter that is incremented only once per logical edge:
get edgeCount() {
let count = 0;
for (const _ of this.edges()) count++;
return count;
}There was a problem hiding this comment.
Fixed in 5d3a71b. edgeCount now divides by 2 for undirected graphs so it returns the logical edge count, consistent with edges() iterator. Test updated accordingly.
src/graph/model.js
Outdated
| if (type === 'undirected') { | ||
| // Deduplicate: only add each unordered pair once | ||
| for (const [src, tgt] of this.edges()) { | ||
| if (src === tgt) continue; | ||
| if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt); | ||
| } | ||
| } else { | ||
| for (const [src, tgt] of this.edges()) { | ||
| if (src === tgt) continue; | ||
| if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt); | ||
| } | ||
| } |
There was a problem hiding this comment.
Duplicate if/else branches in toGraphology
Both the if (type === 'undirected') branch and the else branch contain identical code. The conditional adds no value and the deduplication comment (// Deduplicate: only add each unordered pair once) is misleading because both branches behave identically. The two blocks can be collapsed:
| if (type === 'undirected') { | |
| // Deduplicate: only add each unordered pair once | |
| for (const [src, tgt] of this.edges()) { | |
| if (src === tgt) continue; | |
| if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt); | |
| } | |
| } else { | |
| for (const [src, tgt] of this.edges()) { | |
| if (src === tgt) continue; | |
| if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt); | |
| } | |
| } | |
| for (const [src, tgt] of this.edges()) { | |
| if (src === tgt) continue; | |
| if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt); | |
| } |
There was a problem hiding this comment.
Fixed in 5d3a71b. Collapsed the duplicate branches into a single loop as suggested.
- getCallableNodes now queries all 10 CORE_SYMBOL_KINDS, not just 3 (fixes function-level cycle detection regression for interfaces, structs, enums, traits, records, modules, types) - edgeCount returns logical edge count for undirected graphs (was 2x) - Remove duplicate branches in toGraphology
Summary
src/graph/subsystem that centralizes duplicated graph construction acrosscycles.js,communities.js,viewer.js,structure.js, andtriage.jsCodeGraphclass with nodes, edges, adjacency, filtering, subgraph, and graphology conversionstructure.js), risk scoring (pure logic fromtriage.js)buildDependencyGraph,buildStructureGraph,buildTemporalGraph(DB → CodeGraph)Structure
Test plan
biome check— 0 errors)build,cycles,communities,triage,structure