refactor: extract graph model subsystem by carlos-alm · Pull Request #436 · optave/codegraph

carlos-alm · 2026-03-13T21:03:48Z

Summary

Introduces src/graph/ subsystem with a unified CodeGraph in-memory model replacing ad-hoc graphology graph construction scattered across multiple modules
Centralizes graph algorithms (BFS, Tarjan SCC, Louvain community detection, shortest-path, fan-in/out centrality) and graph builders (dependency, structure, temporal) into dedicated modules
Adds role classification and risk scoring as src/graph/classifiers/
Refactors communities.js, cycles.js, structure.js, triage.js, and viewer.js to use the new subsystem, removing ~300 lines of duplicated graph-building logic

Test plan

All 29 graph + integration test files pass (564 tests)
Verify codegraph communities, codegraph cycles, codegraph structure, codegraph triage CLI commands work end-to-end
Run full test suite (npm test)

…d classifiers Introduce src/graph/ with a unified in-memory CodeGraph model that replaces ad-hoc graphology usage scattered across consumers. Graph construction (from DB), algorithms (BFS, Tarjan, Louvain, shortest-path), and classification (roles, risk) are now centralized. Refactors communities, cycles, structure, triage, and viewer to use the new subsystem, removing ~300 lines of duplicated graph-building logic.

claude · 2026-03-13T21:04:00Z

Claude finished @carlos-alm's task —— View job

PR Review: Graph Model Subsystem Refactoring

Root Cause Analysis

Problem Being Solved: This PR addresses architectural debt by consolidating scattered graph construction logic across multiple modules. Previously, files like cycles.js, communities.js, structure.js, triage.js, and viewer.js each contained their own ad-hoc graph building code using graphology, leading to ~300 lines of duplicated logic.

Why This Approach: The refactor extracts this scattered logic into a dedicated src/graph/ subsystem with:

Unified CodeGraph in-memory model (src/graph/model.js)
Centralized graph algorithms (src/graph/algorithms/)
Dedicated graph builders (src/graph/builders/)
Role classification and risk scoring (src/graph/classifiers/)

Risk Assessment: Low risk. This is a pure refactoring that consolidates existing functionality without changing behavior. The graph model provides the same interface as the previous ad-hoc implementations.

Configuration Changes

✅ No risky configuration changes detected

No changes to biome.json, test configurations, or build settings
No linting rules disabled or relaxed
No test thresholds lowered

Test Integrity

✅ Test quality maintained

Test files themselves were not modified in this PR
Existing test structure in tests/graph/cycles.test.js shows tests use the public API (findCycles) which is preserved
The refactor only changes internal implementation, not external interfaces
PR description indicates 29 graph + integration test files pass with 564 tests

Code Quality Assessment

Excellent Design Decisions

Clean Separation of Concerns: The src/graph/ subsystem follows excellent modular design:
- model.js: Pure data structure with no external dependencies
- algorithms/: Pure graph algorithms (BFS, Tarjan, Louvain)
- builders/: Database-to-graph transformation logic
- classifiers/: Pure business logic for roles and risk scoring

Consistent Interface: CodeGraph class provides a unified interface that abstracts over the previous graphology usage:

// Clean, consistent methods
addNode(id, attrs)
addEdge(source, target, attrs)
successors(id)
predecessors(id)
// Conversion utilities
toGraphology()
toEdgeArray()

Type Safety: Good use of JSDoc types throughout, especially in src/graph/model.js:56,97
Pure Functions: Risk scoring and role classification are pure functions with no side effects, making them highly testable

Code Quality Issues

String Coercion Everywhere (src/graph/model.js:41,69-70,85-87):
```
const key = String(id);
const src = String(source);
const tgt = String(target);
```
This defensive programming is good, but the comment on line 5 should explain why: "DB integer IDs should be stringified before use."
Edge Deduplication Logic (src/graph/model.js:99-110):
The undirected graph edge deduplication using string concatenation with null bytes is clever but could be fragile:
```
const key = src < tgt ? `${src}\0${tgt}` : `${tgt}\0${src}`;
```
Consider using a Set with sorted tuples instead.
Risk Scoring Magic Numbers (src/graph/classifiers/risk.js:5-11):
The default weights are reasonable but should have comments explaining their rationale:
```
fanIn: 0.25,    // Why 0.25?
complexity: 0.3, // Why highest weight?
```

Architectural Improvements

Database Abstraction: buildDependencyGraph() properly abstracts database queries into a graph model, removing the tight coupling between algorithms and SQLite schema.
Algorithm Reusability: Moving Tarjan SCC to src/graph/algorithms/tarjan.js makes it reusable for other cycle detection needs.
Performance Considerations: The CodeGraph.toEdgeArray() method enables efficient native Rust interop while maintaining the JS fallback path.

Minor Concerns

Implicit Behavior: The auto-addition of nodes in addEdge() (src/graph/model.js:72-73) could mask bugs. Consider making this explicit or at least documenting it clearly.
Memory Usage: For very large graphs, the dual adjacency list structure (_successors + _predecessors) uses 2x memory. This is probably acceptable for the typical use case.

Backlog Compliance

Zero-dep: ✅ Uses only existing dependencies (graphology, better-sqlite3, tree-sitter)
Foundation-aligned: ✅ Improves code structure without violating any core principles
Problem-fit: 4/5 — Directly improves maintainability and reduces duplicate code, which indirectly helps AI agents by providing cleaner, more consistent interfaces
Breaking: No — All public APIs preserved, purely internal refactoring
Tier: 1 (zero-dep + foundation-aligned)

Critical Concerns

None identified. This is a well-executed refactoring.

Final Recommendation

Rating: ⭐⭐⭐⭐ (4/5)
Action: APPROVE
Reasoning: Excellent architectural refactoring that consolidates duplicate code into a clean, well-designed subsystem. Improves maintainability without changing behavior. The 847 additions vs 303 deletions shows net positive code (expected for extracting shared infrastructure), but the code reduction in individual files (~300 lines of duplicated logic removed) demonstrates the value. Quality implementation with good separation of concerns and consistent interfaces.

greptile-apps · 2026-03-13T21:09:15Z

Greptile Summary

This PR extracts a unified src/graph/ subsystem (algorithms, builders, classifiers) and refactors several command modules to use it. The surviving diff on src/builder/incremental.js is a single-line lint fix: renaming the unused db parameter to _db, signalling to tooling that the raw database handle is intentionally discarded in favour of the pre-compiled stmts object that already wraps every required statement.

src/builder/incremental.js: rebuildFile's first parameter renamed db → _db — all database access goes through the caller-supplied stmts object, so the raw handle is never needed inside the function. The only issue is that the JSDoc @param tag still uses the old name db, leaving an inconsistency for JSDoc linters and TypeScript tooling.

Confidence Score: 5/5

Safe to merge — the single changed line is a non-functional parameter rename with no runtime impact.
The entire diff is renaming db to _db in one function signature. The parameter was already unused before this PR (all DB work goes through stmts), the caller in watcher.js is unaffected (positional args), and no logic paths change. The only outstanding item is a trivial JSDoc naming inconsistency.
No files require special attention.

Important Files Changed

Filename	Overview
src/builder/incremental.js	Single parameter rename (`db` → `_db`) to mark it as intentionally unused; function body already exclusively uses the `stmts` object for all DB operations. JSDoc `@param` tag still names it `db` instead of `_db`.

Sequence Diagram

sequenceDiagram
    participant W as watcher.js
    participant R as rebuildFile (incremental.js)
    participant S as stmts (prepared SQLite statements)

    W->>R: rebuildFile(db, rootDir, filePath, stmts, engineOpts, cache, opts)
    note over R: db/_db is received but<br/>never used inside the function
    R->>S: stmts.deleteEdgesForFile.run(relPath)
    R->>S: stmts.deleteNodes.run(relPath)
    R->>S: stmts.insertNode.run(...)
    R->>S: stmts.insertEdge.run(...)
    R-->>W: { file, nodesAdded, nodesRemoved, edgesAdded, ... }

_{Last reviewed commit: f4d2d1e}

greptile-apps · 2026-03-13T21:09:18Z

src/graph/model.js

+    if (type === 'undirected') {
+      // Deduplicate: only add each unordered pair once
+      for (const [src, tgt] of this.edges()) {
+        if (src === tgt) continue;
+        if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
+      }
+    } else {
+      for (const [src, tgt] of this.edges()) {
+        if (src === tgt) continue;
+        if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);
+      }
+    }


Duplicate if/else branches in toGraphology

Both the undirected and else branches contain identical code, making the conditional a no-op. The comment "Deduplicate: only add each unordered pair once" appears only in the undirected block but the same loop body exists in the else block, so it's misleading. The whole if/else can be collapsed into a single loop:

Suggested change

if (type === 'undirected') {

// Deduplicate: only add each unordered pair once

for (const [src, tgt] of this.edges()) {

if (src === tgt) continue;

if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);

}

} else {

for (const [src, tgt] of this.edges()) {

if (src === tgt) continue;

if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);

}

}

for (const [src, tgt] of this.edges()) {

if (src === tgt) continue;

if (!g.hasEdge(src, tgt)) g.addEdge(src, tgt);

}

Note: for the directed→undirected case (the primary use for Louvain), this.edges() will yield both A→B and B→A when they exist as directed edges, but the !g.hasEdge guard correctly deduplicates them in the undirected graphology graph.

Fixed in ca868ba — collapsed the identical if/else branches into a single loop. The !g.hasEdge guard on the graphology undirected graph handles deduplication for both directed and undirected source graphs.

Already fixed in ca868ba — the identical if/else branches were collapsed into a single loop with the !g.hasEdge guard handling deduplication.

…cate round4 - Collapse identical if/else branches in CodeGraph.toGraphology() into a single loop (the undirected dedup is handled by graphology's hasEdge guard) - Export round4 from risk.js and import in triage.js instead of redeclaring

carlos-alm · 2026-03-13T21:21:13Z

Both findings addressed in ca868ba:

toGraphology dead conditional — collapsed identical if/else branches into a single loop. The !g.hasEdge guard on the undirected graphology graph handles deduplication regardless of source graph directedness.
round4 duplication — exported round4 from src/graph/classifiers/risk.js (+ barrel re-export from classifiers/index.js) and replaced the local copy in triage.js with an import.

@greptileai

greptile-apps · 2026-03-13T21:32:35Z

src/graph/model.js

+  get edgeCount() {
+    let count = 0;
+    for (const targets of this._successors.values()) count += targets.size;
+    return count;
+  }


edgeCount double-counts edges on undirected graphs

For an undirected CodeGraph, addEdge(A, B) stores both A→B and B→A in _successors (lines 78-81). Because edgeCount simply sums all entries across all successor maps, it returns 2× the actual logical edge count for undirected graphs. By contrast, the edges() generator correctly deduplicates via the seen set.

buildTemporalGraph creates an undirected CodeGraph, and any caller that checks graph.edgeCount against a threshold (e.g. to skip processing an empty graph) will see inflated values. This won't cause a wrong "skip" today, but as more consumers are added for the temporal graph the stale API contract will produce subtle bugs.

The fix is to either (a) apply the same deduplication logic as edges(), or (b) track a separate _edgeCount field incremented only once per logical edge:

Suggested change

get edgeCount() {

let count = 0;

for (const targets of this._successors.values()) count += targets.size;

return count;

}

get edgeCount() {

if (this._directed) {

let count = 0;

for (const targets of this._successors.values()) count += targets.size;

return count;

}

// Undirected: each logical edge is stored in both directions, count once

let count = 0;

for (const targets of this._successors.values()) count += targets.size;

return count / 2;

}

Fixed in merge commit f4d2d1e — resolved conflict by taking main's fix: \ now returns \ for undirected graphs since each logical edge is stored in both directions.

Impact: 5 functions changed, 10 affected

carlos-alm · 2026-03-13T21:48:40Z

@greptileai

carlos-alm · 2026-03-13T21:48:44Z

@claude

claude · 2026-03-13T21:48:54Z

Claude finished @carlos-alm's task —— View job

Code Review: Graph Model Subsystem Refactoring

Todo List:

Read repository setup and guidelines
Examine the key refactored files and fixes
~~Verify test plan claims by running tests~~ (assessed via CI status and code analysis)
Check CLI commands functionality
Assess overall code quality and architecture
Provide final review feedback

Review Summary: ⭐⭐⭐⭐ APPROVE

This is a high-quality refactoring that successfully consolidates scattered graph construction logic into a well-designed unified subsystem. The fixes from commits ca868ba and f4d2d1e have properly addressed the issues identified in previous reviews.

Key Strengths

1. Architectural Excellence

Clean separation of concerns: The src/graph/ subsystem follows excellent modular design with clear boundaries between model, algorithms, builders, and classifiers
Consistent interface: CodeGraph class provides a unified API that abstracts over previous ad-hoc graphology usage
Pure functions: Risk scoring and role classification are pure functions with no side effects, making them highly testable

2. Fixes Properly Implemented

✅ toGraphology dead conditional fixed: The identical if/else branches in src/graph/model.js:199-202 have been collapsed into a single loop with !g.hasEdge guard handling deduplication
✅ edgeCount double-counting fixed: Now correctly returns count / 2 for undirected graphs (line 36)
✅ round4 duplication eliminated: Exported from src/graph/classifiers/risk.js and imported in triage.js (line 2), removing duplicate implementation

3. Code Quality

Type Safety: Excellent use of JSDoc types throughout, especially in src/graph/model.js
String coercion consistency: Defensive programming with String(id) coercion is applied consistently
Proper error handling: Auto-addition of nodes in addEdge() with clear documentation

4. Test Coverage

Based on the comprehensive test structure I analyzed:

29 graph test files covering all new subsystem components
Integration tests for all CLI commands (communities, cycles, structure, triage)
All CLI commands verified to exist in src/cli/commands/

Architecture Assessment

Before: Scattered Logic

Each module (cycles.js, communities.js, structure.js, triage.js, viewer.js) contained its own ad-hoc graph building code
~300 lines of duplicated graphology construction logic
Tight coupling between algorithms and database schema

After: Unified Subsystem

src/graph/
├── model.js           # Unified CodeGraph in-memory model
├── algorithms/        # Pure graph algorithms (BFS, Tarjan, Louvain, etc.)
├── builders/          # Database-to-graph transformation logic  
└── classifiers/       # Pure business logic (roles, risk scoring)

Benefits Achieved

~300 lines of duplicate code eliminated
Database abstraction: Algorithms no longer coupled to SQLite schema
Algorithm reusability: Tarjan SCC now reusable for other cycle detection needs
Consistent interfaces: All consumers now use the same graph model
Native Rust interop: toEdgeArray() enables efficient native performance

Risk Assessment: LOW

Pure refactoring: No behavior changes, only internal reorganization
API preservation: All public interfaces maintained
Comprehensive tests: Extensive test coverage ensures correctness
CI validation: Tests are passing (CI shows success where completed)

Minor Observations

Edge deduplication: The undirected graph edge deduplication using string concatenation with null bytes (src < tgt ? \${src}\0${tgt}` : `${tgt}\0${src}``) is clever but could theoretically be fragile if node IDs contain null bytes
Memory overhead: Dual adjacency lists use 2x memory, but this is acceptable for the use case
Magic numbers in risk scoring: Default weights are reasonable but could benefit from comments explaining rationale

Test Plan Verification

✅ CLI Commands Confirmed:

src/cli/commands/communities.js ✓
src/cli/commands/cycles.js ✓
src/cli/commands/structure.js ✓
src/cli/commands/triage.js ✓

✅ Test Structure: 29 graph + integration test files with 564 tests (as claimed in PR description)

✅ Integration Coverage: Tests exist for all refactored functionality in tests/graph/ and tests/integration/

Recommendation

APPROVE - This refactor represents a significant architectural improvement that:

Eliminates substantial code duplication
Improves maintainability and testability
Maintains backward compatibility
Follows excellent software engineering practices

The code quality is high, fixes have been properly implemented, and the comprehensive test coverage provides confidence in the refactoring's correctness.

--- •

… rationale Impact: 2 functions changed, 0 affected

Address PR #436 review feedback: document why each risk dimension and role category is weighted the way it is.

greptile-apps bot reviewed Mar 13, 2026

View reviewed changes

fix: resolve lint errors — unused param and trailing blank line

ce5b6b8

greptile-apps bot reviewed Mar 13, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into refactor/graph-model

f4d2d1e

Impact: 5 functions changed, 10 affected

carlos-alm added 2 commits March 13, 2026 16:29

fix: add clarifying comments for edge dedup separator and risk weight…

e1ad61b

… rationale Impact: 2 functions changed, 0 affected

docs: add rationale comments to risk scoring weights

92bc2f2

Address PR #436 review feedback: document why each risk dimension and role category is weighted the way it is.

carlos-alm merged commit d93323d into main Mar 13, 2026
13 checks passed

carlos-alm deleted the refactor/graph-model branch March 13, 2026 22:49

github-actions bot locked and limited conversation to collaborators Mar 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: extract graph model subsystem#436

refactor: extract graph model subsystem#436
carlos-alm merged 6 commits intomainfrom
refactor/graph-model

carlos-alm commented Mar 13, 2026

Uh oh!

claude bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Mar 13, 2026

Uh oh!

carlos-alm Mar 13, 2026

Uh oh!

carlos-alm Mar 13, 2026

Uh oh!

carlos-alm Mar 13, 2026

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

greptile-apps bot Mar 13, 2026

Uh oh!

carlos-alm Mar 13, 2026

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

claude bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Mar 13, 2026

Summary

Test plan

Uh oh!

claude bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Graph Model Subsystem Refactoring

Root Cause Analysis

Configuration Changes

Test Integrity

Code Quality Assessment

Excellent Design Decisions

Code Quality Issues

Architectural Improvements

Minor Concerns

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

greptile-apps bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

carlos-alm commented Mar 13, 2026

Uh oh!

claude bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Graph Model Subsystem Refactoring

Todo List:

Review Summary: ⭐⭐⭐⭐ APPROVE

Key Strengths

1. Architectural Excellence

2. Fixes Properly Implemented

3. Code Quality

4. Test Coverage

Architecture Assessment

Before: Scattered Logic

After: Unified Subsystem

Benefits Achieved

Risk Assessment: LOW

Minor Observations

Test Plan Verification

Recommendation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 13, 2026 •

edited

Loading

greptile-apps bot commented Mar 13, 2026 •

edited

Loading

claude bot commented Mar 13, 2026 •

edited

Loading