125 changes: 74 additions & 51 deletions docs/roadmap/ROADMAP.md
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ Codegraph is a strong local-first code graph CLI. This roadmap describes planned
| [**2**](#phase-2--foundation-hardening) | Foundation Hardening | Parser registry, complete MCP, test coverage, enhanced config, multi-repo MCP | **Complete** (v1.4.0) |
| [**2.5**](#phase-25--analysis-expansion) | Analysis Expansion | Complexity metrics, community detection, flow tracing, co-change, manifesto, boundary rules, check, triage, audit, batch, hybrid search | **Complete** (v2.6.0) |
| [**2.7**](#phase-27--deep-analysis--graph-enrichment) | Deep Analysis & Graph Enrichment | Dataflow analysis, intraprocedural CFG, AST node storage, expanded node/edge types, extractors refactoring, CLI consolidation, interactive viewer, exports command, normalizeSymbol | **Complete** (v3.0.0) |
| [**3**](#phase-3--architectural-refactoring) | Architectural Refactoring (Vertical Slice) | Unified AST analysis framework, command/query separation, repository pattern, queries.js decomposition, composable MCP, CLI commands, domain errors, presentation layer, domain grouping, curated API, unified graph model | **In Progress** (v3.1.3) |
| [**3**](#phase-3--architectural-refactoring) | Architectural Refactoring (Vertical Slice) | Unified AST analysis framework, command/query separation, repository pattern, queries.js decomposition, composable MCP, CLI commands, domain errors, builder pipeline, presentation layer, domain grouping, curated API, unified graph model, qualified names | **In Progress** (v3.1.3) |
| [**4**](#phase-4--typescript-migration) | TypeScript Migration | Project setup, core type definitions, leaf -> core -> orchestration module migration, test migration | Planned |
| [**5**](#phase-5--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, enhanced embeddings, build-time semantic metadata, module summaries | Planned |
| [**6**](#phase-6--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned |
@@ -667,7 +667,7 @@ src/
src/
db/
connection.js # Open, WAL mode, pragma tuning
migrations.js # Schema versions (currently 13 migrations)
migrations.js # Schema versions (currently 15 migrations)
query-builder.js # Lightweight SQL builder for common filtered queries
repository/
index.js # Barrel re-export
@@ -775,9 +775,9 @@ Reduced `index.js` from ~190 named exports (243 lines) to 48 curated exports (57

> **Removed: Decompose complexity.js** — Subsumed by 3.1. The standalone complexity decomposition from the previous revision is now part of the unified AST analysis framework (3.1). The `complexity.js` per-language rules become `ast-analysis/rules/complexity/{lang}.js` alongside CFG and dataflow rules.

### 3.8 -- Domain Error Hierarchy

Replace ad-hoc error handling (mix of thrown `Error`, returned `null`, `logger.warn()`, `process.exit(1)`) across 50 modules with structured domain errors.
Structured domain errors replace ad-hoc error handling across the codebase. Eight error classes live in `src/errors.js`: `CodegraphError`, `ParseError`, `DbError`, `ConfigError`, `ResolutionError`, `EngineError`, `AnalysisError`, `BoundaryError`. The CLI catches domain errors and formats them for humans; MCP returns structured `{ isError, code }` responses.

```js
class CodegraphError extends Error { constructor(message, { code, file, cause }) { ... } }
@@ -790,41 +790,43 @@
class AnalysisError extends CodegraphError { code = 'ANALYSIS_FAILED' }
class BoundaryError extends CodegraphError { code = 'BOUNDARY_VIOLATION' }
```

The CLI catches domain errors and formats for humans. MCP returns structured error responses. No more `process.exit()` from library code.
- ✅ `src/errors.js` — 8 domain error classes with `code`, `file`, `cause` fields
- ✅ CLI top-level catch formats domain errors for humans
- ✅ MCP returns structured error responses
- ✅ Domain errors adopted across config, boundaries, triage, and query modules

**New file:** `src/errors.js`
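A minimal runnable sketch of this contract, covering both consumer paths. The class shapes follow the roadmap snippet above; the helper names `formatForCli` and `toMcpResponse` are hypothetical and may not match the real CLI/MCP code.

```javascript
// Hedged sketch of the domain error hierarchy and its two consumers.
class CodegraphError extends Error {
  constructor(message, { code, file, cause } = {}) {
    super(message, { cause });
    this.name = new.target.name;
    this.code = code;
    this.file = file;
  }
}

class ParseError extends CodegraphError {
  constructor(message, opts = {}) {
    super(message, { code: 'PARSE_FAILED', ...opts });
  }
}

// CLI path: format for humans at the top level -- no process.exit() in library code.
function formatForCli(err) {
  if (err instanceof CodegraphError) {
    return `${err.code}: ${err.message}${err.file ? ` (${err.file})` : ''}`;
  }
  return `UNEXPECTED: ${err.message}`;
}

// MCP path: structured { isError, code } response.
function toMcpResponse(err) {
  const code = err instanceof CodegraphError ? err.code : 'UNKNOWN';
  return { isError: true, code, message: err.message };
}

const err = new ParseError('unexpected token', { file: 'src/app.js' });
console.log(formatForCli(err)); // PARSE_FAILED: unexpected token (src/app.js)
```

Because every class carries a machine-readable `code`, the same thrown error serves both transports without branching inside library code.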

### 3.9 -- Builder Pipeline Architecture

Refactor `buildGraph()` (1,355 lines) from a mega-function into explicit, independently testable pipeline stages. Phase 2.7 added 4 opt-in stages, bringing the total to 11 core + 4 optional.
Refactored `buildGraph()` from a monolithic mega-function into explicit, independently testable pipeline stages. `src/builder.js` is now a 12-line barrel re-export. `src/builder/pipeline.js` orchestrates 9 stages via `PipelineContext`. Each stage is a separate file in `src/builder/stages/`.

```js
const pipeline = [
// Core (always)
collectFiles, // (rootDir, config) => filePaths[]
detectChanges, // (filePaths, db) => { changed, removed, isFullBuild }
parseFiles, // (filePaths, engineOpts) => Map<file, symbols>
insertNodes, // (symbolMap, db) => nodeIndex
resolveImports, // (symbolMap, rootDir, aliases) => importEdges[]
buildCallEdges, // (symbolMap, nodeIndex) => callEdges[]
buildClassEdges, // (symbolMap, nodeIndex) => classEdges[]
resolveBarrels, // (edges, symbolMap) => resolvedEdges[]
insertEdges, // (allEdges, db) => stats
extractASTNodes, // (fileSymbols, db) => astStats (always, post-parse)
buildStructure, // (db, fileSymbols, rootDir) => structureStats
classifyRoles, // (db) => roleStats
emitChangeJournal, // (rootDir, changes) => void

// Opt-in (dynamic imports)
computeComplexity, // --complexity: (db, rootDir, engine) => complexityStats
buildDataflowEdges, // --dataflow: (db, fileSymbols, rootDir) => dataflowStats
buildCFGData, // --cfg: (db, fileSymbols, rootDir) => cfgStats
]
```
src/
builder.js # 12-line barrel re-export
builder/
context.js # PipelineContext — shared state across stages
pipeline.js # Orchestrator: setup → stages → timing
helpers.js # batchInsertNodes, collectFiles, fileHash, etc.
incremental.js # Incremental build logic
stages/
collect-files.js # Discover source files
detect-changes.js # Incremental: hash comparison, removed detection
parse-files.js # Parse via native/WASM engine
insert-nodes.js # Batch-insert nodes, children, contains/parameter_of edges
resolve-imports.js # Import resolution with aliases
build-edges.js # Call edges, class edges, barrel resolution
build-structure.js # Directory/file hierarchy
run-analyses.js # Complexity, CFG, dataflow, AST store
finalize.js # Build meta, timing, db close
```

Watch mode reuses the same stages triggered per-file, eliminating the `watcher.js` divergence.
- ✅ `PipelineContext` shared state replaces function parameters
- ✅ 9 sequential stages, each independently testable
- ✅ `src/builder.js` reduced to barrel re-export
- ✅ Timing tracked per-stage in `ctx.timing`

**Affected files:** `src/builder.js`, `src/watcher.js`
**Affected files:** `src/builder.js` → split into `src/builder/`
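The stage contract can be sketched as follows. Only the stage list and `ctx.timing` are documented above; the stand-in stage bodies and the exact `PipelineContext` fields are assumptions (the real stages are also async).

```javascript
// Illustrative orchestrator in the shape of src/builder/pipeline.js.
class PipelineContext {
  constructor(rootDir) {
    this.rootDir = rootDir;
    this.files = [];
    this.timing = {}; // per-stage wall time, as in ctx.timing
  }
}

// Each stage mutates the shared context instead of threading parameters.
const stages = [
  ['collect-files', (ctx) => { ctx.files = ['a.js', 'b.js']; }],
  ['parse-files', (ctx) => { ctx.parsed = ctx.files.length; }],
];

function runPipeline(ctx) {
  for (const [name, stage] of stages) {
    const start = Date.now();
    stage(ctx); // each stage is independently testable: call it with a ctx
    ctx.timing[name] = Date.now() - start;
  }
  return ctx;
}

const ctx = runPipeline(new PipelineContext('.'));
console.log(ctx.parsed, Object.keys(ctx.timing));
```

A unit test exercises one stage with a hand-built context; the orchestrator only adds sequencing and timing.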

### 3.10 -- Embedder Subsystem Extraction

@@ -852,49 +854,70 @@ The pluggable store interface enables future O(log n) ANN search (e.g., `hnswlib

**Affected files:** `src/embedder.js` -> split into `src/embeddings/`
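A hypothetical sketch of such a pluggable store: a brute-force baseline implementing the same `add`/`search` contract an `hnswlib`-backed store could later satisfy in O(log n). Method names are assumptions, not the real `src/embeddings/` API.

```javascript
// O(n) cosine scan; an ANN backend would implement the same interface.
class BruteForceStore {
  constructor() {
    this.items = [];
  }
  add(id, vector) {
    this.items.push({ id, vector });
  }
  search(query, k = 5) {
    const cosine = (a, b) => {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] ** 2;
        nb += b[i] ** 2;
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
    };
    return this.items
      .map(({ id, vector }) => ({ id, score: cosine(query, vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}

const store = new BruteForceStore();
store.add('fnA', [1, 0]);
store.add('fnB', [0, 1]);
console.log(store.search([0.9, 0.1], 1)[0].id); // fnA
```

Swapping the backend then means changing one constructor call, not the call sites.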

### 3.11 -- Unified Graph Model

Unify the four parallel graph representations (structure.js, cochange.js, communities.js, viewer.js) into a shared in-memory graph model.
Unified the four parallel graph representations into a shared in-memory `CodeGraph` model. The `src/graph/` directory contains the model, 3 builders, 6 algorithms, and 2 classifiers. Algorithms are composable — run community detection on the dependency graph, the temporal graph, or a merged graph.

```
src/
graph/
model.js # Shared in-memory graph (nodes + edges + metadata)
index.js # Barrel re-export
model.js # CodeGraph class: nodes Map, directed/undirected adjacency
builders/
dependency.js # Build from SQLite edges
index.js # Barrel
dependency.js # Build from SQLite call/import edges
structure.js # Build from file/directory hierarchy
temporal.js # Build from git history (co-changes)
temporal.js # Build from git co-change history
algorithms/
index.js # Barrel
bfs.js # Breadth-first traversal
shortest-path.js # Path finding
tarjan.js # Cycle detection
shortest-path.js # Dijkstra path finding
tarjan.js # Strongly connected components / cycle detection
louvain.js # Community detection
centrality.js # Fan-in/fan-out, betweenness
clustering.js # Cohesion, coupling, density
centrality.js # Fan-in/fan-out, betweenness centrality
classifiers/
roles.js # Node role classification
risk.js # Risk scoring
index.js # Barrel
roles.js # Node role classification (hub, utility, leaf, etc.)
risk.js # Composite risk scoring
```

Algorithms become composable -- run community detection on the dependency graph, the temporal graph, or a merged graph.
- ✅ `CodeGraph` in-memory model with nodes Map, successors/predecessors adjacency
- ✅ 3 builders: dependency (SQLite edges), structure (file hierarchy), temporal (git co-changes)
- ✅ 6 algorithms: BFS, shortest-path, Tarjan SCC, Louvain community detection, centrality, clustering
- ✅ 2 classifiers: role classification, risk scoring
- ✅ `structure.js`, `communities.js`, `cycles.js`, `triage.js`, `viewer.js` refactored to use graph model

**Affected files:** `src/structure.js`, `src/cochange.js`, `src/communities.js`, `src/cycles.js`, `src/triage.js`, `src/viewer.js`
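To illustrate why the shared model makes algorithms composable, here is a toy `CodeGraph` with the adjacency shape described above and a BFS that runs unchanged over any builder's output. Exact method names are assumptions about the `src/graph/` API.

```javascript
// Minimal stand-in for src/graph/model.js.
class CodeGraph {
  constructor() {
    this.nodes = new Map();      // id -> metadata
    this.successors = new Map(); // id -> Set of ids (directed)
    this.predecessors = new Map();
  }
  addNode(id, meta = {}) {
    if (!this.nodes.has(id)) this.nodes.set(id, meta);
    if (!this.successors.has(id)) this.successors.set(id, new Set());
    if (!this.predecessors.has(id)) this.predecessors.set(id, new Set());
  }
  addEdge(from, to) {
    this.addNode(from);
    this.addNode(to);
    this.successors.get(from).add(to);
    this.predecessors.get(to).add(from);
  }
}

// BFS works on any CodeGraph -- dependency, structure, or temporal.
function bfs(graph, start) {
  const seen = new Set([start]);
  const order = [];
  const queue = [start];
  while (queue.length) {
    const id = queue.shift();
    order.push(id);
    for (const next of graph.successors.get(id) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return order;
}

const g = new CodeGraph();
g.addEdge('index.js', 'db.js');
g.addEdge('index.js', 'builder.js');
g.addEdge('builder.js', 'db.js');
console.log(bfs(g, 'index.js')); // [ 'index.js', 'db.js', 'builder.js' ]
```

The same `bfs` would traverse a temporal co-change graph or a merged graph, since all three builders emit the one model.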

### 3.12 -- Qualified Names & Hierarchical Scoping (Partially Addressed)
### 3.12 -- Qualified Names & Hierarchical Scoping

> **Phase 2.7 progress:** `parent_id` column, `contains` edges, `parameter_of` edges, and `childrenData()` query now model one-level parent-child relationships. This addresses ~80% of the use case.
> **Phase 2.7 progress:** `parent_id` column, `contains` edges, `parameter_of` edges, and `childrenData()` query now model one-level parent-child relationships.

Remaining work -- enrich the node model with deeper scope information:
Node model enriched with `qualified_name`, `scope`, and `visibility` columns (migration v15). Enables direct lookups like "all methods of class X" via `findNodesByScope()` and qualified name resolution via `findNodeByQualifiedName()` — no edge traversal needed.

```sql
ALTER TABLE nodes ADD COLUMN qualified_name TEXT; -- 'DateHelper.format'
ALTER TABLE nodes ADD COLUMN scope TEXT; -- 'DateHelper'
ALTER TABLE nodes ADD COLUMN qualified_name TEXT; -- 'DateHelper.format', 'freeFunction.x'
ALTER TABLE nodes ADD COLUMN scope TEXT; -- 'DateHelper', null for top-level
ALTER TABLE nodes ADD COLUMN visibility TEXT; -- 'public' | 'private' | 'protected'
CREATE INDEX idx_nodes_qualified_name ON nodes(qualified_name);
CREATE INDEX idx_nodes_scope ON nodes(scope);
```

Enables queries like "all methods of class X" without traversing edges. The `parent_id` FK only goes one level -- deeply nested scopes (namespace > class > method > closure) aren't fully represented. `qualified_name` would allow direct lookup.

**Affected files:** `src/db.js`, `src/extractors/`, `src/queries.js`, `src/builder.js`
- ✅ Migration v15: `qualified_name`, `scope`, `visibility` columns + indexes
- ✅ `batchInsertNodes` expanded to 9 columns (name, kind, file, line, end_line, parent_id, qualified_name, scope, visibility)
- ✅ `insert-nodes.js` computes qualified_name and scope during insertion: methods get scope from class prefix, children get `parent.child` qualified names
- ✅ Visibility extraction for all 8 language extractors:
- JS/TS: `accessibility_modifier` nodes + `#` private field detection
- Java/C#/PHP: `modifiers`/`visibility_modifier` AST nodes via shared `extractModifierVisibility()`
- Python: convention-based (`__name` → private, `_name` → protected)
- Go: capitalization convention (uppercase → public, lowercase → private)
- Rust: `visibility_modifier` child (`pub` → public, else private)
- ✅ `findNodesByScope(db, scopeName, opts)` — query by scope with optional kind/file filters
- ✅ `findNodeByQualifiedName(db, qualifiedName)` — direct lookup without edge traversal
- ✅ `childrenData()` returns `qualifiedName`, `scope`, `visibility` for parent and children
- ✅ Integration tests covering qualified_name, scope, visibility, and childrenData output
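The convention-based rules (Python and Go) can be sketched in isolation. The real logic lives in the per-language extractors under `src/extractors/`, so this standalone helper is illustrative only.

```javascript
// Visibility by naming convention, per the rules listed above.
function conventionVisibility(lang, name) {
  if (lang === 'python') {
    // '__name' -> private, '_name' -> protected, else public.
    if (name.startsWith('__')) return 'private';
    if (name.startsWith('_')) return 'protected';
    return 'public';
  }
  if (lang === 'go') {
    // Capitalization convention: exported identifiers start uppercase.
    return /^[A-Z]/.test(name) ? 'public' : 'private';
  }
  // Modifier-based languages (JS/TS, Java, C#, PHP, Rust) read AST nodes instead.
  return null;
}

console.log(conventionVisibility('python', '__secret')); // private
console.log(conventionVisibility('python', '_helper'));  // protected
console.log(conventionVisibility('go', 'Format'));       // public
```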

**Affected files:** `src/db/migrations.js`, `src/db/repository/nodes.js`, `src/builder/helpers.js`, `src/builder/stages/insert-nodes.js`, `src/extractors/*.js`, `src/extractors/helpers.js`, `src/analysis/symbol-lookup.js`
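To show the lookup contract without a database, here is an in-memory stand-in for the two new helpers. The real versions issue SQL against the indexed `scope` and `qualified_name` columns; row shapes below mirror the schema, the rest is illustrative.

```javascript
// Rows as they would appear in the nodes table after migration v15.
const nodes = [
  { name: 'DateHelper', kind: 'class', qualified_name: 'DateHelper', scope: null, visibility: 'public' },
  { name: 'format', kind: 'method', qualified_name: 'DateHelper.format', scope: 'DateHelper', visibility: 'public' },
  { name: 'pad', kind: 'method', qualified_name: 'DateHelper.pad', scope: 'DateHelper', visibility: 'private' },
];

// "All methods of class X" -- a direct column filter, no edge traversal.
function findNodesByScope(table, scopeName, { kind } = {}) {
  return table.filter(
    (n) => n.scope === scopeName && (kind === undefined || n.kind === kind),
  );
}

// Direct lookup by fully qualified name.
function findNodeByQualifiedName(table, qualifiedName) {
  return table.find((n) => n.qualified_name === qualifiedName) ?? null;
}

console.log(findNodesByScope(nodes, 'DateHelper', { kind: 'method' }).length); // 2
console.log(findNodeByQualifiedName(nodes, 'DateHelper.pad').visibility); // private
```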

### 3.13 -- Testing Pyramid with InMemoryRepository

6 changes: 6 additions & 0 deletions src/analysis/symbol-lookup.js
@@ -209,11 +209,17 @@ export function childrenData(name, customDbPath, opts = {}) {
kind: node.kind,
file: node.file,
line: node.line,
scope: node.scope || null,
visibility: node.visibility || null,
qualifiedName: node.qualified_name || null,
children: children.map((c) => ({
name: c.name,
kind: c.kind,
line: c.line,
endLine: c.end_line || null,
qualifiedName: c.qualified_name || null,
scope: c.scope || null,
visibility: c.visibility || null,
})),
};
});
8 changes: 4 additions & 4 deletions src/builder/helpers.js
@@ -183,17 +183,17 @@ export const BATCH_CHUNK = 200;

/**
* Batch-insert node rows via multi-value INSERT statements.
* Each row: [name, kind, file, line, end_line, parent_id]
* Each row: [name, kind, file, line, end_line, parent_id, qualified_name, scope, visibility]
*/
export function batchInsertNodes(db, rows) {
if (!rows.length) return;
const ph = '(?,?,?,?,?,?)';
const ph = '(?,?,?,?,?,?,?,?,?)';
for (let i = 0; i < rows.length; i += BATCH_CHUNK) {
const chunk = rows.slice(i, i + BATCH_CHUNK);
const vals = [];
for (const r of chunk) vals.push(r[0], r[1], r[2], r[3], r[4], r[5]);
for (const r of chunk) vals.push(r[0], r[1], r[2], r[3], r[4], r[5], r[6], r[7], r[8]);
db.prepare(
'INSERT OR IGNORE INTO nodes (name,kind,file,line,end_line,parent_id) VALUES ' +
'INSERT OR IGNORE INTO nodes (name,kind,file,line,end_line,parent_id,qualified_name,scope,visibility) VALUES ' +
chunk.map(() => ph).join(','),
).run(...vals);
}
25 changes: 22 additions & 3 deletions src/builder/stages/insert-nodes.js
@@ -50,14 +50,29 @@ export async function insertNodes(ctx) {

const insertAll = db.transaction(() => {
// Phase 1: Batch insert all file nodes + definitions + exports
// Row format: [name, kind, file, line, end_line, parent_id, qualified_name, scope, visibility]
const phase1Rows = [];
for (const [relPath, symbols] of allSymbols) {
phase1Rows.push([relPath, 'file', relPath, 0, null, null]);
phase1Rows.push([relPath, 'file', relPath, 0, null, null, null, null, null]);
for (const def of symbols.definitions) {
phase1Rows.push([def.name, def.kind, relPath, def.line, def.endLine || null, null]);
// Methods already have 'Class.method' as name — use as qualified_name.
// For methods, scope is the class portion; for top-level defs, scope is null.
const dotIdx = def.name.lastIndexOf('.');
const scope = dotIdx !== -1 ? def.name.slice(0, dotIdx) : null;
phase1Rows.push([
def.name,
def.kind,
relPath,
def.line,
def.endLine || null,
null,
def.name,
scope,
def.visibility || null,
]);
}
for (const exp of symbols.exports) {
phase1Rows.push([exp.name, exp.kind, relPath, exp.line, null, null]);
phase1Rows.push([exp.name, exp.kind, relPath, exp.line, null, null, exp.name, null, null]);
}
}
batchInsertNodes(db, phase1Rows);
@@ -84,13 +84,17 @@
const defId = nodeIdMap.get(`${def.name}|${def.kind}|${def.line}`);
if (!defId) continue;
for (const child of def.children) {
const qualifiedName = `${def.name}.${child.name}`;
childRows.push([
child.name,
child.kind,
relPath,
child.line,
child.endLine || null,
defId,
qualifiedName,
def.name,
child.visibility || null,
]);
}
}
2 changes: 2 additions & 0 deletions src/db.js
@@ -29,8 +29,10 @@ export {
findImportTargets,
findIntraFileCallEdges,
findNodeById,
findNodeByQualifiedName,
findNodeChildren,
findNodesByFile,
findNodesByScope,
findNodesForTriage,
findNodesWithFanIn,
getCallableNodes,
41 changes: 41 additions & 0 deletions src/db/migrations.js
@@ -229,6 +229,17 @@ export const MIGRATIONS = [
CREATE INDEX IF NOT EXISTS idx_nodes_exported ON nodes(exported);
`,
},
{
version: 15,
up: `
ALTER TABLE nodes ADD COLUMN qualified_name TEXT;
ALTER TABLE nodes ADD COLUMN scope TEXT;
ALTER TABLE nodes ADD COLUMN visibility TEXT;
UPDATE nodes SET qualified_name = name WHERE qualified_name IS NULL;
CREATE INDEX IF NOT EXISTS idx_nodes_qualified_name ON nodes(qualified_name);
CREATE INDEX IF NOT EXISTS idx_nodes_scope ON nodes(scope);
`,
},
];

export function getBuildMeta(db, key) {
@@ -309,4 +320,34 @@ export function initSchema(db) {
} catch {
/* already exists */
}
try {
db.exec('ALTER TABLE nodes ADD COLUMN qualified_name TEXT');
} catch {
/* already exists */
}
try {
db.exec('ALTER TABLE nodes ADD COLUMN scope TEXT');
} catch {
/* already exists */
}
try {
db.exec('ALTER TABLE nodes ADD COLUMN visibility TEXT');
} catch {
/* already exists */
}
try {
db.exec('UPDATE nodes SET qualified_name = name WHERE qualified_name IS NULL');
} catch {
/* nodes table may not exist yet */
}
try {
db.exec('CREATE INDEX IF NOT EXISTS idx_nodes_qualified_name ON nodes(qualified_name)');
} catch {
/* already exists */
}
try {
db.exec('CREATE INDEX IF NOT EXISTS idx_nodes_scope ON nodes(scope)');
} catch {
/* already exists */
}
}
2 changes: 2 additions & 0 deletions src/db/repository/index.js
@@ -32,8 +32,10 @@ export {
countNodes,
findFileNodes,
findNodeById,
findNodeByQualifiedName,
findNodeChildren,
findNodesByFile,
findNodesByScope,
findNodesForTriage,
findNodesWithFanIn,
getFunctionNodeId,