Skip to content

docs: rewrite README problem statement#428

Merged
carlos-alm merged 3 commits intomainfrom
docs/rewrite-problem-statement
Mar 13, 2026
Merged

docs: rewrite README problem statement#428
carlos-alm merged 3 commits intomainfrom
docs/rewrite-problem-statement

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Reframe the problem statement from "saving tokens" to the real pain: agents that make wrong assumptions, produce structurally broken PRs, and trigger multiple review rounds
  • Lead with the impossible trade-off: spend tokens exploring (context bloat) or assume (hallucinate)
  • Update "Why it matters" table to lead with code review impact
  • Rewrite the "What codegraph does" pitch around fewer review rounds, not fewer tool calls

… rounds

Reframe the value proposition from "saving tokens" to the real benefits:
agents that produce structurally sound code and PRs that pass automated
review (Greptile, CodeRabbit) on the first round instead of the third.
@carlos-alm carlos-alm force-pushed the docs/rewrite-problem-statement branch from d3e1d67 to 795a26c Compare March 13, 2026 05:00
@carlos-alm carlos-alm merged commit e7a24aa into main Mar 13, 2026
14 checks passed
@carlos-alm carlos-alm deleted the docs/rewrite-problem-statement branch March 13, 2026 05:03
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR combines a README problem-statement rewrite (docs-only) with the completion of the Phase 3.3 repository-pattern migration: raw SQL scattered across 14+ source modules is extracted into a clean src/db/repository/ directory of 9 domain files, routed through an updated src/db.js barrel.

Key changes:

  • README.md — problem statement reframed around reducing review rounds rather than saving tokens; "Why it matters" table reordered to lead with code-review impact
  • docs/roadmap/ROADMAP.md — sections 3.1, 3.2, 3.3 marked ✅ with PR references and updated directory trees
  • src/db/repository.js deleted; its three functions migrated into src/db/repository/nodes.js
  • 9 new repository files created (nodes, edges, cfg, build-stmts, cochange, complexity, dataflow, embeddings, graph-read) exporting ~40 named functions
  • builder.js, cfg.js, complexity.js, queries.js, embedder.js, sequence.js, structure.js, watcher.js, communities.js, dataflow.js, ast.js, ast-analysis/engine.js — inline db.prepare() calls replaced with repository function imports
  • purgeFilesFromGraph in builder.js collapsed to a single purgeFilesData(db, files, options) delegation; the new implementation correctly hoists statement preparation outside the per-file loop
  • explainFunctionImpl in queries.js gains a correct Set-based deduplication for test files (replacing SELECT DISTINCT n.file)
  • tests/unit/repository.test.js import path updated to new location

Concern to address: The repository introduces an inconsistency in statement caching. getClassHierarchy (called in BFS loops) already uses a WeakMap to prepare its statement once per db instance. However, the highest-traffic build-path functions — getNodeId, getFunctionNodeId, bulkNodeIdsByFile (called per node in builder.js, structure.js, watcher.js) and deleteCfgForNode (called per function in buildCFGData) — call db.prepare() on every invocation, replacing the old prepare-once pattern. Applying the same WeakMap caching to these functions would be consistent with the established pattern and restore build-time performance.

Confidence Score: 4/5

  • Safe to merge; all behavior is preserved and the repository layer is a clean architectural improvement, with a minor performance inconsistency to address in a follow-up.
  • The SQL migration is mechanical and verified by the unchanged test suite. Every call site that formerly used an inline prepared statement has been correctly replaced with an equivalent repository function. The one concern — db.prepare() called per invocation instead of once in build-time loops — is a performance regression, not a correctness bug. It mirrors an inconsistency already flagged by the presence of the WeakMap pattern in getClassHierarchy.
  • Pay close attention to src/db/repository/nodes.js (getNodeId / getFunctionNodeId / bulkNodeIdsByFile) and src/db/repository/cfg.js (deleteCfgForNode) — these are the build-loop hotpaths where the prepare-once optimisation should be restored.

Important Files Changed

Filename Overview
src/db/repository/nodes.js New file consolidating all node lookups; functions getNodeId, getFunctionNodeId, and bulkNodeIdsByFile call db.prepare() on every invocation — these are called in build-time loops in builder.js, structure.js, and watcher.js, departing from the prepare-once pattern that was used before.
src/db/repository/edges.js New file with all edge query functions; getClassHierarchy correctly uses a WeakMap statement cache, but other frequently-called functions (findCallees, findCallers, findDistinctCallers, etc.) re-prepare on every call. Called in BFS loops in queries.js and sequence.js.
src/db/repository/cfg.js New file with CFG repository functions; deleteCfgForNode calls db.prepare() twice on each invocation and is called inside the buildCFGData transaction loop over all functions per file.
src/db/repository/build-stmts.js New file implementing purgeFileData / purgeFilesData; purgeFilesData correctly prepares statements once before the loop. purgeFileData (single-file) re-runs preparePurgeStmts on each call, which is fine for its infrequent use.
src/db/repository/index.js Clean barrel re-export aggregating all repository sub-modules. All new functions added to db.js are correctly re-exported here.
src/db.js Updated barrel: export target changed from ./db/repository.js to ./db/repository/index.js, 30+ new repository functions exported. Existing exports preserved.
src/builder.js purgeFilesFromGraph body replaced with purgeFilesData call; getNodeId and bulkGetNodeIds replaced with wrapper objects that delegate to repository functions. Adapter pattern preserves call-site interface but each get() call now invokes db.prepare() internally.
src/queries.js Large refactor replacing ~20 inline SQL blocks with repository function calls. getClassHierarchy removed (moved to edges.js). The relatedTests deduplication in explainFunctionImpl now correctly uses a Set instead of SELECT DISTINCT on file.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph "Before (scattered SQL)"
        B1["builder.js\n(inline db.prepare)"]
        B2["cfg.js\n(inline db.prepare)"]
        B3["complexity.js\n(inline db.prepare)"]
        B4["queries.js\n(inline db.prepare)"]
        B5["embedder.js\n(inline db.prepare)"]
        B6["...12 more modules"]
    end

    subgraph "After (repository layer)"
        DB["src/db.js\n(barrel re-export)"]
        IDX["repository/index.js\n(barrel re-export)"]
        N["repository/nodes.js\ngetNodeId · getFunctionNodeId\nbulkNodeIdsByFile · findNode*"]
        E["repository/edges.js\nfindCallees · findCallers\nfindImport* · getClassHierarchy"]
        C["repository/cfg.js\ngetCfgBlocks · getCfgEdges\ndeleteCfgForNode · hasCfgTables"]
        BS["repository/build-stmts.js\npurgeFileData · purgeFilesData"]
        GR["repository/graph-read.js\ngetCallableNodes · getCallEdges\ngetFileNodesAll · getImportEdges"]
        OT["repository/\ncochange · complexity\ndataflow · embeddings"]
    end

    B1 & B2 & B3 & B4 & B5 & B6 -->|"now imports from"| DB
    DB --> IDX
    IDX --> N & E & C & BS & GR & OT
Loading

Last reviewed commit: 795a26c

@github-actions github-actions bot locked and limited conversation to collaborators Mar 13, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant