feat: qualified names, scope, and visibility for node model (3.12)#437
feat: qualified names, scope, and visibility for node model (3.12)#437carlos-alm wants to merge 6 commits intomainfrom
Conversation
…ADMAP 3.12) Enrich the node model with hierarchical scoping information so queries like "all methods of class X" work without edge traversal. Schema: migration v15 adds qualified_name, scope, visibility columns with indexes. All 8 language extractors now emit visibility metadata (JS/TS modifiers, Java/C#/PHP AST nodes, Python/Go/Rust conventions). New repository functions: findNodesByScope(), findNodeByQualifiedName(). childrenData() now returns qualifiedName, scope, visibility for all nodes. Also updates ROADMAP items 3.8, 3.9, 3.11, 3.12 to reflect completed status. Impact: 33 functions changed, 35 affected
|
Claude finished @carlos-alm's task —— View job PR Review: Qualified Names, Scope, and Visibility (3.12) ⭐⭐⭐⭐☆Review Checklist:
Phase 1: Root Cause Analysis ✅Problem Being Solved: This PR implements ROADMAP item 3.12, adding hierarchical scoping and qualified name resolution to the node model. The current system requires edge traversal to query "all methods of class X" or resolve qualified names like "DateHelper.format". Why This Approach: The PR adds three new columns ( Root Cause Justification: This is a planned architectural enhancement, not a bug fix. The approach is sound - adding indexed columns for direct lookup is more efficient than repeated graph traversal. Phase 2: Configuration Change Detection ✅No HIGH-RISK changes detected:
The only configuration file changes are additions to the ROADMAP.md marking items as completed. Phase 3: Test Integrity ✅No test weakening detected:
Test Quality: Tests are well-structured with realistic fixtures and comprehensive coverage of the new functionality. Phase 4: Code Quality & Security ✅Database Security:
Code Quality:
Implementation Quality at src/builder/stages/insert-nodes.js:58-72:
Phase 5: Backlog Compliance ✅
Critical ConcernsNone identified. This is a well-executed implementation of a planned architectural enhancement. Final Recommendation
|
Greptile SummaryThis PR completes milestone 3.12 by enriching the node model with Changes of note:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Extractor as Language Extractor<br/>(js/ts/java/cs/php/py/go/rs)
participant InsertNodes as insert-nodes.js
participant DB as SQLite nodes table
participant Repo as Repository<br/>(nodes.js)
participant API as childrenData /<br/>findNodesByScope /<br/>findNodeByQualifiedName
Extractor->>InsertNodes: def { name, kind, line, visibility }
Note over InsertNodes: Phase 1: qualified_name = def.name<br/>scope = def.name.slice(0, lastIndexOf('.'))
InsertNodes->>DB: INSERT OR IGNORE (name, kind, file, line,<br/>end_line, parent_id, qualified_name, scope, visibility)
Extractor->>InsertNodes: child { name, kind, line, visibility }
Note over InsertNodes: Phase 3: qualified_name = def.name + '.' + child.name<br/>scope = def.name
InsertNodes->>DB: INSERT OR IGNORE (child row with parent_id)
Note over DB: Migration v15 backfills<br/>qualified_name = name for pre-existing rows
API->>Repo: findNodesByScope(db, 'DateHelper', { kind: 'method' })
Repo->>DB: SELECT * WHERE scope = ? AND kind = ?
DB-->>Repo: [DateHelper.format, DateHelper.constructor, ...]
Repo-->>API: node[]
API->>Repo: findNodeByQualifiedName(db, 'DateHelper.format')
Repo->>DB: SELECT * WHERE qualified_name = ?
DB-->>Repo: [{ name, scope, visibility, ... }]
Repo-->>API: node[]
|
| export function pythonVisibility(name) { | ||
| if (name.startsWith('__') && !name.endsWith('__')) return 'private'; | ||
| if (name.startsWith('_')) return 'protected'; | ||
| return undefined; | ||
| } |
There was a problem hiding this comment.
Dunder methods incorrectly classified as 'protected'
pythonVisibility falls through to the startsWith('_') guard for all dunder methods (e.g. __init__, __str__, __len__, __repr__, __eq__). Because '__init__'.startsWith('__') && !'__init__'.endsWith('__') evaluates to false, the first branch is skipped, and the second branch '__init__'.startsWith('_') returns true → 'protected'.
Dunder methods are publicly callable by the Python runtime and by external code (e.g. str(obj) calls __str__, len(obj) calls __len__). Marking them 'protected' misrepresents their actual accessibility and will produce wrong visibility data for every Python class in the graph.
The missing guard is an explicit dunder (double-underscore on both sides) early-exit returning undefined:
| export function pythonVisibility(name) { | |
| if (name.startsWith('__') && !name.endsWith('__')) return 'private'; | |
| if (name.startsWith('_')) return 'protected'; | |
| return undefined; | |
| } | |
| export function pythonVisibility(name) { | |
| if (name.startsWith('__') && name.endsWith('__')) return undefined; | |
| if (name.startsWith('__')) return 'private'; | |
| if (name.startsWith('_')) return 'protected'; | |
| return undefined; | |
| } |
There was a problem hiding this comment.
Fixed in 9e86b57. Added early return for dunder names (__init__, __str__, etc.) before the single-underscore check. Test added to verify dunders return undefined.
src/db/repository/nodes.js
Outdated
| const _findNodesByScopeStmt = new WeakMap(); | ||
| export function findNodesByScope(db, scopeName, opts = {}) { | ||
| let sql = 'SELECT * FROM nodes WHERE scope = ?'; | ||
| const params = [scopeName]; | ||
| if (opts.kind) { | ||
| sql += ' AND kind = ?'; | ||
| params.push(opts.kind); | ||
| } | ||
| if (opts.file) { | ||
| sql += ' AND file LIKE ?'; | ||
| params.push(`%${opts.file}%`); | ||
| } | ||
| sql += ' ORDER BY file, line'; | ||
| return db.prepare(sql).all(...params); | ||
| } |
There was a problem hiding this comment.
_findNodesByScopeStmt WeakMap is declared but never used
_findNodesByScopeStmt is declared on line 262 following the same caching pattern as every other function in this file, but findNodesByScope calls db.prepare(sql).all(...params) directly on every invocation — the WeakMap is never passed to cachedStmt.
Because the SQL shape varies with opts.kind and opts.file, a single cached statement doesn't work here. The dead declaration implies the caching was intended but not wired up, which is misleading. Either remove the WeakMap or consider caching the common (no-opts) variant:
| const _findNodesByScopeStmt = new WeakMap(); | |
| export function findNodesByScope(db, scopeName, opts = {}) { | |
| let sql = 'SELECT * FROM nodes WHERE scope = ?'; | |
| const params = [scopeName]; | |
| if (opts.kind) { | |
| sql += ' AND kind = ?'; | |
| params.push(opts.kind); | |
| } | |
| if (opts.file) { | |
| sql += ' AND file LIKE ?'; | |
| params.push(`%${opts.file}%`); | |
| } | |
| sql += ' ORDER BY file, line'; | |
| return db.prepare(sql).all(...params); | |
| } | |
| export function findNodesByScope(db, scopeName, opts = {}) { | |
| let sql = 'SELECT * FROM nodes WHERE scope = ?'; | |
| const params = [scopeName]; | |
| if (opts.kind) { | |
| sql += ' AND kind = ?'; | |
| params.push(opts.kind); | |
| } | |
| if (opts.file) { | |
| sql += ' AND file LIKE ?'; | |
| params.push(`%${opts.file}%`); | |
| } | |
| sql += ' ORDER BY file, line'; | |
| return db.prepare(sql).all(...params); | |
| } |
There was a problem hiding this comment.
Fixed in 9e86b57. Removed the dead _findNodesByScopeStmt WeakMap. findNodesByScope uses dynamic SQL (kind/file filters are optional) so it correctly uses db.prepare() directly.
src/db/repository/nodes.js
Outdated
| const _findNodeByQualifiedNameStmt = new WeakMap(); | ||
| export function findNodeByQualifiedName(db, qualifiedName) { | ||
| return cachedStmt( | ||
| _findNodeByQualifiedNameStmt, | ||
| db, | ||
| 'SELECT * FROM nodes WHERE qualified_name = ? LIMIT 1', | ||
| ).get(qualifiedName); | ||
| } |
There was a problem hiding this comment.
LIMIT 1 silently hides ambiguous qualified names across files
qualified_name has an index but no UNIQUE constraint, so the same value (e.g. DateHelper.format) can legitimately exist for identically-named classes in different files. LIMIT 1 will silently return whichever row SQLite encounters first with no indication that a second match was discarded.
This matters because the PR description presents findNodeByQualifiedName as a "direct lookup without edge traversal". Callers have no signal that their result may be ambiguous. Consider either:
- Adding an optional
fileparam (to narrow the result when the caller knows the file), or - Returning all matches (an array) and letting callers decide, or
- Documenting the
LIMIT 1behaviour explicitly in the JSDoc so callers are aware.
At minimum the JSDoc should warn that results are unspecified if qualified_name is not unique across the database.
There was a problem hiding this comment.
Fixed in 9e86b57. findNodeByQualifiedName now returns an array of all matches instead of LIMIT 1. Added optional opts.file parameter to narrow results when needed. Tests updated accordingly.
- Fix pythonVisibility dunder check ordering (__init__ no longer misclassified) - Change findNodeByQualifiedName to return array with optional file filter - Remove dead _findNodesByScopeStmt WeakMap, restore _findNodeByQualifiedNameStmt - Update test to match new findNodeByQualifiedName return type Impact: 2 functions changed, 4 affected
| export function findNodesByScope(db, scopeName, opts = {}) { | ||
| let sql = 'SELECT * FROM nodes WHERE scope = ?'; | ||
| const params = [scopeName]; | ||
| if (opts.kind) { | ||
| sql += ' AND kind = ?'; | ||
| params.push(opts.kind); | ||
| } | ||
| if (opts.file) { | ||
| sql += ' AND file LIKE ?'; | ||
| params.push(`%${opts.file}%`); | ||
| } | ||
| sql += ' ORDER BY file, line'; | ||
| return db.prepare(sql).all(...params); | ||
| } |
There was a problem hiding this comment.
opts.file LIKE pattern does not escape SQL wildcards
Both findNodesByScope and findNodeByQualifiedName construct the file filter as `%${opts.file}%`. Any _ or % character that legitimately appears in the file path is treated as a LIKE wildcard, which can produce false matches.
This is particularly likely in practice — underscores are common in file names (e.g. src/date_helper.js would also match src/dateXhelper.js). % is rarer but possible in URL-encoded paths.
The same pattern appears in findNodeByQualifiedName (line 292) and should be addressed in both places. A minimal fix:
// Escape LIKE wildcards in a file path segment
function escapeLike(s) {
return s.replace(/[%_\\]/g, '\\$&');
}Then use %${escapeLike(opts.file)}% with ESCAPE '\\' appended to each LIKE clause.
There was a problem hiding this comment.
Fixed in d34915e. Added escapeLike() helper that escapes %, _, and backslash characters. Both findNodesByScope and findNodeByQualifiedName now use escapeLike(opts.file) with ESCAPE clause in their LIKE patterns.
…and findNodeByQualifiedName
| if (modifierTypes.has(child.type)) { | ||
| const text = child.text; | ||
| if (VISIBILITY_KEYWORDS.has(text)) return text; | ||
| // Compound modifiers node (Java: "public static") — scan its text for a keyword | ||
| for (const kw of VISIBILITY_KEYWORDS) { | ||
| if (text.includes(kw)) return kw; | ||
| } |
There was a problem hiding this comment.
C# private protected compound modifier returns wrong visibility
extractModifierVisibility checks modifiers with text.includes(kw), iterating keywords in Set insertion order: 'public', 'private', 'protected'. For the valid C# 7.2 compound access modifier private protected, the modifiers/visibility_modifier node text will be "private protected":
VISIBILITY_KEYWORDS.has("private protected")→false(not an exact match)"private protected".includes('public')→false"private protected".includes('private')→true→ returns'private'
This is incorrect. private protected grants access to derived classes within the same assembly, making it significantly more permissive than private. Returning 'private' here would cause the graph to mark those members as strictly private, hiding the protected-derived-class relationship.
A simple fix is to guard against the compound case before the generic substring scan:
if (modifierTypes.has(child.type)) {
const text = child.text;
if (VISIBILITY_KEYWORDS.has(text)) return text;
// Handle C# 'private protected' — treat as protected (accessible to derived types)
if (text === 'private protected') return 'protected';
// Compound modifiers node (Java: "public static") — scan its text for a keyword
for (const kw of VISIBILITY_KEYWORDS) {
if (text.includes(kw)) return kw;
}
}There was a problem hiding this comment.
Fixed in d34915e. Added explicit guard for 'private protected' before the generic substring scan — returns 'protected' since it grants access to derived types within the same assembly.
…e JSDoc - Add escapeLike() helper to escape SQL LIKE wildcards (%, _, \) in opts.file parameters for findNodesByScope and findNodeByQualifiedName - Guard C# 'private protected' compound modifier before substring scan so it returns 'protected' instead of incorrectly matching 'private' - Update batchInsertNodes JSDoc to reflect the 9-column row format Impact: 4 functions changed, 11 affected
src/db/repository/nodes.js
Outdated
| function escapeLike(s) { | ||
| return s.replace(/[%_\\]/g, '\\$&'); | ||
| } |
There was a problem hiding this comment.
Duplicate escapeLike function — SyntaxError in ESM
escapeLike is declared twice in the same module scope: once at line 254 (directly after findNodeChildren) and again here at line 268 (inside the JSDoc block for findNodesByScope). JavaScript ES modules always run in strict mode, and duplicate function declarations in strict mode are a static SyntaxError — the module will fail to parse and any import of nodes.js (including src/db.js and all callers) will throw at startup, breaking the entire application.
The second declaration is a copy-paste artefact from the earlier escapeLike fix: the first copy was added to address the escaping concern, and then the JSDoc + function body for findNodesByScope was re-inserted alongside it.
Remove the duplicate at line 268–270:
| function escapeLike(s) { | |
| return s.replace(/[%_\\]/g, '\\$&'); | |
| } | |
| export function findNodesByScope(db, scopeName, opts = {}) { |
There was a problem hiding this comment.
Fixed in 65a15e9. Removed the duplicate escapeLike() declaration — copy-paste artefact from the previous fix.
src/builder/stages/insert-nodes.js
Outdated
| const dotIdx = def.name.indexOf('.'); | ||
| const scope = dotIdx !== -1 ? def.name.slice(0, dotIdx) : null; |
There was a problem hiding this comment.
indexOf truncates scope for multi-segment names
def.name.indexOf('.') finds the first dot, so for any hypothetical name with more than one segment (e.g. a future extractor emitting "Outer.Inner.method") the scope would be set to "Outer" instead of "Outer.Inner". findNodesByScope(db, 'Inner') would then find nothing for such methods.
The scope should be derived from everything except the last segment, matching the caller expectation that scope is the direct parent name. Use lastIndexOf to be consistent:
| const dotIdx = def.name.indexOf('.'); | |
| const scope = dotIdx !== -1 ? def.name.slice(0, dotIdx) : null; | |
| const dotIdx = def.name.lastIndexOf('.'); | |
| const scope = dotIdx !== -1 ? def.name.slice(0, dotIdx) : null; |
There was a problem hiding this comment.
Fixed in 65a15e9. Changed indexOf to lastIndexOf so multi-segment names like Outer.Inner.method correctly derive scope as Outer.Inner instead of truncating at the first dot.
- Remove duplicate escapeLike() declaration that would cause a
SyntaxError in strict mode ESM
- Use lastIndexOf('.') instead of indexOf('.') for scope derivation
so multi-segment names like "Outer.Inner.method" correctly produce
scope "Outer.Inner" instead of truncating to "Outer"
Impact: 1 functions changed, 0 affected
| { | ||
| version: 15, | ||
| up: ` | ||
| ALTER TABLE nodes ADD COLUMN qualified_name TEXT; | ||
| ALTER TABLE nodes ADD COLUMN scope TEXT; | ||
| ALTER TABLE nodes ADD COLUMN visibility TEXT; | ||
| CREATE INDEX IF NOT EXISTS idx_nodes_qualified_name ON nodes(qualified_name); | ||
| CREATE INDEX IF NOT EXISTS idx_nodes_scope ON nodes(scope); | ||
| `, | ||
| }, |
There was a problem hiding this comment.
Migration doesn't backfill qualified_name for existing databases
Migration v15 adds qualified_name, scope, and visibility columns as nullable, but existing rows get NULL for all three. Because findNodeByQualifiedName queries the qualified_name column directly (not name), any existing database upgraded from ≤ v14 will silently return zero results from the new lookup functions until the user runs a full rebuild — even though every node already has a valid name that is identical to what qualified_name would hold.
qualified_name is always equal to name at insert time (def.name is used verbatim), so this can be backfilled cheaply and safely in the migration itself:
ALTER TABLE nodes ADD COLUMN qualified_name TEXT;
ALTER TABLE nodes ADD COLUMN scope TEXT;
ALTER TABLE nodes ADD COLUMN visibility TEXT;
-- Backfill qualified_name from name (always identical at insert time)
UPDATE nodes SET qualified_name = name WHERE qualified_name IS NULL;
-- Backfill scope from name for dotted entries (extracts 'A.B' → 'A' for single-level names)
-- Full scope accuracy requires a rebuild, but the UPDATE covers the common case.
CREATE INDEX IF NOT EXISTS idx_nodes_qualified_name ON nodes(qualified_name);
CREATE INDEX IF NOT EXISTS idx_nodes_scope ON nodes(scope);The same UPDATE nodes SET qualified_name = name should also be added to the initSchema fallback block (around line 323) so databases that are patched via the ALTER TABLE path also get backfilled.
There was a problem hiding this comment.
Fixed in 63afc1a. Migration v15 now runs UPDATE nodes SET qualified_name = name WHERE qualified_name IS NULL after adding the columns. The initSchema fallback path also backfills so databases patched via ALTER TABLE get the same treatment.
Existing databases upgraded from ≤v14 would have NULL qualified_name for all rows, causing findNodeByQualifiedName to return zero results until a full rebuild. Backfill in both migration v15 and initSchema fallback path. Impact: 1 functions changed, 0 affected
Summary
qualified_name,scope,visibilitycolumns tonodestable with indexesaccessibility_modifier+#private fields), Java/C#/PHP (modifier AST nodes), Python (naming convention), Go (capitalization), Rust (pubkeyword)findNodesByScope(db, scopeName, opts)for "all methods of class X" queries,findNodeByQualifiedName(db, qualifiedName)for direct lookup without edge traversalqualifiedName,scope,visibilityfor parent nodes and childrenTest plan
tests/integration/qualified-names.test.js— 8 tests covering qualified_name lookup, scope queries, visibility extraction (WASM engine), and childrenData outputcodegraph build+codegraph children NodeQuery --jsonon the codegraph codebase itself