feat: parser refactoring, Jedi call resolution, and performance optimizations #158

Open
gzenz wants to merge 1 commit into tirth8205:main from gzenz:feat/lang-handlers-jedi-resolver-and-perf
gzenz (Contributor) commented on Apr 8, 2026

Summary

Major improvements spanning parser architecture, call graph accuracy, build performance, and dead code detection:

  • Parser refactoring: Extracted 16 per-language handler modules into code_review_graph/lang/ using a strategy pattern, replacing monolithic conditionals in parser.py. Parser caches are now thread-safe, using double-checked locking.
  • Jedi-based call resolution: New jedi_resolver.py resolves Python method calls at build time. Pre-scan filtering by project function names cuts enrichment time from 36s to 3s on large repos. New [enrichment] optional dependency group.
  • PreToolUse search enrichment: New enrich.py module and code-review-graph enrich CLI command inject graph context (callers, flows, community, tests) into agent search results passively via hook.
  • Call graph improvements: Typed variable call enrichment (Python, JS/TS, Kotlin/Java), star import resolution, namespace imports, CommonJS require(), Angular template parsing, JSX handler tracking, module-qualified call resolution, function/class references as arguments.
  • Dead code FP reduction: Framework decorators recognized as entry points, CDK construct methods and abstract overrides excluded, e2e test directories filtered.
  • Community detection 21x speedup: Bulk node loading + adjacency-indexed cohesion computation (48.6s to 2.3s on 41k-node repos).
  • Build performance: Batch file storage (50-file transactions), batch risk_index (2 GROUP BY queries replace per-node loops).
  • DB schema v8: Composite edge index for upsert performance (v7 is reserved by PR #127, "fix: add sqlite edge compound indexes").
  • Other: Weighted flow risk scoring, transitive TESTED_BY, --quiet/--json CLI flags, search deduplication, 829+ tests (up from 615).
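
For readers unfamiliar with the strategy pattern mentioned in the first bullet, here is a minimal sketch of how per-language handlers might replace `if language == ...` branches. The `LanguageHandler` protocol, the two handler classes, the `HANDLERS` registry, and `handler_for` are all hypothetical names chosen for illustration; the actual modules under code_review_graph/lang/ may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Set, Tuple


class LanguageHandler(Protocol):
    """Hypothetical handler interface (not the PR's real API)."""

    extensions: Tuple[str, ...]

    def call_node_types(self) -> Set[str]:
        """Syntax-tree node types that represent a call in this language."""
        ...


@dataclass(frozen=True)
class PythonHandler:
    extensions: Tuple[str, ...] = (".py",)

    def call_node_types(self) -> Set[str]:
        return {"call"}


@dataclass(frozen=True)
class JavaHandler:
    extensions: Tuple[str, ...] = (".java",)

    def call_node_types(self) -> Set[str]:
        return {"method_invocation", "object_creation_expression"}


# One registry lookup replaces a chain of per-language conditionals.
HANDLERS: Dict[str, LanguageHandler] = {
    ext: handler
    for handler in (PythonHandler(), JavaHandler())
    for ext in handler.extensions
}


def handler_for(path: str) -> Optional[LanguageHandler]:
    """Return the handler registered for the file's extension, if any."""
    _, ext = os.path.splitext(path)
    return HANDLERS.get(ext.lower())
```

With this shape, supporting another language means adding one handler class and registering it, without touching the shared parsing code.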

Evaluation

Tested against Gadgetbridge (41k nodes, 280k edges):

  • 8/10 scorecard PASS (callers_of, callees_of, tests_for, communities, flows, impact radius, risk scores)
  • Call resolution rate improved from 28% to 39.6%
  • Community detection: 48.6s to 2.3s
  • Full build time: ~22s for 3,574 files

Migration note

Our composite edge index migration is numbered v8 to avoid conflict with v6 (summary tables, already on main) and v7 (reserved by PR #127).
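
As an illustration of what a versioned index migration of this kind can look like, here is a sketch using only the standard `sqlite3` module. It assumes the schema version is tracked with SQLite's `PRAGMA user_version` and that the edge table is named `edges` with `kind`, `source`, and `target` columns; the index name, table layout, and version-tracking mechanism are assumptions and may not match the project's actual migration code.

```python
import sqlite3

TARGET_VERSION = 8  # v6 and v7 are taken, per the note above


def migrate_to_v8(conn: sqlite3.Connection) -> None:
    """Add a composite edge index once, then record the new schema version."""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version >= TARGET_VERSION:
        return  # already migrated
    # Hypothetical index name and column order, for illustration only.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_edges_kind_source_target "
        "ON edges (kind, source, target)"
    )
    conn.execute(f"PRAGMA user_version = {TARGET_VERSION}")
    conn.commit()
```

Because the function checks the stored version first and uses `IF NOT EXISTS`, running it twice is harmless.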

Test plan

  • uv run pytest tests/ --tb=short -q -- 829 passed, 4 skipped
  • uv run ruff check code_review_graph/ -- all checks passed
  • Full rebuild + evaluation on Gadgetbridge (41k nodes)
  • Full rebuild + evaluation on internal Python/TS/React project (261 files)
  • CI pipeline (lint, type-check, security, test matrix)

…izations

Major improvements to code-review-graph spanning parser architecture,
call graph accuracy, and build performance.

Parser refactoring:
- Extract 16 per-language handler modules into code_review_graph/lang/
  using a strategy pattern, replacing monolithic conditionals in parser.py
- Thread-safe parser caches with double-checked locking
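
A minimal sketch of double-checked locking for a per-language parser cache, assuming parsers are expensive to build and safe to share once constructed. `ParserCache` and its `factory` argument are hypothetical names, not the PR's actual classes.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class ParserCache(Generic[T]):
    """Builds each language's parser at most once, even under contention."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._parsers: Dict[str, T] = {}

    def get(self, language: str) -> T:
        # First check: no lock, so the common cache-hit path stays cheap.
        parser = self._parsers.get(language)
        if parser is None:
            with self._lock:
                # Second check: another thread may have built the parser
                # while this one was waiting for the lock.
                parser = self._parsers.get(language)
                if parser is None:
                    parser = self._factory(language)
                    self._parsers[language] = parser
        return parser
```

The lock is only taken on a cache miss, which is the point of checking twice instead of locking around every lookup.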

Call graph enrichment:
- Jedi-based Python method call resolution at build time (jedi_resolver.py)
- Pre-scan filtering by project function names (36s to 3s on large repos)
- Typed variable call enrichment (Python, JS/TS, Kotlin/Java)
- Star import resolution, namespace imports, CommonJS require()
- Angular template parsing, JSX handler tracking
- Module-level import tracking and module-qualified call resolution
- Function/class references passed as call arguments
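
The PR does not spell out how the pre-scan filter works. One plausible reading, sketched here with the standard `ast` module, is to collect the names a file calls and skip the expensive resolver when none of them match a function defined in the project. `called_names` and `needs_resolution` are hypothetical helpers, not functions from jedi_resolver.py.

```python
import ast
from typing import Set


def called_names(source: str) -> Set[str]:
    """Collect the trailing name of each call: foo() -> 'foo', a.b.c() -> 'c'."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id)
            elif isinstance(func, ast.Attribute):
                names.add(func.attr)
    return names


def needs_resolution(source: str, project_functions: Set[str]) -> bool:
    """True if the file calls at least one name defined in the project."""
    return not called_names(source).isdisjoint(project_functions)
```

A file for which `needs_resolution` returns False can be skipped entirely, which is where a saving like the reported 36s to 3s would come from under this reading.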

PreToolUse search enrichment:
- New enrich.py module and code-review-graph enrich CLI command
- Injects graph context (callers, flows, community, tests) into agent
  search results passively via hook

Dead code false positive reduction:
- Framework decorators recognized as entry points
- CDK construct methods, abstract overrides excluded
- E2e test directories excluded from dead code detection
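
To illustrate the decorator rule, here is a sketch that treats a Python function as an entry point when it carries a framework-style decorator, so that having no callers in the graph does not mark it as dead. The decorator set is an illustrative assumption, not the list the PR actually recognizes.

```python
import ast
from typing import Set

# Illustrative names only; the PR's real decorator list is not shown here.
ENTRY_POINT_DECORATORS = {"route", "get", "post", "task", "fixture", "command"}


def decorator_name(node: ast.expr) -> str:
    """Reduce @name, @obj.name and @obj.name(...) to the bare 'name'."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def entry_points(source: str) -> Set[str]:
    """Names of functions that carry at least one entry-point decorator."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(decorator_name(d) in ENTRY_POINT_DECORATORS
                   for d in node.decorator_list):
                found.add(node.name)
    return found
```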

Performance:
- Community detection: 48.6s to 2.3s (21x speedup) via bulk node
  loading and adjacency-indexed cohesion computation
- Jedi enrichment: 36s to 3s (12x) via pre-scan filtering
- Batch file storage (50-file transactions)
- Batch risk_index (2 GROUP BY queries replace per-node loops)
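
The batch risk_index change replaces one query per node with two aggregate queries. A sketch of that idea with the standard `sqlite3` module follows, assuming a hypothetical `edges(kind, source, target)` table in which a TESTED_BY edge points from the tested node to its test; the real schema and the quantities fed into the risk score may differ.

```python
import sqlite3
from typing import Dict, Tuple


def batch_risk_inputs(conn: sqlite3.Connection) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Fetch caller and test counts for every node with two GROUP BY queries."""
    callers = dict(conn.execute(
        "SELECT target, COUNT(*) FROM edges WHERE kind = 'CALLS' GROUP BY target"
    ))
    tests = dict(conn.execute(
        "SELECT source, COUNT(*) FROM edges WHERE kind = 'TESTED_BY' GROUP BY source"
    ))
    return callers, tests
```

Per-node scoring can then read from the two dictionaries in memory instead of issuing a query per node.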

Other:
- Weighted flow risk scoring by criticality
- Transitive TESTED_BY lookup for tests_for and risk scoring
- DB schema v8: composite edge index (v7 reserved by PR tirth8205#127)
- --quiet and --json CLI flags
- Search query deduplication, test function deprioritization
- New [enrichment] optional dependency group for Jedi
- 829+ tests across 26 test files (up from 615)
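
The transitive TESTED_BY lookup is not defined in detail here. One possible interpretation, sketched below, is that a function with no direct tests still counts as covered when a test exercises one of its callers, followed up to a bounded depth. The function name, the two dictionary inputs, and the depth limit are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set


def transitive_tests_for(
    node: str,
    callers: Dict[str, Iterable[str]],
    tested_by: Dict[str, Iterable[str]],
    max_depth: int = 3,
) -> Set[str]:
    """Tests attached to `node` or to any caller within `max_depth` hops."""
    found: Set[str] = set(tested_by.get(node, ()))
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                found.update(tested_by.get(caller, ()))
                queue.append((caller, depth + 1))
    return found
```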

Evaluated against Gadgetbridge (41k nodes, 280k edges): 8/10 PASS,
call resolution rate improved from 28% to 39.6%.

gzenz force-pushed the feat/lang-handlers-jedi-resolver-and-perf branch from 6df1421 to 702ac5b on April 8, 2026 20:43