fix: resilient edge batch upsert for large codebases #65
minkymorgan wants to merge 2 commits into Jakedismo:main from
Add `upsert_edges_batch_resilient` to `SurrealDbStorage`, modelled on the existing `upsert_chunk_embeddings_resilient` pattern.

Problem: `upsert_edges_batch` sends all edges in a single SurrealDB `FOR $doc IN $batch` query. For large codebases (e.g. a 1224-file Rust workspace generating ~107K edges) this overwhelms the WebSocket connection with a connection reset, leaving 0 edges persisted while the indexer silently continues. Issue Jakedismo#51 (go index has no edge) is likely the same root cause.

Fix:
- Add `EDGE_BATCH_CHUNK_SIZE = 2_000` constant
- `upsert_edges_batch_resilient` pre-chunks the input at this size
- On connection reset or excessive computation depth errors, halves the chunk and retries up to 3 times (iterative queue, no recursive futures)
- Wire `indexer.rs` to call `upsert_edges_batch_resilient`
- Wire `add_code_edges` (public API) to use the resilient path

Tested by indexing the codegraph-rust source itself (dogfood: 450+655 nodes, 165+502 edges, all persisted cleanly), and stress-tested on a 1224-file Rust workspace (aslanDB/qdrant fork, ~107K edges) which previously failed with connection reset.

Closes Jakedismo#51
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b8440c039
    let msg = err.to_string();
    let recoverable = msg.contains("excessive computation depth")
        || msg.contains("ComputationDepth")
        || msg.contains("connection reset");
Normalize recoverable error matching before contains checks
The resilient edge path can still fail without retrying on common socket-reset errors because `contains("connection reset")` is case-sensitive, while Rust/WS errors are often rendered as "Connection reset ...". In that case `recoverable` becomes false and the function returns immediately instead of splitting and retrying, which defeats the main fix for large-batch edge writes in exactly the connection-reset scenario this commit targets.
Per PR review by chatgpt-codex-connector: error strings from Rust/WS can arrive as 'Connection reset ...' (capital C) but the previous `contains()` checks were case-sensitive, causing silent retry failures in the exact scenario the resilient paths are designed to handle.

Fix: use `msg.to_lowercase()` before all `contains()` checks in both `upsert_edges_batch_resilient` and `upsert_chunk_embeddings_resilient`. Also add 'broken pipe' as a recoverable error variant.
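The normalization described in this commit message could look roughly like the sketch below. It is not the code from b6d0b9b: the helper name `is_recoverable` and the `NEEDLES` list are hypothetical, chosen only for illustration. One detail it makes visible is that once the message is lowercased, every search string has to be lowercase as well, so a check for `ComputationDepth` would need to become `computationdepth` to keep matching.

```rust
/// Hypothetical helper (not from the PR): decide whether an error message
/// should trigger the split-and-retry path.
fn is_recoverable(msg: &str) -> bool {
    // Every needle must be lowercase, because the message is lowercased below.
    const NEEDLES: [&str; 4] = [
        "excessive computation depth",
        "computationdepth", // lowercase form of `ComputationDepth`
        "connection reset",
        "broken pipe",
    ];
    // Normalize once, then run all substring checks against the same string.
    let normalized = msg.to_lowercase();
    NEEDLES.iter().any(|needle| normalized.contains(needle))
}
```

With this shape, adding another transient error later is a one-line change to the list rather than another `||` clause.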
Good catch — addressed in b6d0b9b. Both |
Problem
`upsert_edges_batch` sends all edges in a single SurrealDB `FOR $doc IN $batch` query. For large codebases (we tested against a 1224-file Rust workspace generating ~107,000 edges) the WebSocket connection resets mid-write. The indexer logs an error and continues, leaving 0 edges in the database (see #51).

The same root cause likely affects any project large enough to generate thousands of relationships in a single batch.
Fix
Add `upsert_edges_batch_resilient` to `SurrealDbStorage`, modelled directly on the existing `upsert_chunk_embeddings_resilient` pattern already in the codebase:

- `EDGE_BATCH_CHUNK_SIZE = 2_000`: pre-chunks the input before sending
- On `connection reset` or `excessive computation depth` errors, halves the chunk and retries (iterative queue, up to 3 retries; avoids recursive async futures)
- `indexer.rs` and the public `add_code_edges` API both route through the resilient path

The diff is 56 lines: the constant, the new function, and two one-line callsite changes.
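The pre-chunk, halve, and retry flow described above can be sketched as follows. This is a simplified illustration, not the PR's implementation: it is synchronous, it takes the database write as a closure (`send`) instead of calling SurrealDB, and the names `upsert_resilient`, `is_transient`, and `MAX_RETRIES` are hypothetical. It also assumes that "up to 3 retries" means a given piece of data is halved at most three times; the actual function may count retries differently.

```rust
use std::collections::VecDeque;

const EDGE_BATCH_CHUNK_SIZE: usize = 2_000; // value stated in the PR
const MAX_RETRIES: u32 = 3; // assumed meaning: at most 3 halvings per chunk

/// Hypothetical classifier for errors worth retrying with a smaller chunk.
fn is_transient(msg: &str) -> bool {
    let m = msg.to_lowercase();
    m.contains("connection reset") || m.contains("excessive computation depth")
}

/// Hypothetical stand-in for the resilient upsert. `send` plays the role of
/// the single-query batch write. Returns the number of items written.
fn upsert_resilient<T, F>(items: &[T], mut send: F) -> Result<usize, String>
where
    F: FnMut(&[T]) -> Result<(), String>,
{
    // Iterative work queue of (chunk, halvings so far); no recursion needed.
    let mut queue: VecDeque<(&[T], u32)> = items
        .chunks(EDGE_BATCH_CHUNK_SIZE)
        .map(|chunk| (chunk, 0))
        .collect();
    let mut written = 0;

    while let Some((chunk, attempt)) = queue.pop_front() {
        match send(chunk) {
            Ok(()) => written += chunk.len(),
            // Transient failure: split in half and retry both halves next,
            // left half first so the original order is preserved.
            Err(msg) if is_transient(&msg) && attempt < MAX_RETRIES && chunk.len() > 1 => {
                let (left, right) = chunk.split_at(chunk.len() / 2);
                queue.push_front((right, attempt + 1));
                queue.push_front((left, attempt + 1));
            }
            // Anything else, or retries exhausted: surface the error to the caller.
            Err(msg) => return Err(msg),
        }
    }
    Ok(written)
}
```

For a sense of scale: at a chunk size of 2,000, roughly 107,000 edges become 54 initial queries, and under the halving assumption above the smallest chunk ever sent is 250 edges. Returning the error once retries are exhausted, rather than logging and continuing, is a choice made in this sketch so the caller can tell that edges were not persisted.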
Testing
We used CodeGraph to fix CodeGraph — indexed the codegraph-rust source itself first (dogfood), then ran the stress test:
Related issues
Likely closes #51 (go index, 0 edges persisted despite 2119 extracted)