
fix: resilient edge batch upsert for large codebases#65

Open
minkymorgan wants to merge 2 commits into Jakedismo:main from minkymorgan:fix/resilient-edge-batch-upsert


@minkymorgan

Problem

upsert_edges_batch sends all edges in a single SurrealDB FOR $doc IN $batch query. For large codebases — we tested against a 1224-file Rust workspace generating ~107,000 edges — the WebSocket connection resets mid-write. The indexer logs an error and continues, leaving 0 edges in the database (see #51).

The same root cause likely affects any project large enough to generate thousands of relationships in a single batch.

Fix

Add upsert_edges_batch_resilient to SurrealDbStorage, modelled directly on the existing upsert_chunk_embeddings_resilient pattern:

  • EDGE_BATCH_CHUNK_SIZE = 2_000: pre-chunks the input before sending
  • On connection-reset or excessive-computation-depth errors, halves the failing chunk and retries it (up to 3 retries, via an iterative queue that avoids recursive async futures)
  • indexer.rs and the public add_code_edges API both route through the resilient path
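The chunk-and-halve strategy above can be sketched as follows. This is a synchronous, self-contained illustration, not the PR's code: `upsert_resilient`, the `upsert` closure (standing in for the async SurrealDB batch write), the `is_recoverable` parameter and `MAX_RETRIES` are hypothetical names, and the sketch assumes the 3-retry budget is counted per sub-batch, so a single chunk can be halved at most three times.

```rust
use std::collections::VecDeque;

/// Chunk size taken from the PR description.
const EDGE_BATCH_CHUNK_SIZE: usize = 2_000;
/// Assumption: the retry budget is counted per sub-batch (its halving depth).
const MAX_RETRIES: u32 = 3;

/// Hypothetical sketch of the resilient batch write.
/// `upsert` stands in for the real (async) SurrealDB batch call;
/// `is_recoverable` decides whether an error message warrants a split.
fn upsert_resilient<T, F, R>(items: &[T], mut upsert: F, is_recoverable: R) -> Result<usize, String>
where
    F: FnMut(&[T]) -> Result<(), String>,
    R: Fn(&str) -> bool,
{
    // Pre-chunk the input; each entry records how many retries it has used.
    let mut queue: VecDeque<(&[T], u32)> = items
        .chunks(EDGE_BATCH_CHUNK_SIZE)
        .map(|chunk| (chunk, 0))
        .collect();
    let mut written = 0;

    // Iterative work queue rather than recursion, so an async version would
    // not need recursive (boxed) futures.
    while let Some((batch, retries)) = queue.pop_front() {
        match upsert(batch) {
            Ok(()) => written += batch.len(),
            Err(err) if retries < MAX_RETRIES && batch.len() > 1 && is_recoverable(err.as_str()) => {
                // Halve only the failing batch; push both halves to the front
                // so they are retried next, in their original order.
                let (left, right) = batch.split_at(batch.len() / 2);
                queue.push_front((right, retries + 1));
                queue.push_front((left, retries + 1));
            }
            // Non-recoverable error, or retry budget exhausted: give up.
            Err(err) => return Err(err),
        }
    }
    Ok(written)
}
```

Splitting only the batch that failed keeps healthy chunks at full size, while the retry cap bounds how small a persistently failing chunk can get (2,000 → 1,000 → 500 → 250 under the per-sub-batch assumption) before the error is surfaced.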

The diff is 56 lines: the constant, the new function, and two one-line callsite changes.

Testing

We used CodeGraph to fix CodeGraph. We first indexed the codegraph-rust source itself (dogfood), then ran the stress test:

| Codebase | Files | Result before | Result after |
|---|---|---|---|
| codegraph-graph + codegraph-mcp (self) | 12 | ✅ working | ✅ 450+655 nodes, 165+502 edges |
| aslanDB (qdrant fork, 1224 .rs files) | 1224 | ❌ 11 edges (connection reset) | ✅ 66,325 nodes, 108,409 edges |
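As a rough worked check of what pre-chunking means for the aslanDB run: with EDGE_BATCH_CHUNK_SIZE = 2_000, the 108,409 edges go out as 55 requests instead of one, and, assuming the 3-retry budget applies per chunk, the smallest sub-batch the retry path can reach is 250 edges. `request_count` and `min_sub_batch` are illustrative helpers, not part of the PR.

```rust
/// Number of pre-chunked requests needed to send `total` items.
fn request_count(total: usize, chunk_size: usize) -> usize {
    total.div_ceil(chunk_size)
}

/// Smallest sub-batch reachable from a full chunk after `halvings` splits,
/// rounded down (exact for 2_000, which divides evenly by 2^3).
fn min_sub_batch(chunk_size: usize, halvings: u32) -> usize {
    chunk_size >> halvings
}
```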

Related issues

Likely closes #51 (go index, 0 edges persisted despite 2119 extracted)

Add `upsert_edges_batch_resilient` to `SurrealDbStorage`, modelled on the
existing `upsert_chunk_embeddings_resilient` pattern.

Problem: `upsert_edges_batch` sends all edges in a single SurrealDB
`FOR $doc IN $batch` query. For large codebases (e.g. a 1224-file Rust
workspace generating ~107K edges) this overwhelms the WebSocket connection,
which resets, leaving 0 edges persisted while the indexer silently
continues. Issue Jakedismo#51 (go index has no edge) likely has the same
root cause.

Fix:
- Add `EDGE_BATCH_CHUNK_SIZE = 2_000` constant
- `upsert_edges_batch_resilient` pre-chunks the input at this size
- On connection reset or excessive computation depth errors, halves the
  chunk and retries up to 3 times (iterative queue, no recursive futures)
- Wire `indexer.rs` to call `upsert_edges_batch_resilient`
- Wire `add_code_edges` (public API) to use the resilient path

Tested by indexing the codegraph-rust source itself (dogfood: 450+655
nodes, 165+502 edges, all persisted cleanly), and stress-tested on a
1224-file Rust workspace (aslanDB/qdrant fork, ~107K edges) which
previously failed with a connection reset.

Closes Jakedismo#51

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b8440c039


let msg = err.to_string();
let recoverable = msg.contains("excessive computation depth")
|| msg.contains("ComputationDepth")
|| msg.contains("connection reset");


P1: Normalize recoverable error matching before contains checks

The resilient edge path can still fail without retrying on common socket-reset errors because contains("connection reset") is case-sensitive, while Rust/WS errors are often rendered as "Connection reset ...". In that case recoverable becomes false and the function returns immediately instead of splitting/retrying, which defeats the main fix for large-batch edge writes in exactly the connection-reset scenario this commit targets.


Per PR review by chatgpt-codex-connector: error strings from Rust/WS
can arrive as 'Connection reset ...' (capital C), but the previous
contains() checks were case-sensitive, so such errors were treated as
non-recoverable and returned without a retry in the exact scenario the
resilient paths are designed to handle.

Fix: use msg.to_lowercase() before all contains() checks in both
upsert_edges_batch_resilient and upsert_chunk_embeddings_resilient.
Also add 'broken pipe' as a recoverable error variant.
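A minimal sketch of the normalised check, assuming the same error substrings as the reviewed snippet plus the new 'broken pipe' case. `is_recoverable` and `RECOVERABLE_PATTERNS` are hypothetical names, not the PR's code. In this sketch every pattern is written in lowercase, because a mixed-case pattern such as "ComputationDepth" can never match a message that has already been lowercased.

```rust
/// Error substrings treated as transient. All lowercase, because the
/// message is lowercased before matching.
const RECOVERABLE_PATTERNS: [&str; 4] = [
    "excessive computation depth",
    "computationdepth",
    "connection reset",
    "broken pipe",
];

/// Hypothetical sketch: case-insensitive check for retryable errors.
fn is_recoverable(msg: &str) -> bool {
    let msg = msg.to_lowercase();
    RECOVERABLE_PATTERNS.iter().any(|&pattern| msg.contains(pattern))
}
```

Where the underlying `std::io::Error` is reachable, matching on `std::io::ErrorKind::ConnectionReset` or `ErrorKind::BrokenPipe` would avoid depending on how the message is rendered; the string check above mirrors the approach described in the commit.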
@minkymorgan (author)

Good catch — addressed in b6d0b9b.

Both upsert_edges_batch_resilient and upsert_chunk_embeddings_resilient now use msg.to_lowercase() before all contains() checks, so capital-C Connection reset and any other capitalisation variant are caught correctly. Also added broken pipe as a recoverable error alongside connection reset.


Development

Successfully merging this pull request may close these issues.

go index has no edge

1 participant