fix(git): cap and stream untracked-file line counting#2218
Merged
Conversation
`countFileLines` previously read each untracked file's full content into memory as UTF-8 via `fs.readFile(filePath, "utf-8")`. The caller in `getChangedFilesDetailed` runs this 16-way concurrent against every untracked path returned by `streamGitStatus` (up to 50k entries). On a monorepo with multi-MB build artifacts or lockfiles, peak heap was `16 * file_bytes * 2` (V8's UTF-16 cost), easily several GB. That OOM'd the main process and froze the renderer waiting on a tRPC call that would never return — the symptom was the app appearing to freeze a few seconds after the sidebar painted, with no `[ipc-rate]` warnings, just a silent V8 `Scavenger: semi-space copy Allocation failed`. Fix: bail out on files larger than 1 MB, and stream the rest with `createReadStream`, counting `\n` bytes byte-by-byte. Per-stream memory stays at ~64 KB regardless of file size, so peak across 16 concurrent reads is ~1 MB total. Preserves the original return semantics (trailing-newline-aware line count) for files within the cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
packages/git/src/queries.test.ts:257-268
**Prefer parameterised tests for the two line-count cases**
The team's convention is to always use parameterised tests. The two sub-cases here — file with a trailing newline and file without one — test the same behaviour (accurate line counting) with different inputs and expected values, which is exactly the pattern parameterisation is designed for. A `it.each` table would express the intent more clearly and make it trivial to add further edge-cases (e.g. an empty file, a CRLF file) without repeating the repo-setup boilerplate.
Reviews (1): Last reviewed commit: "fix(git): cap and stream untracked-file ..." | Re-trigger Greptile |
Earlier on this PR I gated countFileLines with a 1 MB size cap that returned `linesAdded: 0` for anything bigger. That was a correctness regression — a 5 MB untracked lockfile would show 0 lines in the diff stats instead of its real count. The streaming line counter is already enough to fix the OOM on its own: `createReadStream`'s 64 KB highWaterMark means peak per-stream memory is ~64 KB regardless of file size, so 16-way concurrent reads top out around 1 MB total whether the files are 5 KB or 5 GB. The only cost of removing the cap is wall-clock time for genuinely huge files (a 1 GB untracked file now streams in a few seconds instead of returning 0), which is the correct trade. Updated the regression test to assert the real line count for a ~1.6 MB untracked file instead of asserting it gets skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pauldambra
commented
May 19, 2026
Member
Author
pauldambra
left a comment
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
Multi-perspective review by qa-team (reliability / performance / data-integrity / compatibility / two generalists), paul-reviewer, and xp-reviewer. All three reviewers converged on the same two MEDIUM findings, posted inline. Six lower-severity findings (Buffer.indexOf perf, dead string-chunk branch, symlink-follow risk, extra fs.stat, lastByte dead branch, edge-case test coverage, magic-number extraction) were noted but are being addressed silently in the next commit.
Verdict: 💬 APPROVE WITH NITS — no blocking findings.
Convergent feedback from three reviewers on the streaming line counter:
- thread `AbortSignal` from `getChangedFilesDetailed` through
`countFileLines` into `createReadStream({ signal })` and check the
signal between items in `mapWithConcurrency`. A cancelled tRPC call
no longer streams every untracked file to completion before the
result is discarded.
- log on `stream.on("error", ...)` via `console.warn` (matches the
pattern already used in `utils.ts`). Silent zeros in a diff-stats
field is the exact smell that hid the original OOM bug; now there's
a breadcrumb when it happens.
- replace the per-byte JS scan with `Buffer.indexOf(0x0a)` — native, an
order of magnitude faster on multi-MB buffers, which is the exact
workload this function exists to handle. (The original
`typeof chunk === "string"` branch was dead at runtime — Node's
`data` listener types are wider than the actual emit — so it's now
a focused cast with a comment explaining why.)
- switch `fs.stat` -> `fs.lstat` so an untracked symlink to /dev/zero
(or any device / dir / fifo through a symlink) short-circuits to 0
via `isFile()` instead of streaming forever. Symlinks in the
untracked list are rare; preferring safety over a legitimate-symlink
line count is the right trade.
- comment the `lastByte === -1` branch so a future cleanup doesn't
treat it as dead — it's the TOCTOU guard.
Test coverage expanded to a parameterized `it.each` over the line-count
contract (trailing newline, no trailing newline, single byte, lone
newline, consecutive newlines, CRLF), plus an explicit empty-file case.
The multi-MB regression test now uses a named constant so a future
"shrink for speed" PR can't accidentally drop it below the stream's
highWaterMark and silently downgrade the test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jonathanlab
approved these changes
May 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes an OOM in the main process that freezes the app a few seconds after the sidebar paints, on accounts whose selected directory is a monorepo with large untracked files (lockfiles, build artifacts, source maps).
countFileLineswas reading each file's full contents into V8 memory viafs.readFile, run 16-way concurrent against tens of thousands of paths — a single big file in any of the 16 in-flight reads could push the heap past 4 GB and crash main. Now we stream-count\nbytes withcreateReadStream, so peak per-stream memory is ~64 KB regardless of file size and the 16-way batch stays bounded around 1 MB total.Problem
countFileLinesinpackages/git/src/queries.tswas reading each untracked file fully into memory viafs.readFile(filePath, "utf-8"), and the caller (getChangedFilesDetailed) runs it 16-way concurrent against up to 50k untracked paths fromstreamGitStatus.16 × file_bytes × 2(V8's UTF-16 cost). That blew past Electron's 4 GB main-process heap on real-world repos and OOM'd the main process. The renderer appeared frozen ~2–3 s after the sidebar painted because all its in-flight tRPC calls were waiting on a dead main. Symptom in chromium.log was a silentOOM error in V8: Scavenger: semi-space copy Allocation failed, with no[ipc-rate]warning before it (no IPC storm — just main quietly running out of heap).streamGitStatus. Bisected on a freshly-onboarded data dir pointed at this monorepo.Fix
createReadStream(64 KB highWaterMark) and count\nbytes byte-by-byte. Per-stream memory stays at ~64 KB regardless of file size, so peak across 16 concurrent reads is ~1 MB total — well under any heap pressure threshold.linesAdded: 0for any larger file. That's a correctness regression and the streaming alone is already enough to fix the OOM, so the cap was dropped in the follow-up commit.)Test plan
pnpm --filter @posthog/git test src/queries.test.ts— newgetChangedFilesDetailedtests pass (small-file accuracy + correct counting of a multi-MB untracked file that previously OOM'd)pnpm --filter code typecheckpnpm devagainst a fresh data dir pointed at this monorepo — previously OOM'd within ~3 s ofMainSidebarpaint, now stays responsive🤖 Generated with Claude Code