fix(git): cap and stream untracked-file line counting by pauldambra · Pull Request #2218 · PostHog/code

pauldambra · 2026-05-19T09:58:34Z

Fixes an OOM in the main process that freezes the app a few seconds after the sidebar paints, on accounts whose selected directory is a monorepo with large untracked files (lockfiles, build artifacts, source maps). countFileLines was reading each file's full contents into V8 memory via fs.readFile, run 16-way concurrent against tens of thousands of paths — a single big file in any of the 16 in-flight reads could push the heap past 4 GB and crash main. Now we stream-count \n bytes with createReadStream, so peak per-stream memory is ~64 KB regardless of file size and the 16-way batch stays bounded around 1 MB total.

Problem

countFileLines in packages/git/src/queries.ts was reading each untracked file fully into memory via fs.readFile(filePath, "utf-8"), and the caller (getChangedFilesDetailed) runs it 16-way concurrent against up to 50k untracked paths from streamGitStatus.
On a monorepo with multi-MB build artifacts, lockfiles, or other large untracked files, peak heap was 16 × file_bytes × 2 (V8's UTF-16 cost). That blew past Electron's 4 GB main-process heap on real-world repos and OOM'd the main process. The renderer appeared frozen ~2–3 s after the sidebar painted because all its in-flight tRPC calls were waiting on a dead main. Symptom in chromium.log was a silent OOM error in V8: Scavenger: semi-space copy Allocation failed, with no [ipc-rate] warning before it (no IPC storm — just main quietly running out of heap).
Regression introduced by feat: virtualized and optimized diff rendering #2113 (c617988), which removed the previous 10k-file cap and switched to concurrent reads when migrating to streamGitStatus. Bisected on a freshly-onboarded data dir pointed at this monorepo.

Fix

Stream the file with createReadStream (64 KB highWaterMark) and count \n bytes byte-by-byte. Per-stream memory stays at ~64 KB regardless of file size, so peak across 16 concurrent reads is ~1 MB total — well under any heap pressure threshold.
Preserves the original return semantics: trailing-newline-aware count, 0 for empty / non-regular files / errors.
No artificial file-size cap — every untracked file gets an accurate line count. (An earlier version of this PR gated at 1 MB, which silently returned linesAdded: 0 for any larger file. That's a correctness regression and the streaming alone is already enough to fix the OOM, so the cap was dropped in the follow-up commit.)

Test plan

pnpm --filter @posthog/git test src/queries.test.ts — new getChangedFilesDetailed tests pass (small-file accuracy + correct counting of a multi-MB untracked file that previously OOM'd)
pnpm --filter code typecheck
pnpm dev against a fresh data dir pointed at this monorepo — previously OOM'd within ~3 s of MainSidebar paint, now stays responsive
CI

🤖 Generated with Claude Code

`countFileLines` previously read each untracked file's full content into memory as UTF-8 via `fs.readFile(filePath, "utf-8")`. The caller in `getChangedFilesDetailed` runs this 16-way concurrent against every untracked path returned by `streamGitStatus` (up to 50k entries). On a monorepo with multi-MB build artifacts or lockfiles, peak heap was `16 * file_bytes * 2` (V8's UTF-16 cost), easily several GB. That OOM'd the main process and froze the renderer waiting on a tRPC call that would never return — the symptom was the app appearing to freeze a few seconds after the sidebar painted, with no `[ipc-rate]` warnings, just a silent V8 `Scavenger: semi-space copy Allocation failed`. Fix: bail out on files larger than 1 MB, and stream the rest with `createReadStream`, counting `\n` bytes byte-by-byte. Per-stream memory stays at ~64 KB regardless of file size, so peak across 16 concurrent reads is ~1 MB total. Preserves the original return semantics (trailing-newline-aware line count) for files within the cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-05-19T10:00:29Z

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/git/src/queries.test.ts:257-268
**Prefer parameterised tests for the two line-count cases**

The team's convention is to always use parameterised tests. The two sub-cases here — file with a trailing newline and file without one — test the same behaviour (accurate line counting) with different inputs and expected values, which is exactly the pattern parameterisation is designed for. A `it.each` table would express the intent more clearly and make it trivial to add further edge-cases (e.g. an empty file, a CRLF file) without repeating the repo-setup boilerplate.

_{Reviews (1): Last reviewed commit: "fix(git): cap and stream untracked-file ..." | Re-trigger Greptile}

Earlier on this PR I gated countFileLines with a 1 MB size cap that returned `linesAdded: 0` for anything bigger. That was a correctness regression — a 5 MB untracked lockfile would show 0 lines in the diff stats instead of its real count. The streaming line counter is already enough to fix the OOM on its own: `createReadStream`'s 64 KB highWaterMark means peak per-stream memory is ~64 KB regardless of file size, so 16-way concurrent reads top out around 1 MB total whether the files are 5 KB or 5 GB. The only cost of removing the cap is wall-clock time for genuinely huge files (a 1 GB untracked file now streams in a few seconds instead of returning 0), which is the correct trade. Updated the regression test to assert the real line count for a ~1.6 MB untracked file instead of asserting it gets skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pauldambra

Note

🤖 Automated comment by QA Swarm — not written by a human

Multi-perspective review by qa-team (reliability / performance / data-integrity / compatibility / two generalists), paul-reviewer, and xp-reviewer. All three reviewers converged on the same two MEDIUM findings, posted inline. Six lower-severity findings (Buffer.indexOf perf, dead string-chunk branch, symlink-follow risk, extra fs.stat, lastByte dead branch, edge-case test coverage, magic-number extraction) were noted but are being addressed silently in the next commit.

Verdict: 💬 APPROVE WITH NITS — no blocking findings.

Convergent feedback from three reviewers on the streaming line counter: - thread `AbortSignal` from `getChangedFilesDetailed` through `countFileLines` into `createReadStream({ signal })` and check the signal between items in `mapWithConcurrency`. A cancelled tRPC call no longer streams every untracked file to completion before the result is discarded. - log on `stream.on("error", ...)` via `console.warn` (matches the pattern already used in `utils.ts`). Silent zeros in a diff-stats field is the exact smell that hid the original OOM bug; now there's a breadcrumb when it happens. - replace the per-byte JS scan with `Buffer.indexOf(0x0a)` — native, an order of magnitude faster on multi-MB buffers, which is the exact workload this function exists to handle. (The original `typeof chunk === "string"` branch was dead at runtime — Node's `data` listener types are wider than the actual emit — so it's now a focused cast with a comment explaining why.) - switch `fs.stat` -> `fs.lstat` so an untracked symlink to /dev/zero (or any device / dir / fifo through a symlink) short-circuits to 0 via `isFile()` instead of streaming forever. Symlinks in the untracked list are rare; preferring safety over a legitimate-symlink line count is the right trade. - comment the `lastByte === -1` branch so a future cleanup doesn't treat it as dead — it's the TOCTOU guard. Test coverage expanded to a parameterized `it.each` over the line-count contract (trailing newline, no trailing newline, single byte, lone newline, consecutive newlines, CRLF), plus an explicit empty-file case. The multi-MB regression test now uses a named constant so a future "shrink for speed" PR can't accidentally drop it below the stream's highWaterMark and silently downgrade the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

Comment thread packages/git/src/queries.test.ts Outdated

pauldambra commented May 19, 2026

View reviewed changes

Comment thread packages/git/src/queries.ts Outdated

Comment thread packages/git/src/queries.ts Outdated

pauldambra added the Stamphog This will request an autostamp by stamphog on small changes label May 19, 2026

jonathanlab approved these changes May 19, 2026

View reviewed changes

pauldambra requested a review from a team May 19, 2026 10:48

jonathanlab added the Create Release This will trigger a new release label May 19, 2026

pauldambra merged commit a017c55 into main May 19, 2026
15 checks passed

pauldambra deleted the fix/count-file-lines-oom branch May 19, 2026 10:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(git): cap and stream untracked-file line counting#2218

fix(git): cap and stream untracked-file line counting#2218
pauldambra merged 3 commits into
mainfrom
fix/count-file-lines-oom

pauldambra commented May 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 19, 2026

Uh oh!

Uh oh!

pauldambra left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pauldambra commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Test plan

Uh oh!

greptile-apps Bot commented May 19, 2026

Uh oh!

Uh oh!

pauldambra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pauldambra commented May 19, 2026 •

edited

Loading