src: improve compile cache performance and size by anonrig · Pull Request #63861 · nodejs/node

anonrig · 2026-06-11T22:08:52Z

Improves the on-disk compile cache (NODE_COMPILE_CACHE / module.enableCompileCache()):

Read path: read cache files with a single exactly-sized read (using the file size from fstat) instead of an exponentially growing buffer, which previously cost O(log N) syscalls/allocations and ~2N bytes of copying per file.
Size: compress the cache content on disk with zstd (level 1, prioritizing speed since persistence happens at shutdown), falling back to raw storage when not compressible. Shrinks cache directories ~2-4x and makes the crc32 integrity check cheaper since it now runs over the compressed bytes. The magic number is bumped so files in the old format are discarded as cache misses and overwritten in place.
Consume path: hand the cache to V8 through a non-owning CachedData wrapper (BufferNotOwned) instead of copying the entire buffer on every cache hit. The underlying buffer is owned by the cache entry, which outlives the synchronous compilation (same pattern as the vm cached-data path in node_contextify.cc).

Corrupted cache files keep degrading to silent cache misses and are regenerated; a corrupted size header can no longer cause an oversized allocation since the zstd frame content size is cross-checked first. Added test/parallel/test-compile-cache-corrupted.js covering bad magic, truncation, content bit-flips, and header corruption.
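The "fall back to raw" and "corruption degrades to a cache miss" rules can be illustrated with a small sketch. This is not the format used by src/compile_cache.cc: the header layout, the `MAGIC` value, the flag values and the size bound below are all hypothetical, and Python's `zlib` stands in for zstd (zlib has no frame content size field, so a capped `max_length` plays the role of the cross-check).

```python
import struct
import zlib
from typing import Optional

MAGIC = 0xC0DECA5E            # hypothetical value, not the one Node.js uses
FLAG_RAW, FLAG_COMPRESSED = 0, 1
HEADER = struct.Struct("<IIII")  # magic, flag, original size, crc32 of payload
MAX_ORIGINAL_SIZE = 1 << 30   # illustrative sanity bound on the size header


def encode_entry(code_cache: bytes) -> bytes:
    """Store the compressed bytes only when they are actually smaller."""
    compressed = zlib.compress(code_cache, 1)  # level 1: favour speed
    if len(compressed) < len(code_cache):
        flag, payload = FLAG_COMPRESSED, compressed
    else:
        flag, payload = FLAG_RAW, code_cache
    # The checksum covers the stored (possibly compressed) bytes.
    header = HEADER.pack(MAGIC, flag, len(code_cache), zlib.crc32(payload))
    return header + payload


def decode_entry(blob: bytes) -> Optional[bytes]:
    """Return the original bytes, or None (a cache miss) on any corruption."""
    if len(blob) < HEADER.size:
        return None
    magic, flag, original_size, crc = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    if magic != MAGIC or original_size > MAX_ORIGINAL_SIZE:
        return None
    if zlib.crc32(payload) != crc:
        return None
    if flag == FLAG_RAW:
        return payload if len(payload) == original_size else None
    if flag != FLAG_COMPRESSED:
        return None
    try:
        decoder = zlib.decompressobj()
        # max_length caps the output, so a bogus size cannot blow up memory.
        data = decoder.decompress(payload, original_size + 1)
    except zlib.error:
        return None
    if len(data) != original_size or not decoder.eof:
        return None
    return data
```

Every failure path returns `None` rather than raising, which mirrors the "silent cache miss, then regenerate" behaviour described above.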

No public API or documented behavior changes; the file format is private to src/compile_cache.cc. Benchmark numbers (cache size and warm-startup timings) to follow in a comment.

This change was developed with AI assistance (see Co-authored-by trailer).

nodejs-github-bot · 2026-06-11T22:09:20Z

Review requested:

@nodejs/loaders
@nodejs/vm

anonrig · 2026-06-11T22:38:06Z

Verification results (macOS arm64, release build, vs. baseline at the merge-base):

Tests: all 23 parallel/test-compile-cache* pass (22 existing unchanged + the new corruption test).

On-disk size

| Scenario | Baseline | This PR | Reduction |
| --- | --- | --- | --- |
| 10 MB `snapshot/typescript.js` fixture (single big CJS file) | 1,818,380 B | 752,924 B | 2.42x |
| 300 small/medium CJS modules | 274,316 B | 179,915 B | 1.52x |
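As a quick check, the reduction factors can be recomputed from the byte counts reported above. The helper names are illustrative only.

```python
def reduction(baseline_bytes: int, new_bytes: int) -> float:
    """Size ratio, e.g. 2.0 means the new cache is half the size."""
    return baseline_bytes / new_bytes


def percent_smaller(baseline_bytes: int, new_bytes: int) -> float:
    """The same comparison expressed as a percentage saved on disk."""
    return 100.0 * (1 - new_bytes / baseline_bytes)


# Byte counts copied from the on-disk size table in this comment.
scenarios = [
    ("big-file", 1_818_380, 752_924),
    ("many-modules", 274_316, 179_915),
]

for name, baseline, new in scenarios:
    print(f"{name}: {reduction(baseline, new):.2f}x, "
          f"{percent_smaller(baseline, new):.1f}% smaller")
```

Both ratios agree with the table, and neither reaches the upper end of the "~2-4x" range quoted in the description.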

Warm-start, self-controlled (same binary, cache on vs. off, interleaved 20 runs, trimmed means — avoids binary-layout skew between builds):

| Scenario | Cache benefit (baseline) | Cache benefit (this PR) |
| --- | --- | --- |
| big-file | +26.4 ms | +25.0 ms |
| many-modules | +1.7 ms | +1.5 ms |
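For readers unfamiliar with the method, here is a minimal sketch of an interleaved, trimmed-mean comparison. The trim fraction is an assumption (the comment does not state how much was trimmed), "benefit" is assumed to mean cache-off time minus cache-on time, and the `measure_*` callables are placeholders for whatever launches the binary and times it.

```python
from typing import Callable, List, Sequence, Tuple


def trimmed_mean(samples: Sequence[float], trim_fraction: float = 0.1) -> float:
    """Mean after dropping the lowest and highest trim_fraction of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def interleaved_runs(measure_off: Callable[[], float],
                     measure_on: Callable[[], float],
                     runs: int = 20) -> Tuple[List[float], List[float]]:
    """Alternate the two configurations so drift affects both equally."""
    off, on = [], []
    for _ in range(runs):
        off.append(measure_off())
        on.append(measure_on())
    return off, on


def cache_benefit_ms(off_ms: Sequence[float], on_ms: Sequence[float],
                     trim_fraction: float = 0.1) -> float:
    """Positive result: startup is faster with the cache enabled."""
    return trimmed_mean(off_ms, trim_fraction) - trimmed_mean(on_ms, trim_fraction)
```

Comparing cache on against cache off within the same binary, rather than across two builds, is what keeps binary-layout differences out of the result.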

Warm-start with a hot page cache is neutral within noise (~1 ms on the pathological single-10 MB-file case, which is dominated by one large zstd decompress; with cold page cache or slower storage the 2.4x smaller read wins). Cold-start adds one-time compression at persist (~35 ms for the 1.8 MB blob at level 1, proportionally less for typical files).

The second commit reuses zstd contexts (one ZSTD_DCtx on the handler, one ZSTD_CCtx across Persist()), which removed most of the per-file decompression overhead observed with one-shot contexts on the many-modules scenario.

Improve the compile cache by:

- Reading cache files with a single exactly-sized read using the file size from fstat instead of reading into an exponentially growing buffer, which previously cost O(log N) syscalls and allocations and about 2N bytes of copying per file.
- Compressing the cache content on disk with zstd at level 1, falling back to raw storage when the data is not compressible. This shrinks cache directories by about 2-4x. The magic number is bumped so that files in the old format are discarded as cache misses and then overwritten in place.
- Handing the cache to V8 through a non-owning CachedData wrapper instead of copying the whole buffer on every cache hit.

Corrupted cache files keep degrading to silent cache misses and are regenerated, now covered by a regression test.

Co-authored-by: Grok <grok@x.ai>
Signed-off-by: Yagiz Nizipli <yagiz@nizipli.com>

Creating and freeing a zstd context for every cache file costs more than the (de)compression itself for small caches. Lazily create one decompression context on the handler and reuse it across reads, and share one compression context across all entries in Persist().

Co-authored-by: Grok <grok@x.ai>
Signed-off-by: Yagiz Nizipli <yagiz@nizipli.com>

… on nodejs#63861)

- Add src/compile_cache_zstd.dict (48 KiB trained on 5.7k objective V8 code cache samples harvested via vm.Script from test/parallel + fixtures + benchmarks; the 300 small benchmark measurement set and the big TS fixture raw are held completely out of training).
- Generate and include src/compile_cache_zstd_dict.h (the embeddable form).
- Add tools/generate-compile-cache-dict.js (run after updating the .dict).
- Wire always-on use of the prepared CDict/DDict in Persist() (pick best of plain vs dict-assisted per entry) and ReadCacheFile() (decompress_usingDDict).
- Reuses the ctxs and "only if smaller than raw" policy from Yagiz's PR.
- Yes, the dictionary is joined/embedded in the node binary (tiny, must be available with no extra FS for portable/early/restricted cache use).
- Measurements on the benchmark scenarios (big TS fixture + 300 held-out small/medium objective samples) with the representative dict:
  - small/medium: raw -> plain-zstd-l1 2.13x -> +dict 3.06x (1.44x further win over the zstd already in nodejs#63861).
  - big: ~2.69x plain (dict neutral/slightly worse, still >> raw; we take the min so big stays optimal).
- See the investigation notes in the branch for full details and reproduction.

Co-authored-by: Grok (investigation + prototype)

PR on top of nodejs#63861

joyeecheung · 2026-06-12T00:46:27Z

It seems the performance gains are within noise, and it's mostly only the compression that changes the size? Can you split the changes into different PRs and measure them individually?

- I am not sure the read changes actually give any wins for small files. In the happy path where the cache is small, it's better to just read once and resize rather than stat and read (two file system calls instead of one). There is also a TOCTOU risk in doing the stat, and fstat is not reliable across platforms, so the loop condition should not be gated on the fstat result: we must always read until EOF is reached in case the size is not accurate.
- It doesn't appear that compression alone does much for performance, or it might actually hurt and just be compensated by the other changes. In that case it's better to make it configurable and let users choose.
- Avoiding the copy would be better, and there are precedents in the builtin caches where it actually helped. I suspect this is the only change that actually helps performance while the other two may not. Hence it's better to split and measure individually.
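The "read until EOF" point can be sketched as follows: the fstat size is used only as a hint for the first read, and the loop ends only when a read returns no data. This is an illustration, not the code in the PR; `read_cache_file` and `chunk_size` are hypothetical names.

```python
import os


def read_cache_file(path: str, chunk_size: int = 64 * 1024) -> bytes:
    """Read a whole file, treating the fstat size as a hint, not a limit."""
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)  # O_BINARY only on Windows
    fd = os.open(path, flags)
    try:
        hint = os.fstat(fd).st_size
        # Ask for the hinted size first so the common case needs one data read.
        want = hint + 1 if hint > 0 else chunk_size
        chunks = []
        while True:
            chunk = os.read(fd, want)
            if not chunk:  # an empty read is the only EOF signal trusted here
                break
            chunks.append(chunk)
            want = chunk_size
        return b"".join(chunks)
    finally:
        os.close(fd)
```

If the file grows or shrinks between the fstat and the read, or the reported size is wrong, the result is still whatever the file contained up to EOF. The cost is one extra read call to confirm EOF.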

Builds on the zstd compression in nodejs#63861 by embedding a small zstd dictionary trained on a diverse corpus of real modules, so each small/medium compile-cache entry compresses better. Per entry we keep the smaller of the plain and dictionary-assisted frame, so the dictionary only ever helps.

- Add src/compile_cache_zstd.dict (16 KiB). It is trained on V8 code caches harvested (via vm.compileFunction, the same shape the CJS loader produces) from a diverse corpus: bundled npm packages, lib/, tools/ and a few deps.
- Add tools/generate_compile_cache_dict.py and a node.gyp action that generates compile_cache_zstd_dict.h into SHARED_INTERMEDIATE_DIR at build time; no generated header is checked in. libnode include_dirs updated to pick it up.
- Prepare the CDict/DDict once per process (shared across all handlers and Workers, matching the lazy-context approach from nodejs#63861) and use them in Persist() and ReadCacheFile(). Persist() compresses the plain and dict frames into separate buffers and selects the smaller, so the written bytes and recorded size always agree. The dictionary is only tried for entries up to 256 KiB; larger blobs never benefit, so the second compression is skipped to avoid wasted work. Falls back to plain zstd if dictionary preparation fails.
- The dictionary is embedded in the binary because the compile cache must be usable early, portably, and without extra filesystem state.
- No on-disk format change: dict-assisted frames carry the dictID, plain frames carry none, and a single DDict decompresses both.
- Size, measured on data held out from training (per-entry min policy): diverse modules go from ~1.87x (plain zstd) to ~2.44x with the dictionary (~24% smaller on disk); on test/parallel, which is not in the training corpus at all, ~1.74x -> ~2.22x (~22% smaller). A real end-to-end run (npm --version, ~70 modules) is ~15% smaller. Read time is unchanged and the extra write-time work is negligible.
- Add a multi-module write/read roundtrip test and a startup benchmark (standard createBenchmark harness) plus the many-modules fixture list.
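The per-entry "keep the smaller frame" policy can be sketched with Python's `zlib`, whose preset dictionaries (`zdict`) are a rough analogue of zstd dictionaries. Everything here is a stand-in: the toy `SHARED_DICT` is not the trained dictionary, and only the 256 KiB cutoff is taken from the commit message.

```python
import zlib
from typing import Tuple

# Toy dictionary of strings likely to recur in module code (illustrative only).
SHARED_DICT = (b"require module exports __filename __dirname "
               b"primordials internal/errors internal/util")
DICT_MAX_INPUT = 256 * 1024  # cutoff quoted in the commit message


def compress_plain(data: bytes) -> bytes:
    return zlib.compress(data, 1)


def compress_with_dict(data: bytes) -> bytes:
    compressor = zlib.compressobj(level=1, zdict=SHARED_DICT)
    return compressor.compress(data) + compressor.flush()


def pick_smallest(data: bytes) -> Tuple[str, bytes]:
    """Compress both ways into separate buffers and keep the smaller one."""
    kind, best = "plain", compress_plain(data)
    if len(data) <= DICT_MAX_INPUT:  # skip the second pass for large blobs
        candidate = compress_with_dict(data)
        if len(candidate) < len(best):
            kind, best = "dict", candidate
    return kind, best


def decompress_any(frame: bytes) -> bytes:
    """One decompressor setup handles both plain and dictionary frames."""
    decoder = zlib.decompressobj(zdict=SHARED_DICT)
    return decoder.decompress(frame) + decoder.flush()
```

As with the dictID in zstd frames, a zlib stream records in its header whether a preset dictionary is needed, so the reader does not need an extra flag to tell the two kinds apart.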

addaleax · 2026-06-12T08:39:46Z

If we do include dictionary files in our source tree, I'd say we should either include instructions for how to (at least approximately) re-create this file, or even generate it at build time entirely. Running `strings src/compile_cache_zstd.dict` shows a surprising amount of Node.js-core specificity, which seems odd given that the commit that adds it claims it was trained on public npm packages.

(strings src/compile_cache_zstd.dict output in the fold)

wUUu2 _HOFJW [Z)S nding options Paz types PaN process versions openssl internal/process/pre_execution PdR` prepareMainThreadExecution Pc6Wl markBootstrapComplete internal/options "<"; __createBinding __setModuleDefault PaZY4J birname note signatures ObjectSetPrototypeOf PromisePrototypeThen PromiseWithResolvers RegExpPrototypeExec RegExpPrototypeSymbolReplace StringPrototypeSplit StringPrototypeToLowerCase _createBlob _createBlobFromFilePath getDataObject kMaxLength TextDecoder markTransferMode Pbvyo isAnyArrayBuffer PcBS isArrayBufferView PaZ require PaN" util InvalidArgumentError ConnectTimeoutError SessionCache maybeNormalizeConnectError PromiseWithResolvers nonOpStart Pf& writableStreamMarkFirstWriteRequestInFlight createPromiseCallback Pb:6k nonOpWrite $Pg> writableStreamDefaultWriterEnsureReadyPromiseRejected writableStreamDefaultControllerClose DOMException Pc^v extractHighWaterMark extractSizeAlgorithm Pb~: kEmptyObject wgIgAygCACIAIAQgAWtqIQUgASAAa0ECaiEGAkADQCABLQAAIABButUAai0AAEcNAyAAQQJGDQEgAEEBaiEAIAQgAUEBaiIBRw0ACyADIAU2AgAMwwILIAMoAgQhACADQgA3AwAgAyAAIAZBAWoiARAuIgBFBEBB4wEhAgyqAgsgA0H1ATYCHCADIAE2AhQgAyAANgIMQQAhAgzCAgtB9AEhAiABIARGDcECIAMoAgAiACAEIAFraiEFIAEgAGtBAWohBgJAA0AgAS0AACAAQbjVAGotAABHDQIgAEEBRg0BIABBAWohACAEIAFBAWoiAUcNAAsgAyAFNgIADMICCyADQYEEOwEoIAMoAgQhACADQgA3AwAgAyAAIAZBAWoiARAuIgANAwwCCyADQQA2AgALQQAhAiADQQA2AhwgAyABNgIUIANB5R82AhAgA0EINgIMDL8CC0HVASECDKUCCyADQfMBNgIcIAMgATYCFCADIAA2AgxBACECDL0CC0EAIQACQCADKAI4IgJFDQAgAigCQ memoMethod perf calculatedSizcludes kTypes uvErrmapGet ArrayPrototypeIndexOf classRegExp MathMax ArrayIsArray isWindows ObjectIsExtensible isPermissionModelError getExpectedArgumentLength ArrayPrototypeJoin kIsNodeError get source get ports CloseEvent get listening header.js.map @[@b node:path ./large-numbers.js "<"; escapeHTML PcB_C+ convertChangesToXML #noProxy #ProxyAgent #getProxy #timeoutConnection Pc~} #drainPendingRequests connect Pbv2&_ addRequest createSocket .get .get getMilestoneTimestamp 
getTimeOriginTimestamp PbjK internalBinding performance constants ERR_INVALID_ARG_TYPE ERR_INVALID_ARG_VALUE Pb:e ERR_OUT_OF_RANGE isErrorStackTraceLimitWritable Buffer Pa& inspect validateBoolean validateFunction validateNumber validateString validateOneOf PbbH validateObject validateInteger isReadableStream isWritableStream isNodeStream utilColors PbJ% lazyUtilColors PbNn6 __importStar DQCABLQAAQSBHDUggBCABQQFqIgFHDQALQSUhAwxpC0ElIQMMaAsgAi0ALUEBcQRAQcMBIQMMTwsgAigCBCEAQQAhAyACQQA2AgQgAiAAIAEQKSIABEAgAkEmNgIcIAIgADYCDCACIAFBAWo2AhQMaAsgAUEBaiEBDFwLIAFBAWohASACLwEwIgBBgAFxBEBBACEAAkAgAigCOCIDRQ0AIAMoAlQiA0UNACACIAMRAAAhAAsgAEUNBiAAQRVHDR8gAkEFNgIcIAIgATYCFCACQfkXNgIQIAJBFTYCDEEAIQMMZwsCQCAAQaAEcUGgBEcNACACLQAtQQJxDQBBACEDIAJBADYCHCACIAE2AhQgAkGWEzYCECACQQQ2AgwMZwsgAgJ/IAIvATBBFHFBFEYEQEEBIAItAChBAUYNARogAi8BMkHlAEYMAQsgAi0AKUEFRgs6AC5BACEAAkAgAigCOCIDRQ0AIAMoAiQiA0UNACACIAMRAAAhAAsCQAJAAkACQAJAIAAOFgIBAAQEBAQEBAQE get ok get statusText get headers get body get bodyUsed clone parsePullArgs PcNs validateBackpressure primordials PcJ= internal/encoding TextEncoder internal/errors Pav codes internal/util internal/util/types internal/validators onComplete E`w PaN onError node:assert ../core/symbols ../core/errors ../core/util Pab beep ArrayFrom ArrayPrototypeFilter ArrayPrototypeIncludes ArrayPrototypeMap ArrayPrototypePush PcB,t ArrayPrototypePushApply ArrayPrototypeSlice ObjectDefineProperty Pb>c ObjectKeys PdRc ObjectPrototypeHasOwnProperty Pbf) ReflectGet SafeMap SafeSet StringPrototypeSlice Error PbFOT _flushFlag __esModule .desc.get stop destroyer primordials internal/errors Pav codes internal/streams/utils __esModule fs/promises PaF-M path PaZ require Pa"/ module __filename __dirname

addaleax · 2026-06-13

@anonrig I'm going to mark this as unresolved again, since there was no response here as far as I can tell.

anonrig · 2026-06-13

Sorry, I didn't see this until now, and I didn't press the "resolve" button at all. I will address your concerns.

nodejs-github-bot added the c++, lib / src, and needs-ci labels Jun 11, 2026

anonrig requested review from addaleax, jasnell and joyeecheung June 11, 2026 22:14

anonrig force-pushed the compile-cache-perf branch from 816f4ef to 0cf8818 June 11, 2026 22:38

anonrig force-pushed the compile-cache-perf branch from 0cf8818 to f427fb3 June 11, 2026 22:40

lemire approved these changes Jun 12, 2026

lemire mentioned this pull request Jun 12, 2026

src: embed zstd dictionary for further compile cache size wins anonrig/node#16

Merged

addaleax reviewed Jun 12, 2026

anonrig wants to merge 3 commits into nodejs:main from anonrig:compile-cache-perf
