src: embed zstd dictionary for further compile cache size wins by lemire · Pull Request #16 · anonrig/node

lemire · 2026-06-12T01:37:18Z

Experimental, for discussion

Gist: When you have lots of small files, a pregenerated dictionary is a compression win, at the cost of a few KB of extra payload.

Summary

This builds on the zstd compression added in nodejs#63861 by embedding a small (16
KiB) zstd dictionary trained on a diverse corpus of real modules. For each
compile-cache entry we compress with and without the dictionary and keep
whichever frame is smaller, so the dictionary only ever helps.

The dictionary mainly benefits the common "many small/medium modules" case,
where individual code caches are too short for plain zstd to find much
redundancy on its own. Large single blobs are left untouched (plain zstd
already wins there, and the dictionary path is skipped for them entirely).
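
The per-entry policy described above can be sketched as follows. This is a minimal illustration, not the PR's C++ code: it uses Python's zlib with a preset dictionary (`zdict`) as a stand-in for zstd, and the names `choose_frame`, `compress_with_dict`, and `DICT_THRESHOLD` are hypothetical. Only the 256 KiB gate and the "keep the smaller frame" rule come from the description.

```python
import zlib

# Assumption: mirrors the 256 KiB gate from the PR text; the name is hypothetical.
DICT_THRESHOLD = 256 * 1024


def compress_plain(data: bytes) -> bytes:
    """Compress without a dictionary (zlib level 1 stands in for zstd level 1)."""
    return zlib.compress(data, 1)


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Compress with a preset dictionary (stand-in for a zstd CDict)."""
    comp = zlib.compressobj(level=1, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def choose_frame(data: bytes, dictionary: bytes) -> bytes:
    """Return the smaller of the plain and dictionary-assisted frames."""
    plain = compress_plain(data)
    if len(data) > DICT_THRESHOLD:
        # Large blobs skip the second compression entirely.
        return plain
    with_dict = compress_with_dict(data, dictionary)
    return with_dict if len(with_dict) < len(plain) else plain


def decompress(frame: bytes, dictionary: bytes) -> bytes:
    """One decompressor configuration handles both kinds of frame."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(frame) + decomp.flush()
```

Because the plain frame is always a candidate, the selected frame can never be larger than what the no-dictionary baseline would have written, which is the sense in which the dictionary "only ever helps".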

Size benefits

The compile cache is already compressed by nodejs#63861; the question this PR answers
is how much more the dictionary saves on top of plain zstd. All
numbers below use the shipped policy (per-entry min(plain, dict), level 1) and
compare against the no-dictionary baseline. The ratios are raw V8 code cache size
divided by on-disk size, so a higher ratio means a smaller cache on disk.

| corpus (held out from training) | plain zstd (nodejs#63861) | + dictionary | extra saving |
|---|---|---|---|
| diverse modules (npm/lib/deps, 226 unseen files) | 1.87× | 2.44× | −24% |
| `test/parallel` (4,119 files, not in training) | 1.74× | 2.22× | −22% |
| `npm --version` end-to-end (~70 real modules) | 138 KB | 117 KB | −15% |

Reading this:

- The first two rows are measured offline on code caches the dictionary was
  never trained on (a held-out file split, and an entire corpus,
  test/parallel, that is absent from training), so they reflect generalization, not
  memorization.
- The third row is a real end-to-end run through the actual module loader and
  persist path: the on-disk cache for npm's module graph shrinks from 138 KB to
  117 KB.
- Put plainly: nodejs/node#63861 already made the cache ~1.7–1.9× smaller than raw; this
  dictionary takes it to ~2.2–2.4×, recovering roughly another 15–24% of the
  on-disk footprint, concentrated in the small/medium modules that dominate real
  workloads.
- Large single blobs (e.g. the typescript.js fixture, > 256 KiB) stay on the
  plain path and are byte-for-byte unchanged.
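
As a quick check on how the "extra saving" column relates to the two ratio columns, here is a small arithmetic sketch. The helper names are hypothetical, and the inputs are the rounded figures from the table, so the results can differ slightly from the published percentages, which were presumably computed from unrounded sizes.

```python
def extra_saving_from_ratios(plain_ratio: float, dict_ratio: float) -> float:
    """Fractional on-disk reduction when the compression ratio improves.

    On-disk size is raw_size / ratio, so the raw size cancels out:
    1 - (raw / dict_ratio) / (raw / plain_ratio) = 1 - plain_ratio / dict_ratio.
    """
    return 1.0 - plain_ratio / dict_ratio


def extra_saving_from_sizes(plain_size: float, dict_size: float) -> float:
    """Fractional on-disk reduction given two absolute sizes."""
    return 1.0 - dict_size / plain_size


rows = [
    ("diverse modules", extra_saving_from_ratios(1.87, 2.44)),
    ("test/parallel", extra_saving_from_ratios(1.74, 2.22)),
    ("npm --version", extra_saving_from_sizes(138, 117)),
]
for label, saving in rows:
    print(f"{label}: {saving:.1%}")
```

With the rounded inputs this gives roughly 23.4%, 21.6%, and 15.2%, in line with the table's −24%, −22%, and −15%.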

The cost is +16 KiB in the node binary (the embedded dictionary). A 32 KiB
dictionary was measured to add only ~1 percentage point and 48 KiB nothing
beyond that, so 16 KiB is the size/benefit knee.

Timing (does the dictionary make things slower?)

A/B against the no-dictionary baseline (this commit's parent), same tree, only
compile_cache.cc/.h differ. Trimmed median wall time per process (Apple
Silicon, both binaries measured back-to-back). The nocache row uses no compile
cache at all, so its delta is the run-to-run noise floor — read the other rows
against it.

Big single blob — typescript.js (~1.8 MB cache, 1 entry; above the 256 KiB
threshold, so the dictionary is skipped on write):

| regime | baseline | +dict | Δ |
|---|---|---|---|
| nocache (noise) | 86.2 ms | 84.4 ms | −1.8 ms |
| cold (write/persist) | 104.4 ms | 101.9 ms | equal (gated) |
| warm (read/decomp) | 53.6 ms | 53.9 ms | +0.3 ms (≈ noise) |
| cache size | 752,364 B | 751,529 B | equal |

Many small modules (120 entries; all below the threshold, dictionary applied):

| regime | baseline | +dict | Δ |
|---|---|---|---|
| nocache (noise) | 31.8 ms | 32.5 ms | +0.7 ms |
| cold (write/persist) | 53.2 ms | 53.9 ms | +0.7 ms (≈ noise) |
| warm (read/decomp) | 30.6 ms | 30.5 ms | −0.1 ms |
| cache size | 116,852 B | 86,633 B | −25.9% |

Takeaways: the read path, which is paid on every warm-cache startup, is within
noise; ZSTD_decompress_usingDDict is not measurably slower than plain decompression,
and the one-time per-process DDict digest of a 16 KiB dictionary is
negligible. Write overhead (only at persist, on shutdown) is sub-millisecond for
many modules and zero for the big blob (the size gate skips it). On-disk size
never regresses.

Why embed the dictionary

The compile cache must be usable early during startup, portably, and without
relying on any additional filesystem state, so the dictionary is compiled into
the binary rather than loaded at runtime. Only the small binary .dict is
checked in; the C array is generated at build time.
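
To illustrate the "generate the C array at build time" step, here is a minimal sketch of what such a generator could look like. It is not the PR's tools/generate_compile_cache_dict.py, whose contents are not shown here; the function name, the symbol name `kCompileCacheZstdDict`, and the header layout are all assumptions.

```python
def render_c_header(data: bytes, symbol: str = "kCompileCacheZstdDict") -> str:
    """Render raw dictionary bytes as a C/C++ header defining a byte array.

    The symbol name and layout are hypothetical, chosen for illustration.
    """
    if not data:
        raise ValueError("dictionary is empty")
    lines = []
    # Emit 12 bytes per row to keep the generated file readable.
    for offset in range(0, len(data), 12):
        chunk = data[offset:offset + 12]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        "// Generated file; do not edit.\n"
        "#include <cstddef>\n\n"
        f"static const unsigned char {symbol}[] = {{\n{body}\n}};\n"
        f"static const size_t {symbol}Size = {len(data)};\n"
    )


if __name__ == "__main__":
    # Sample input: the four magic bytes that start a zstd dictionary.
    print(render_c_header(b"\x37\xa4\x30\xec"))
```

Generating the header from the checked-in .dict keeps the repository free of a large generated source file while still letting the binary carry the dictionary with no runtime file access.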

How the dictionary is trained (reproducible)

The dictionary is trained on V8 code caches harvested via vm.compileFunction
(the same shape the CJS loader produces) from a diverse in-tree corpus: bundled
npm packages (deps/npm/node_modules), lib/, tools/, and a few deps/
libraries — ~1,200 modules. Those samples are fed to zstd --train --maxdict=16384. The measurement corpora above are disjoint from this training
set. The harvest+train script can be committed under tools/ so the .dict is
regenerable from the tree rather than an opaque drop-in.
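
The training step could be wrapped along these lines. This is a hedged sketch rather than the PR's script: the helper names and directory layout are hypothetical, and it assumes the `zstd` command-line tool is on `PATH`. Harvesting the code caches themselves (via vm.compileFunction) happens in Node and is not shown.

```python
import shutil
import subprocess
from pathlib import Path


def build_train_command(samples: list[str], output: str,
                        max_dict_bytes: int = 16384) -> list[str]:
    """Build the `zstd --train` argv for a list of sample files."""
    if not samples:
        raise ValueError("no training samples given")
    return ["zstd", "--train", *samples,
            f"--maxdict={max_dict_bytes}", "-o", output]


def train_from_dir(sample_dir: str, output: str) -> None:
    """Train a dictionary from every file in sample_dir (hypothetical layout)."""
    if shutil.which("zstd") is None:
        raise RuntimeError("zstd CLI not found on PATH")
    samples = sorted(str(p) for p in Path(sample_dir).iterdir() if p.is_file())
    subprocess.run(build_train_command(samples, output), check=True)
```

Sorting the sample list keeps the invocation stable between runs, which helps when the goal is a dictionary that can be regenerated from the tree.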

On-disk compatibility

No format change. The dictionary is a trained zstd dictionary, so dict-assisted
frames carry its dictID and plain frames carry none; the reader decompresses
both correctly with a single DDict. Compile-cache directories are already
keyed by Node version, arch, and a cache-data version tag, so a future change to
the embedded dictionary is naturally isolated to a fresh cache directory.
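
To make the dictID point concrete, the sketch below reads the Dictionary_ID field from a zstd frame header, following the frame layout in the Zstandard format specification (RFC 8878). The function name is hypothetical, it does not handle skippable frames, and it returns 0 when the frame carries no dictionary ID.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian
DICT_ID_FIELD_SIZES = (0, 1, 2, 4)  # indexed by the 2-bit Dictionary_ID_flag


def frame_dict_id(frame: bytes) -> int:
    """Return the dictionary ID recorded in a zstd frame header, or 0 if none."""
    if len(frame) < 5 or frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = frame[4]
    single_segment = (descriptor >> 5) & 1
    did_size = DICT_ID_FIELD_SIZES[descriptor & 0b11]
    # The Window_Descriptor byte is present only when Single_Segment_flag is 0.
    start = 5 + (0 if single_segment else 1)
    field = frame[start:start + did_size]
    if len(field) != did_size:
        raise ValueError("truncated frame header")
    return int.from_bytes(field, "little")
```

A reader can therefore tell the two kinds of entry apart from the header alone, without any extra flag in the cache file format.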

Builds on the zstd compression in nodejs#63861 by embedding a small zstd dictionary trained on a diverse corpus of real modules, so each small/medium compile-cache entry compresses better. Per entry we keep the smaller of the plain and dictionary-assisted frame, so the dictionary only ever helps.

- Add src/compile_cache_zstd.dict (16 KiB). It is trained on V8 code caches harvested (via vm.compileFunction, the same shape the CJS loader produces) from a diverse corpus: bundled npm packages, lib/, tools/ and a few deps.
- Add tools/generate_compile_cache_dict.py and a node.gyp action that generates compile_cache_zstd_dict.h into SHARED_INTERMEDIATE_DIR at build time; no generated header is checked in. libnode include_dirs updated to pick it up.
- Prepare the CDict/DDict once per process (shared across all handlers and Workers, matching the lazy-context approach from nodejs#63861) and use them in Persist() and ReadCacheFile(). Persist() compresses the plain and dict frames into separate buffers and selects the smaller, so the written bytes and recorded size always agree. The dictionary is only tried for entries up to 256 KiB; larger blobs never benefit, so the second compression is skipped to avoid wasted work. Falls back to plain zstd if dictionary preparation fails.
- The dictionary is embedded in the binary because the compile cache must be usable early, portably, and without extra filesystem state.
- No on-disk format change: dict-assisted frames carry the dictID, plain frames carry none, and a single DDict decompresses both.
- Size, measured on data held out from training (per-entry min policy): diverse modules go from ~1.87x (plain zstd) to ~2.44x with the dictionary (~24% smaller on disk); on test/parallel, which is not in the training corpus at all, ~1.74x -> ~2.22x (~22% smaller). A real end-to-end run (npm --version, ~70 modules) is ~15% smaller. Read time is unchanged and the extra write-time work is negligible.
- Add a multi-module write/read roundtrip test and a startup benchmark (standard createBenchmark harness).

anonrig approved these changes Jun 12, 2026

anonrig merged commit a6273b1 into anonrig:compile-cache-perf Jun 12, 2026

anonrig merged 1 commit into anonrig:compile-cache-perf from lemire:compile-cache-zstd-dict-pr
