test(bench): add encode_loop_z000033 example for clean encoder profiles #297
polaz wants to merge 4 commits into
Mirrors `decode_loop_z000033` for the encode side: reads a raw corpus and loops `compress_to_vec` at a given level N times, with no Criterion harness and no FFI. The `compare_ffi` compress benchmark runs the donor in the same process, so its flamegraph mixes `ZSTD_*` donor symbols with ours; this binary isolates the pure-Rust encoder hot path for `perf record`. A `black_box` + length-sum sink keeps the compress call from being eliminated as dead code.
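A minimal sketch of the loop shape described above. It assumes a hypothetical `hypothetical_compress` stand-in (a toy run-length encoder) in place of the real structured-zstd call, whose signature is not shown in this PR. `std::hint::black_box` is a best-effort optimizer hint: wrapping the input discourages hoisting or constant-folding the call, and returning the length sum through `black_box` makes every iteration's result observable.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for the real compress call (NOT the
/// structured-zstd API): a toy run-length encoder that emits
/// (run_length, byte) pairs, with runs capped at 255.
fn hypothetical_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Compresses `input` `iters` times and returns the summed output length.
/// The sum depends on every iteration, so no call's result is unused.
fn encode_loop(input: &[u8], iters: usize) -> usize {
    let mut total = 0usize;
    for _ in 0..iters {
        let out = hypothetical_compress(black_box(input));
        total += out.len();
    }
    black_box(total)
}
```

Calling `encode_loop(&corpus, iters)` from `main` and printing the total gives `perf record` a hot loop dominated by the compress call.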
Pull request overview
Adds a new encode_loop_z000033 example binary intended for collecting clean perf/cargo flamegraph profiles of the pure-Rust encoder hot path without Criterion overhead or in-process C donor symbols.
Changes:
- Add `zstd/examples/encode_loop_z000033.rs` example that reads a corpus (or generates a deterministic synthetic fallback) and repeatedly compresses it at a chosen level/iteration count.
- Add a `black_box`/length-sum sink to prevent dead-code elimination of the compress work.
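One way to implement the corpus loading with a deterministic synthetic fallback, sketched under assumptions: the fallback triggers when no path is given or the file cannot be read, and the generator (a 64-bit LCG with a fixed seed, mapped onto a small alphabet) is a hypothetical choice, not necessarily what the example itself uses.

```rust
use std::fs;

/// Reads the corpus at `path`; falls back to `fallback_len` bytes of
/// deterministic synthetic data when no path is given or the read fails.
fn load_corpus(path: Option<&str>, fallback_len: usize) -> Vec<u8> {
    if let Some(p) = path {
        match fs::read(p) {
            Ok(bytes) => return bytes,
            Err(e) => eprintln!("cannot read {p} ({e}); using synthetic corpus"),
        }
    }
    synthetic_corpus(fallback_len)
}

/// Hypothetical generator: a 64-bit LCG (Knuth's MMIX constants) with a
/// fixed seed, so the output is identical on every run.
fn synthetic_corpus(len: usize) -> Vec<u8> {
    const ALPHABET: &[u8] = b"abcdefgh \n";
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    (0..len)
        .map(|_| {
            state = state
                .wrapping_mul(6_364_136_223_846_793_005)
                .wrapping_add(1_442_695_040_888_963_407);
            // Use the high bits: an LCG's low bits have short periods.
            ALPHABET[(state >> 33) as usize % ALPHABET.len()]
        })
        .collect()
}
```

A fixed seed keeps the synthetic input byte-identical across runs, so flamegraphs taken on different encoder revisions profile the same data.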
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…er copy
The input is a contiguous `&[u8]`; `compress_to_vec` takes `impl Read` and re-buffers it via `read_to_end` into a fresh `Vec` every iteration, adding per-iteration input allocation plus a memmove that pollutes the encoder flamegraph. `compress_slice_to_vec` consumes the slice directly, keeping the profile focused on the encode hot path.
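The difference between the two entry points can be sketched with hypothetical stand-ins (`compress_from_reader`, `compress_from_slice`, and a placeholder `placeholder_encode`; none of these are the crate's API). The reader path drains its source into a freshly allocated `Vec` before encoding, which mirrors the per-iteration allocation and copy this commit removes; the slice path hands the caller's bytes straight to the encoder.

```rust
use std::io::{self, Read};

/// Hypothetical reader-based entry point, mirroring what the commit
/// describes for `compress_to_vec`: drain the source into a new Vec first.
fn compress_from_reader<R: Read>(mut source: R) -> io::Result<usize> {
    let mut buffered = Vec::new(); // fresh allocation on every call
    source.read_to_end(&mut buffered)?; // full copy of the input
    Ok(placeholder_encode(&buffered))
}

/// Hypothetical slice-based entry point: the encoder reads the caller's
/// bytes directly, with no intermediate buffer.
fn compress_from_slice(source: &[u8]) -> usize {
    placeholder_encode(source)
}

/// Placeholder for the encoder itself (counts byte-value changes) so both
/// entry points do identical work on identical bytes.
fn placeholder_encode(input: &[u8]) -> usize {
    input.windows(2).filter(|w| w[0] != w[1]).count()
}
```

Since `&[u8]` implements `Read`, both paths accept the same corpus; only the reader path pays for the extra buffer.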
Allocate the output `Vec` once and `clear()`-reuse it every iteration, so steady-state iterations do no output-buffer allocation. This keeps the flamegraph on the encoder hot path instead of per-iteration `Vec` growth and first-touch page faults. Drive `FrameCompressor` directly over the contiguous slice. Sync the module doc to the actual API used.
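A sketch of the output-buffer reuse pattern, using a hypothetical `encode_into` that appends to a caller-owned `Vec<u8>` (a stand-in, not `FrameCompressor`'s actual interface). `Vec::clear` resets the length but keeps the capacity, so once the buffer is large enough no iteration reallocates; the loop also counts how often the buffer's address changed, as a cheap check for reallocation.

```rust
use std::hint::black_box;

/// Hypothetical encoder that appends its output to a caller-owned Vec.
/// The placeholder "compression" keeps every other byte.
fn encode_into(input: &[u8], out: &mut Vec<u8>) {
    out.extend(input.iter().step_by(2));
}

/// Runs `iters` iterations with a single output Vec allocated up front.
/// Returns the summed output length and the number of iterations that
/// found the buffer at a new address (i.e. it was reallocated).
fn encode_loop_reusing(input: &[u8], iters: usize) -> (usize, usize) {
    let mut out: Vec<u8> = Vec::with_capacity(input.len());
    let mut last_ptr = out.as_ptr();
    let (mut total, mut moves) = (0usize, 0usize);
    for _ in 0..iters {
        out.clear(); // length -> 0, capacity unchanged
        encode_into(black_box(input), &mut out);
        if out.as_ptr() != last_ptr {
            moves += 1;
            last_ptr = out.as_ptr();
        }
        total += out.len();
    }
    (black_box(total), moves)
}
```

Pre-sizing to `input.len()` is an assumption that suits this placeholder (its output never exceeds its input); a real encoder whose output can exceed the input would just reallocate during the first iteration and then stay put.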
//! Build: cargo build --profile flamegraph -p structured-zstd \
//!   --example encode_loop_z000033 --features dict_builder
//! Run: cargo flamegraph --example encode_loop_z000033 --features dict_builder \
//!   --profile flamegraph -- <level> <iters> <corpus_path>
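A minimal sketch of parsing the `-- <level> <iters> <corpus_path>` arguments from the doc comment. The types are assumptions: `level` is parsed as `i32` because zstd also defines negative (fast) levels, and `corpus_path` is treated as optional so a synthetic fallback can be used; the real example may handle both differently.

```rust
/// Positional arguments after `--`: `<level> <iters> <corpus_path>`.
#[derive(Debug, PartialEq)]
struct LoopArgs {
    level: i32,                  // assumption: i32 so negative levels parse
    iters: usize,
    corpus_path: Option<String>, // assumption: optional, enabling a fallback
}

/// Parses the arguments with the program name already stripped.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<LoopArgs, String> {
    let mut it = args.into_iter();
    let level = it
        .next()
        .ok_or("missing <level>")?
        .parse::<i32>()
        .map_err(|e| format!("bad <level>: {e}"))?;
    let iters = it
        .next()
        .ok_or("missing <iters>")?
        .parse::<usize>()
        .map_err(|e| format!("bad <iters>: {e}"))?;
    Ok(LoopArgs { level, iters, corpus_path: it.next() })
}
```

In `main`, `parse_args(std::env::args().skip(1))` skips the program name before parsing.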
Summary
Adds `encode_loop_z000033` example, the encode-side analog of `decode_loop_z000033`. Reads a raw corpus and loops `compress_to_vec` at a given level N times, with no Criterion and no FFI.
Why
The `compare_ffi` compress benchmark runs the donor (`ZSTD_compress`) in the same process, so `cargo flamegraph` on it mixes donor `ZSTD_DUBT_findBestMatch`/`ZSTD_recordFingerprint` symbols with ours, which makes it unusable for isolating the pure-Rust encoder hot path. This binary produces clean encoder `perf record` samples. A `black_box` + len-sum sink defeats dead-code elimination of the compress call.
Usage
Used to map the Fast L-1 encode hot path (start_matching 25%, sequence-emit 30%, HUF encode4x 8%) for #111 Phase 7 work.
Related to #111.