Skip to content

feat(merge): adversarial-review test coverage (F4/F5/F7) + F14 sub-region engine fix#6428

Open
g-talbot wants to merge 1 commit into
gtt/legacy-promotion-and-schema-evofrom
gtt/adversarial-review-test-coverage
Open

feat(merge): adversarial-review test coverage (F4/F5/F7) + F14 sub-region engine fix#6428
g-talbot wants to merge 1 commit into
gtt/legacy-promotion-and-schema-evofrom
gtt/adversarial-review-test-coverage

Conversation

@g-talbot
Copy link
Copy Markdown
Contributor

Stacked on #6423. Closes the remaining open code findings from the adversarial review (.planning/reviews/2026-05-13-adversarial-review-parquet-merger.md).

Summary

Three test additions plus one engine fix surfaced by the F4 tests:

  • F4 — MS-7 generalized to multi-input × sub-region.
  • F14 — sub-region path inverted col/sub-region loop nesting to keep page cache bounded.
  • F5 — prefix-aware proptest over the streaming engine.
  • F7 — production-shape integration test (5 inputs × ~4 RGs × 4 outputs × prefix_len=1).

F4: MS-7 generalized

  • test_ms7_per_input_bound_across_num_inputs sweeps num_inputs ∈ {1, 3, 8} × rows_per_input ∈ {3 000, 30 000} and asserts the per-input peak stays bounded.
  • test_ms7_per_input_bound_across_sub_regions_does_not_scale_with_rows runs the prefix_len=0 multi-output sub-region path at 3 000 vs 30 000 rows and asserts peak doesn't scale with input row count.

The sub-region test surfaced F14 — without the engine fix, the sub-region path's peak grew ~9× when rows grew 10×.

F14: invert col/sub-region loop nesting

Parquet streams emit pages in column-major order (all of col 0, then all of col 1, ...). The old sub-region-outer / col-inner ordering meant that while processing sub-region 0's col K, the stream emitted cols 0..K-1's remaining pages first to reach col K — those skipped pages got cached under their own col_idx for later sub-regions to consume. The cache scaled with input row count.

Fix: new process_split_region_col_outer function for the needs_split path. Cols iterate in the outer loop, sub-regions in the inner. Each parquet col chunk is fully consumed from the stream across all sub-regions before col K+1 starts. Single-region path stays on the existing process_region unchanged.

Mechanics: pre-determine writer assignments for the region's sub-regions (sub-regions can span multiple output writers; consecutive sub-regions on the same writer get coalesced into one combined Region so each writer holds one concurrent RG — RGs on the same writer are sequential per parquet's single-active-RG constraint).

F5: prefix-aware proptest

prop_merge_prefix_aligned_streaming sweeps (num_inputs ∈ 1..=3, per-input RG specs, num_outputs ∈ 1..=3) with prefix_len=1. Asserts on every generated case:

  • MC-1: rows preserved.
  • MC-3: sorted_series monotone within each output.
  • MS-3: MergeOutputFile.num_row_groups matches footer.
  • PA-1+PA-3: assert_unique_rg_prefix_keys passes.
  • CS-1: MergeOutputFile.output_rg_partition_prefix_len matches on-disk KV.

32 cases capped to keep runtime tight. Fixture honors the storekey property: different metric_names produce non-overlapping sorted_series byte ranges, same (metric, row_offset) across inputs gets the same sorted_series (the realistic "same series in multiple splits" case).

F7: production-shape integration test

test_f7_production_shape_multi_input_multi_rg_multi_output: 5 inputs × ~4 prefix-aligned RGs each × 4 outputs × prefix_len=1, with overlap (every metric appears in 2-3 of the 5 inputs). Asserts MC-1, MS-3, PA-1+PA-3, MS-5 (cross-output sorted_series monotonicity — a single metric CAN span outputs when its region overflows the per-output budget; the cross-output invariant is sorted_series monotonicity, not "each metric in one output"), and CS-1.

Test plan

  • cargo test -p quickwit-parquet-engine --lib — 498 unit tests pass (was 495 on feat(merge): legacy promotion path + body-col schema evolution #6423; added test_ms7_per_input_bound_across_num_inputs, test_ms7_per_input_bound_across_sub_regions_does_not_scale_with_rows, prop_merge_prefix_aligned_streaming, test_f5_single_input_two_metrics_minimal, test_f7_production_shape_multi_input_multi_rg_multi_output).
  • cargo clippy -p quickwit-parquet-engine --tests --all-features with -Dwarnings.
  • cargo fmt --all -- --check (nightly).
  • cargo doc --no-deps -p quickwit-parquet-engine warning-free.

Status of the adversarial review

After this PR, the remaining open work from the original review is just the spec docs (PA/MS/CS supplement to ADR-002). All code findings (F1, F2, F4, F6–F14) are closed across PRs #6424, #6425, #6426, #6423, and this one.

🤖 Generated with Claude Code

@g-talbot g-talbot requested a review from a team as a code owner May 13, 2026 18:00
@g-talbot g-talbot force-pushed the gtt/legacy-promotion-and-schema-evo branch from 2a3f09a to edd45a6 Compare May 13, 2026 18:11
Three test additions plus one engine fix surfaced by the F4 tests.

## F4: MS-7 generalized to multi-input × sub-region

The existing MS-7 test proved the per-input page-cache bound for one
input and one region. F4 extends the coverage:

- `test_ms7_per_input_bound_across_num_inputs` sweeps `num_inputs ∈ {1, 3, 8}` × `rows_per_input ∈
  {3 000, 30 000}` and asserts the per-input peak stays bounded. Cross-axis growth check: going
  from 1 input to 8 must not push the peak up.
- `test_ms7_per_input_bound_across_sub_regions_does_not_scale_with_rows` runs the prefix_len=0
  multi-output sub-region path at 3 000 vs 30 000 rows and asserts peak doesn't scale with input
  row count. **This test surfaced F14 (below) — without the engine fix, the sub-region path's peak
  grew ~9× when rows grew 10×.**

Tests serialize via `ms7_serial_lock` because
`PEAK_BODY_COL_PAGE_CACHE_LEN` is process-global; concurrent tests
would pollute each other's readings.

## F14: invert col/sub-region loop nesting in the split-region path

Parquet streams emit pages in column-major order (all of col 0,
then all of col 1, ...). The old sub-region-outer / col-inner
ordering meant that while processing sub-region 0's col K, the
stream emitted cols 0..K-1's remaining pages first to reach col K —
those skipped pages got cached under their own col_idx for later
sub-regions to consume, and the cache scaled with input row count.

Fix: new `process_split_region_col_outer` function for the
`needs_split` path. Cols iterate in the outer loop, sub-regions in
the inner. Each parquet col chunk is fully consumed from the stream
across all sub-regions before col K+1 starts. Cache for col K is
empty before col K+1's pages arrive.

Mechanics: pre-determine writer assignments for the region's
sub-regions (a top-level region's sub-regions may span multiple
output writers; consecutive sub-regions on the same writer get
coalesced into one combined Region so each writer holds one RG
concurrently — RGs on the same writer are sequential, so coalescing
keeps the parquet writer's single-active-RG constraint intact).
Single-region path stays on the existing `process_region`.

## F5: prefix-aware proptest

`prop_merge_prefix_aligned_streaming` sweeps `(num_inputs ∈ 1..=3,
per-input RG specs, num_outputs ∈ 1..=3)` with prefix_len=1 and
asserts MC-1 (rows preserved), MC-3 (sorted_series monotone within
each output), MS-3 (num_row_groups matches footer), PA-1+PA-3
(`assert_unique_rg_prefix_keys`), and CS-1 (metastore prefix_len ==
KV) on every generated case. 32 cases capped to keep runtime under
a second.

Fixture: `make_prefix_len_one_input` writes one RG per
`(metric_name, rows)` entry by calling `writer.flush()` between
batches. `sorted_series` encodes
`metric_base + row_offset_within_metric`, mirroring production's
storekey property that different metric_names produce
non-overlapping `sorted_series` byte ranges.

Plus a focused unit test `test_f5_single_input_two_metrics_minimal`
that pins one specific case for fast iteration.

## F7: production-shape integration test

`test_f7_production_shape_multi_input_multi_rg_multi_output`: 5
inputs × ~4 prefix-aligned RGs each × 4 outputs × prefix_len=1.
Asserts the full invariant bundle (MC-1, MS-3, PA-1+PA-3, MS-5
cross-output sorted_series monotonicity, CS-1) — the corner the
adversarial review flagged as "untested production case".

MS-5 is "across adjacent outputs, sorted_series is monotone
non-decreasing." A single metric CAN span outputs (the engine
splits at sorted_series transitions inside an overflowing region),
so the cross-output invariant is sorted_series monotonicity, not
"each metric in one output."

## Verification

- `cargo test -p quickwit-parquet-engine --lib` — 498 unit tests pass.
- `cargo clippy -p quickwit-parquet-engine --tests --all-features` with `-Dwarnings`.
- `cargo doc --no-deps -p quickwit-parquet-engine` warning-free.
- `cargo fmt --all -- --check` (nightly via PATH override).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@g-talbot g-talbot force-pushed the gtt/adversarial-review-test-coverage branch from 0e2b36d to 6ed25aa Compare May 13, 2026 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant