Patches have correct dtype by construction instead of normalised during array construction#8626
Conversation
`Patches::cast_values` was a crutch that papered over patch values not matching the parent array's dtype, and ALPRD/Sparse each carried a construction-time normalization step (`canonicalize_patches`, `normalize_patches_dtype`) to fix patches up after the fact. Remove all three: patch-producing operations now yield the correct dtype directly, and constructors assert rather than normalize. - `Patches::mask` widens the surviving values to nullable. Masking an array always yields a nullable result, so the masked patches now carry the correct dtype and callers (ALP mask) no longer re-cast them. - ALP take/mask: `Patches::take`/`mask` already produce values matching the taken/masked encoded array, so build via `ALP::try_new`, which asserts the patch dtype instead of forcing it. - bitpacking take: `PrimitiveArray::patch` (via `Validity::patch`) already asserts matching nullability, so no cast is needed. - ALPRD: drop `canonicalize_patches`. Exceptions are always the non-nullable left-parts dtype; `validate_parts` now requires exactly that, `take` casts the taken exceptions back to it (folded into the take call), and the other construction paths already satisfy it. Neither `ALPRD::try_new` nor `RDEncoder::encode` needs an `ExecutionCtx` anymore (the RD split does no execution); their callers that only plumbed a ctx to pass it down are cleaned up too. - Sparse: drop `normalize_patches_dtype`. Both `Sparse::try_new` and the patches-based paths now assert that the values/patches already match the fill dtype instead of casting; callers construct matching values. Signed-off-by: Robert Kruszewski <robert@spiraldb.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UDa4BYuhQyRHWCUGTanVzE Signed-off-by: Robert Kruszewski <github@robertk.io>
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 0.923x ➖ How to read Verdict and Engines
datafusion / vortex-file-compressed (0.923x ➖, 3↑ 0↓)
No file size changes detected. |
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.005x ➖, 0↑ 0↓)
datafusion / parquet (0.996x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.007x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.017x ➖, 0↑ 1↓)
duckdb / parquet (1.001x ➖, 0↑ 0↓)
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Totals:
|
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.991x ➖, 2↑ 5↓)
datafusion / vortex-compact (1.037x ➖, 2↑ 5↓)
datafusion / parquet (1.078x ➖, 1↑ 8↓)
datafusion / arrow (0.938x ➖, 6↑ 1↓)
duckdb / vortex-file-compressed (1.058x ➖, 0↑ 7↓)
duckdb / vortex-compact (0.985x ➖, 0↑ 0↓)
duckdb / parquet (1.083x ➖, 0↑ 6↓)
duckdb / duckdb (1.122x ❌, 0↑ 15↓)
File Size Changes (10 files changed, -0.2% overall, 4↑ 6↓)
Totals:
|
Merging this PR will degrade performance by 15.17%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
16 µs | 26.4 µs | -39.51% |
| ⚡ | Simulation | rebuild_naive |
109.1 µs | 91.7 µs | +18.98% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/remove-cast-values-crutch-7u8efe (9597529) with develop (79c6b0f)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.043x ➖, 1↑ 7↓)
datafusion / vortex-compact (1.038x ➖, 0↑ 4↓)
datafusion / parquet (1.049x ➖, 1↑ 11↓)
duckdb / vortex-file-compressed (1.055x ➖, 0↑ 8↓)
duckdb / vortex-compact (1.047x ➖, 0↑ 8↓)
duckdb / parquet (1.026x ➖, 0↑ 2↓)
duckdb / duckdb (1.045x ➖, 0↑ 8↓)
File Size Changes (7 files changed, -0.0% overall, 4↑ 3↓)
Totals:
|
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.024x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.956x ➖, 0↑ 0↓)
datafusion / parquet (1.136x ➖, 0↑ 3↓)
duckdb / vortex-file-compressed (0.939x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.981x ➖, 0↑ 0↓)
duckdb / parquet (0.963x ➖, 0↑ 0↓)
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) How to read Verdict and Engines
duckdb / vortex-file-compressed (1.023x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.999x ➖, 0↑ 0↓)
duckdb / parquet (1.021x ➖, 0↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)
Totals:
|
Benchmarks: Clickbench Sorted on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.984x ➖, 1↑ 2↓)
datafusion / parquet (1.013x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.018x ➖, 0↑ 1↓)
duckdb / parquet (1.001x ➖, 0↑ 0↓)
duckdb / duckdb (1.000x ➖, 0↑ 0↓)
File Size Changes (201 files changed, -0.0% overall, 91↑ 110↓)
Totals:
|
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.003x ➖, 0↑ 0↓)
datafusion / parquet (0.995x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.028x ➖, 0↑ 7↓)
duckdb / parquet (1.000x ➖, 0↑ 0↓)
duckdb / duckdb (0.988x ➖, 0↑ 0↓)
File Size Changes (106 files changed, -0.0% overall, 49↑ 57↓)
Totals:
|
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.127x ❌, 0↑ 18↓)
datafusion / vortex-compact (1.110x ❌, 0↑ 12↓)
datafusion / parquet (1.116x ❌, 0↑ 15↓)
datafusion / arrow (1.188x ❌, 0↑ 22↓)
duckdb / vortex-file-compressed (1.108x ❌, 0↑ 12↓)
duckdb / vortex-compact (1.078x ➖, 0↑ 6↓)
duckdb / parquet (1.066x ➖, 0↑ 6↓)
duckdb / duckdb (1.031x ➖, 0↑ 2↓)
File Size Changes (26 files changed, +0.0% overall, 10↑ 16↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.881x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.929x ➖, 0↑ 0↓)
datafusion / parquet (0.935x ➖, 3↑ 1↓)
duckdb / vortex-file-compressed (0.903x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.873x ➖, 0↑ 0↓)
duckdb / parquet (1.004x ➖, 0↑ 0↓)
|
Benchmarks: Appian on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.060x ➖, 0↑ 2↓)
datafusion / parquet (1.037x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.030x ➖, 0↑ 0↓)
duckdb / parquet (1.023x ➖, 0↑ 0↓)
duckdb / duckdb (1.032x ➖, 0↑ 0↓)
File Size Changes (4 files changed, -0.0% overall, 1↑ 3↓)
Totals:
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.956x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.875x ➖, 1↑ 0↓)
datafusion / parquet (0.952x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.890x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.827x ➖, 0↑ 0↓)
duckdb / parquet (0.824x ➖, 0↑ 0↓)
|
Benchmarks: Random AccessVortex (geomean): 0.954x ➖ How to read Verdict and Engines
unknown / unknown (0.978x ➖, 5↑ 4↓)
|
Benchmarks: CompressionVortex (geomean): 0.995x ➖ How to read Verdict and Engines
unknown / unknown (1.001x ➖, 2↑ 2↓)
|
|
@claude review in depth |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
Patches::cast_values was a crutch and sign that in too many places we were
producing wrong dtypes. This pr removes this function and fixes up dtypes where
necessary. We still need a cast in ALP-RD take as ALP-RD patches must be non
nullable