[SPARK-56892][SQL] Bulk read optimization for Parquet DELTA_BINARY_PACKED decoding #55919

Open
iemejia wants to merge 1 commit into apache:master from iemejia:SPARK-56892-delta-binary-packed-bulk-read
@iemejia iemejia commented May 16, 2026

What changes were proposed in this pull request?

Replace per-element lambda dispatch in readIntegers/readLongs with bulk paths that compute prefix sums in-place over the unpacked delta buffer and write via putInts/putLongs (backed by System.arraycopy on-heap).

Three optimizations in this PR:

  1. Bulk read for INT32/INT64: readBulkIntegers and readBulkLongs replace the generic readValues() lambda-per-value path. A single loadMiniBlockBulk method handles block/mini-block loading and prefix-sum computation, then delegates the type-specific write to a BulkWriter callback (called once per mini-block, not per value).

  2. Zero-allocation unsigned long encoding: Replace new BigInteger(Long.toUnsignedString(v)).toByteArray() (3 allocations per value: String + BigInteger + byte[]) with ByteBuffer.putLong into a reusable scratch buffer. The shared utility encodeUnsignedLongBigEndian is extracted into VectorizedReaderBase and applied to all call sites (VectorizedDeltaBinaryPackedReader, UnsignedLongUpdater, ParquetDictionary).

  3. Benchmark fix: Add unsignedLongVec.reset() before readUnsignedLongs to prevent unbounded arrayData() growth across benchmark iterations (OOM).
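The bulk path in item 1 can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class name, method signatures, and the shape of the `BulkWriter` interface below are assumptions made for illustration, and bit-unpacking is skipped (the input is assumed to be already unpacked). It relies only on the DELTA_BINARY_PACKED property that each stored delta is relative to the block's minimum delta.

```java
import java.util.Arrays;

// Illustrative sketch only; names and signatures are hypothetical, not Spark's.
class BulkDeltaSketch {

    /** Hypothetical stand-in for a callback invoked once per mini-block. */
    interface BulkWriter {
        void write(long[] values, int offset, int count);
    }

    /**
     * Converts unpacked deltas to absolute values in place.
     * On entry, deltas[i] holds (value[i] - value[i-1] - minDelta).
     * On exit, deltas[i] holds value[i].
     * Returns the last decoded value so the next mini-block can continue from it.
     */
    static long prefixSumInPlace(long[] deltas, int count, long minDelta, long previous) {
        long running = previous;
        for (int i = 0; i < count; i++) {
            running += deltas[i] + minDelta;
            deltas[i] = running;
        }
        return running;
    }

    /** Decodes a single mini-block's worth of values into a fresh array. */
    static long[] decode(long firstValue, long minDelta, long[] unpacked) {
        long[] out = new long[unpacked.length + 1];
        out[0] = firstValue;

        // Work on a copy so the caller's array is left untouched in this sketch.
        long[] buffer = Arrays.copyOf(unpacked, unpacked.length);
        prefixSumInPlace(buffer, buffer.length, minDelta, firstValue);

        // One bulk copy per mini-block instead of one callback per value.
        BulkWriter writer =
            (values, offset, count) -> System.arraycopy(values, offset, out, 1, count);
        writer.write(buffer, 0, buffer.length);
        return out;
    }
}
```

For example, `BulkDeltaSketch.decode(10L, 1L, new long[]{0L, 2L, 1L})` reconstructs the sequence 10, 11, 14, 16. The point of the design is that the tight loop contains no virtual call, and the destination vector is touched by a single array copy per mini-block.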

Why are the changes needed?

The DELTA_BINARY_PACKED decoder was 2-5x slower than PLAIN encoding for INT32/INT64 reads due to per-element lambda dispatch and lack of bulk vector writes. The readUnsignedLongs path allocated 3 objects per value (12,288 allocations per 4096-row batch) due to BigInteger(Long.toUnsignedString(v)).
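To make the allocation difference concrete, the sketch below contrasts the original conversion with a scratch-buffer variant. The fixed 9-byte layout (a leading zero byte followed by the 8 big-endian bytes of the long) is an assumption chosen so that `new BigInteger(bytes)` always reads the value as non-negative; the actual `encodeUnsignedLongBigEndian` in this PR may lay out or trim bytes differently. Class and method names here are hypothetical.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Illustrative sketch only; not the utility added by the PR.
class UnsignedLongEncodingSketch {

    /** Original approach: allocates a String, a BigInteger, and a byte[] per value. */
    static byte[] encodeViaBigInteger(long v) {
        return new BigInteger(Long.toUnsignedString(v)).toByteArray();
    }

    /**
     * Scratch-buffer approach: no per-value allocation.
     * Assumes scratch was created with ByteBuffer.allocate(9) and keeps the
     * default big-endian byte order. The leading 0x00 keeps the two's-complement
     * interpretation non-negative even when the top bit of v is set.
     */
    static void encodeIntoScratch(long v, ByteBuffer scratch) {
        scratch.clear();
        scratch.put((byte) 0);
        scratch.putLong(v);
    }
}
```

A caller would allocate the scratch buffer once, for instance `ByteBuffer scratch = ByteBuffer.allocate(9);`, and reuse it for every value in the batch. Under this layout, `new BigInteger(scratch.array())` yields the same numeric value as the original conversion, although the byte arrays themselves can differ in length because `toByteArray()` returns a minimal-length encoding.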

Benchmark results on the same machine (AMD EPYC 9V45, OpenJDK 25.0.3+9-LTS):

| Benchmark | Baseline (M/s) | After (M/s) | Speedup |
|---|---|---|---|
| INT32 readIntegers, monotonic | 644 | 873 | 1.4x |
| INT32 readIntegers, small-delta | 466 | 553 | 1.2x |
| INT32 readIntegers, wide random | 357 | 417 | 1.2x |
| INT64 readLongs, constant | 316 | 879 | 2.8x |
| INT64 readLongs, monotonic | 252 | 951 | 3.8x |
| INT64 readLongs, small-delta | 216 | 587 | 2.7x |
| INT64 readLongs, wide random | 163 | 313 | 1.9x |
| readUnsignedLongs | 9.2 | 66 | 7.2x |

Does this PR introduce any user-facing change?

No. This is a performance improvement to internal Parquet decoding. No API or behavior changes.

How was this patch tested?

  • Existing unit tests: ParquetDeltaEncodingInteger (13 tests), ParquetDeltaEncodingLong (13 tests), ParquetDeltaByteArrayEncodingSuite, ParquetDeltaLengthByteArrayEncodingSuite, ParquetVectorizedSuite (25 tests), ParquetIOSuite (unsigned Parquet logical types test) -- all pass.
  • Benchmark: VectorizedDeltaReaderBenchmark run before and after on the same machine with changes stashed/unstashed for fair comparison.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenCode (Claude claude-opus-4.6)

…CKED decoding

Replace per-element lambda dispatch in readIntegers/readLongs with bulk
paths that compute prefix sums in-place over the unpacked delta buffer
and write via putInts/putLongs (backed by System.arraycopy on-heap).

Also optimize readUnsignedLongs by replacing
BigInteger(Long.toUnsignedString(v)).toByteArray() with zero-allocation
manual byte encoding using ByteBuffer.putLong. Extract the shared
utility encodeUnsignedLongBigEndian into VectorizedReaderBase and apply
it to all call sites (UnsignedLongUpdater, ParquetDictionary).

Fix benchmark OOM: add unsignedLongVec.reset() before readUnsignedLongs
to prevent unbounded arrayData growth across iterations.

Same-machine results (AMD EPYC 9V45):
- INT32 readIntegers: 1.2-1.4x faster (monotonic/delta/wide patterns)
- INT64 readLongs:    1.9-3.8x faster (all patterns)
- readUnsignedLongs:  7.2x faster
@iemejia iemejia force-pushed the SPARK-56892-delta-binary-packed-bulk-read branch from 9d8eb10 to 04a4f8e Compare May 16, 2026 22:46