[SPARK-56892][SQL] Bulk read optimization for Parquet DELTA_BINARY_PACKED decoding#55919
Open
iemejia wants to merge 1 commit into
…CKED decoding

Replace per-element lambda dispatch in readIntegers/readLongs with bulk paths that compute prefix sums in-place over the unpacked delta buffer and write via putInts/putLongs (backed by System.arraycopy on-heap).

Also optimize readUnsignedLongs by replacing BigInteger(Long.toUnsignedString(v)).toByteArray() with zero-allocation manual byte encoding using ByteBuffer.putLong. Extract the shared utility encodeUnsignedLongBigEndian into VectorizedReaderBase and apply it to all call sites (UnsignedLongUpdater, ParquetDictionary).

Fix benchmark OOM: add unsignedLongVec.reset() before readUnsignedLongs to prevent unbounded arrayData growth across iterations.

Same-machine results (AMD EPYC 9V45):
- INT32 readIntegers: 1.2-1.4x faster (monotonic/delta/wide patterns)
- INT64 readLongs: 1.9-3.8x faster (all patterns)
- readUnsignedLongs: 7.8x faster
Force-pushed from 9d8eb10 to 04a4f8e.
What changes were proposed in this pull request?
Replace per-element lambda dispatch in `readIntegers`/`readLongs` with bulk paths that compute prefix sums in-place over the unpacked delta buffer and write via `putInts`/`putLongs` (backed by `System.arraycopy` on-heap).

Three optimizations in this PR:
1. Bulk read for INT32/INT64: `readBulkIntegers` and `readBulkLongs` replace the generic `readValues()` lambda-per-value path. A single `loadMiniBlockBulk` method handles block/mini-block loading and prefix-sum computation, and delegates the type-specific write to a `BulkWriter` callback (called once per mini-block, not per value).

2. Zero-allocation unsigned long encoding: Replace `new BigInteger(Long.toUnsignedString(v)).toByteArray()` (3 allocations per value: String + BigInteger + byte[]) with `ByteBuffer.putLong` into a reusable scratch buffer. The shared utility `encodeUnsignedLongBigEndian` is extracted into `VectorizedReaderBase` and applied to all call sites (`VectorizedDeltaBinaryPackedReader`, `UnsignedLongUpdater`, `ParquetDictionary`).

3. Benchmark fix: Add `unsignedLongVec.reset()` before `readUnsignedLongs` to prevent unbounded `arrayData()` growth across benchmark iterations (OOM).

Why are the changes needed?
The DELTA_BINARY_PACKED decoder was 2-5x slower than PLAIN encoding for INT32/INT64 reads due to per-element lambda dispatch and the lack of bulk vector writes. The `readUnsignedLongs` path allocated 3 objects per value (12,288 allocations per 4096-row batch) because of `new BigInteger(Long.toUnsignedString(v))`.

Benchmark results on the same machine (AMD EPYC 9V45, OpenJDK 25.0.3+9-LTS):
- INT32 `readIntegers`: 1.2-1.4x faster (monotonic/delta/wide patterns)
- INT64 `readLongs`: 1.9-3.8x faster (all patterns)
- `readUnsignedLongs`: 7.8x faster
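The bulk decode path described above can be sketched as follows. This is a simplified illustration, not the PR's exact code: the names `prefixSumInPlace` and the `BulkWriter` shape here are hypothetical stand-ins for the PR's `loadMiniBlockBulk`/`BulkWriter` machinery.

```java
import java.util.Arrays;

// Simplified sketch: deltas for one mini-block are unpacked into a scratch
// array, converted to absolute values with an in-place prefix sum, and then
// written out with one bulk call per mini-block instead of one lambda
// dispatch per value.
public class BulkDeltaSketch {

  /** Callback invoked once per mini-block, not once per value. */
  interface BulkWriter {
    void write(long[] values, int offset, int length);
  }

  /** Turn unpacked deltas into absolute values, starting after firstValue. */
  static long prefixSumInPlace(long[] deltas, int count, long firstValue) {
    long running = firstValue;
    for (int i = 0; i < count; i++) {
      running += deltas[i];
      deltas[i] = running;  // in-place: the delta buffer becomes the value buffer
    }
    return running;         // last value seeds the next mini-block
  }

  public static void main(String[] args) {
    long[] deltas = {3, 3, 3, -2};  // unpacked mini-block deltas
    long last = prefixSumInPlace(deltas, deltas.length, 10L);  // previous value = 10

    // Single bulk write per mini-block (the real code uses putLongs, which
    // is backed by System.arraycopy for on-heap vectors).
    long[] out = new long[4];
    BulkWriter writer = (values, off, len) -> System.arraycopy(values, off, out, 0, len);
    writer.write(deltas, 0, deltas.length);

    System.out.println(Arrays.toString(out)); // [13, 16, 19, 17]
    System.out.println(last);                 // 17
  }
}
```

The point of the `BulkWriter` indirection is that the type-specific write (int vs. long vector) is chosen once per mini-block rather than once per value, which is what eliminates the per-element lambda dispatch cost.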
Does this PR introduce any user-facing change?
No. This is a performance improvement to internal Parquet decoding. No API or behavior changes.
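To illustrate why the unsigned-long change is behavior-preserving, here is a hedged sketch checking that a reusable-scratch-buffer big-endian encoding denotes the same unsigned value as the old `new BigInteger(Long.toUnsignedString(v)).toByteArray()` path. The 9-byte buffer layout and the method body are illustrative assumptions, not the PR's exact `encodeUnsignedLongBigEndian` implementation.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Sketch: encode an unsigned 64-bit value as big-endian bytes with a leading
// 0x00 (so the top bit is never read as a sign bit), reusing one scratch
// buffer across values instead of allocating String + BigInteger + byte[]
// per value, and verify both paths denote the same unsigned value.
public class UnsignedLongEncodingCheck {

  // One scratch buffer reused across values (the zero-allocation idea).
  private static final ByteBuffer SCRATCH = ByteBuffer.allocate(9);

  /** Hypothetical stand-in for the PR's encodeUnsignedLongBigEndian utility. */
  static byte[] encodeUnsignedLongBigEndian(long v) {
    SCRATCH.clear();
    SCRATCH.put((byte) 0);   // leading zero keeps the two's-complement value non-negative
    SCRATCH.putLong(v);      // eight big-endian bytes
    return SCRATCH.array();  // backing array, length 9
  }

  public static void main(String[] args) {
    long[] samples = {0L, 1L, Long.MAX_VALUE, -1L /* 2^64 - 1 unsigned */};
    for (long v : samples) {
      BigInteger oldPath = new BigInteger(Long.toUnsignedString(v)); // 3 allocations
      BigInteger newPath = new BigInteger(encodeUnsignedLongBigEndian(v));
      if (!oldPath.equals(newPath)) {
        throw new AssertionError("mismatch for " + Long.toUnsignedString(v));
      }
    }
    System.out.println("all encodings match");
  }
}
```

`BigInteger(byte[])` interprets the array as big-endian two's complement, so the leading zero byte makes the 8 value bytes read as an unsigned quantity, including for `-1L` (2^64 - 1).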
How was this patch tested?
`ParquetDeltaEncodingInteger` (13 tests), `ParquetDeltaEncodingLong` (13 tests), `ParquetDeltaByteArrayEncodingSuite`, `ParquetDeltaLengthByteArrayEncodingSuite`, `ParquetVectorizedSuite` (25 tests), and `ParquetIOSuite` (unsigned Parquet logical types test) all pass. `VectorizedDeltaReaderBenchmark` was run before and after on the same machine, with the changes stashed/unstashed for a fair comparison.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: OpenCode (Claude claude-opus-4.6)