[SPARK-56893][SQL] Optimize Parquet dictionary decoding with hasNull fast path and per-class updater overrides #55920
iemejia wants to merge 1 commit into
…tionary decoding

Add a static `decodeBatch` helper on `ParquetVectorUpdater` that splits dictionary ID decoding into two loops: one when `values.hasNull()` is false (skipping per-element `isNullAt` checks entirely) and one when nulls are present. Six hot-path updaters (`IntegerUpdater`, `IntegerToLongUpdater`, `LongUpdater`, `FloatUpdater`, `FloatToDoubleUpdater`, `DoubleUpdater`) override `decodeDictionaryIds` with a one-line delegation to `decodeBatch`. Although the logic is identical to the default method, each per-class override gives the C2 JIT compiler a monomorphic call site for `decodeSingleDictionaryId`, enabling full inlining of the type-specific decode expression. The default interface method's bytecode is shared by all ~30 implementors, producing a megamorphic profile that prevents inlining.

Benchmark results on AMD EPYC 9V45 (1M rows, dict size 100):
- No nulls: ~1.23-1.27x throughput improvement
- With nulls: neutral (`isNullAt` dominates regardless)

Also adds `ParquetDictionaryDecodeBenchmark` with a global pre-warm that interleaves both `hasNull()` branches to produce fair C2 profiles.
What changes were proposed in this pull request?
This PR adds two optimizations to the Parquet vectorized dictionary decode path (`ParquetVectorUpdater.decodeDictionaryIds`):

1. `hasNull()` fast path: a new static `decodeBatch` helper on `ParquetVectorUpdater` splits decoding into two loops; when `values.hasNull()` is false, the per-element `isNullAt(i)` check is skipped entirely.
2. Per-class `decodeDictionaryIds` overrides in six hot-path updaters (`IntegerUpdater`, `IntegerToLongUpdater`, `LongUpdater`, `FloatUpdater`, `FloatToDoubleUpdater`, `DoubleUpdater`): each override is a one-line delegation to `decodeBatch(..., this)`. Although the logic is identical to the default method, each per-class override gives the C2 JIT compiler its own bytecode, and therefore a monomorphic call site for `decodeSingleDictionaryId`, enabling full inlining of the type-specific decode expression. The default interface method's bytecode is shared by all ~30 implementors, producing a megamorphic profile that prevents inlining.

Class hierarchy of the change:
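The two-loop `decodeBatch` split and the one-line per-class delegation can be sketched as follows. This is a minimal, self-contained illustration of the pattern, not Spark's actual code: the `Vector`, `Dictionary`, and `Updater` types and their signatures here are simplified assumptions.

```java
// Sketch of the hasNull() fast path and the one-line per-class override.
// Vector, Dictionary, and Updater are simplified stand-ins, not Spark's
// actual ParquetVectorUpdater / WritableColumnVector classes.
interface Dictionary {
    long decodeToLong(int id);
}

interface Vector {
    boolean hasNull();          // O(1) flag, no per-element scan
    boolean isNullAt(int i);
    int getDictId(int i);
}

interface Updater {
    // Type-specific single-element decode: the inlining target.
    void decodeSingleDictionaryId(int i, Vector values, long[] out, Dictionary dict);

    // Static helper: two loops, so the no-null path never pays for a
    // per-element isNullAt(i) check.
    static void decodeBatch(Updater updater, int offset, int total,
                            Vector values, long[] out, Dictionary dict) {
        if (!values.hasNull()) {
            for (int i = offset; i < offset + total; i++) {
                updater.decodeSingleDictionaryId(i, values, out, dict);
            }
        } else {
            for (int i = offset; i < offset + total; i++) {
                if (!values.isNullAt(i)) {
                    updater.decodeSingleDictionaryId(i, values, out, dict);
                }
            }
        }
    }

    // Default method: one bytecode body shared by every implementor, so
    // its decodeSingleDictionaryId call site profiles as megamorphic.
    default void decodeDictionaryIds(int offset, int total, Vector values,
                                     long[] out, Dictionary dict) {
        decodeBatch(this, offset, total, values, out, dict);
    }
}

class LongUpdater implements Updater {
    @Override
    public void decodeSingleDictionaryId(int i, Vector values, long[] out, Dictionary dict) {
        out[i] = dict.decodeToLong(values.getDictId(i));
    }

    // Identical logic to the default method, but this per-class copy gives
    // C2 a monomorphic call site, enabling full inlining of the decode.
    @Override
    public void decodeDictionaryIds(int offset, int total, Vector values,
                                    long[] out, Dictionary dict) {
        Updater.decodeBatch(this, offset, total, values, out, dict);
    }
}
```

The override bodies are deliberately identical; the win comes purely from giving each class its own bytecode for C2 to profile.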
Why are the changes needed?
The default `decodeDictionaryIds` method has two performance issues:

1. Unconditional `isNullAt` check: even when the column has no nulls (the common case), every element pays for an `isNullAt(i)` call. `WritableColumnVector.hasNull()` is an O(1) flag check that allows skipping the per-element null check entirely.
2. Megamorphic dispatch: Java interface default methods compile to a single bytecode body shared by all implementors, so C2 profiles one call site for `decodeSingleDictionaryId` across ~30 updater types, sees it as megamorphic, and does not inline. Per-class overrides create per-class bytecode, hence per-class C2 profiles, monomorphic devirtualization, and full inlining of the decode expression.

Benchmark results on AMD EPYC 9V45 (1M rows, dict size 100, Rate M/s, higher is better):

The no-nulls case is the common production path and shows a clear improvement. With nulls present the `isNullAt` check dominates regardless, so performance is neutral.

Does this PR introduce any user-facing change?
No.
How was this patch tested?
`ParquetVectorUpdaterSuite` (30 tests), `ParquetQuerySuite`, `ParquetIOSuite`, `ParquetEncodingSuite`, `VectorizedRleValuesReaderSuite`, and `ParquetSchemaSuite` (243 tests total), all passing. New `ParquetDictionaryDecodeBenchmark` with a global pre-warm that interleaves both `hasNull()` branches (no-null and 50%-null) across all six updater types before measurement, avoiding C2 uncommon-trap bias. Three benchmark groups: no nulls, 10% nulls, 50% nulls.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: OpenCode (Claude claude-opus-4.6)
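The global pre-warm described under "How was this patch tested?" can be sketched as below. This is a simplified illustration of the idea only; the `DecodeTask` type and the loop structure are assumptions, not the actual `ParquetDictionaryDecodeBenchmark` code.

```java
// Sketch of a global pre-warm that exercises BOTH branches of the
// hasNull() split across every updater under test before any timing,
// so C2 compiles the shared decode helper with a profile covering both
// paths instead of speculating on the first branch it sees and later
// hitting an uncommon trap. DecodeTask is a hypothetical stand-in for
// "one updater decoding one pre-built batch".
import java.util.List;

@FunctionalInterface
interface DecodeTask {
    // One decode pass; hasNulls selects which null-handling branch runs.
    long run(boolean hasNulls);
}

class PreWarm {
    static long warmUp(List<DecodeTask> tasks, int rounds) {
        long sink = 0;  // accumulate results so the JIT cannot dead-code the work
        for (int r = 0; r < rounds; r++) {
            for (DecodeTask task : tasks) {
                sink += task.run(false);  // no-null branch
                sink += task.run(true);   // with-null branch
            }
        }
        return sink;
    }
}
```

Interleaving the branches within each round, rather than warming one branch fully before the other, is what keeps the resulting C2 profile fair to both paths.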