[SPARK-56893][SQL] Optimize Parquet dictionary decoding with hasNull fast path and per-class updater overrides#55920

Open
iemejia wants to merge 1 commit into apache:master from iemejia:SPARK-56893-parquet-dict-decode-fast-path

Conversation


@iemejia iemejia commented May 16, 2026

What changes were proposed in this pull request?

This PR adds two optimizations to the Parquet vectorized dictionary decode path (ParquetVectorUpdater.decodeDictionaryIds):

  1. hasNull() fast path: A new static decodeBatch helper on ParquetVectorUpdater splits decoding into two loops — when values.hasNull() is false, the per-element isNullAt(i) check is skipped entirely.

  2. Per-class decodeDictionaryIds overrides in six hot-path updaters (IntegerUpdater, IntegerToLongUpdater, LongUpdater, FloatUpdater, FloatToDoubleUpdater, DoubleUpdater): each override is a one-line delegation to decodeBatch(... this). Although the logic is identical to the default method, each per-class override gives the C2 JIT compiler its own bytecode — and therefore a monomorphic call site for decodeSingleDictionaryId — enabling full inlining of the type-specific decode expression. The default interface method's bytecode is shared by all ~30 implementors, producing a megamorphic profile that prevents inlining.

Class hierarchy of the change:

ParquetVectorUpdater (interface)
├── default decodeDictionaryIds(...)  →  delegates to static decodeBatch(... this)
├── static decodeBatch(...)           →  hasNull() branch + two loops calling updater.decodeSingleDictionaryId()
│
└── Concrete updaters in ParquetVectorUpdaterFactory:
    ├── IntegerUpdater         @Override decodeDictionaryIds → decodeBatch(... this)
    ├── IntegerToLongUpdater   @Override decodeDictionaryIds → decodeBatch(... this)
    ├── LongUpdater            @Override decodeDictionaryIds → decodeBatch(... this)
    ├── FloatUpdater           @Override decodeDictionaryIds → decodeBatch(... this)
    ├── FloatToDoubleUpdater   @Override decodeDictionaryIds → decodeBatch(... this)
    └── DoubleUpdater          @Override decodeDictionaryIds → decodeBatch(... this)
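The structure above can be illustrated with a minimal, self-contained sketch. The real Spark types (`WritableColumnVector`, `Dictionary`) are replaced here with plain arrays, so all signatures are simplified stand-ins rather than the PR's actual code — only the `hasNull()` two-loop split and the per-class delegation pattern are the point.

```java
// Simplified stand-in for ParquetVectorUpdater: arrays instead of
// WritableColumnVector/Dictionary. Illustrative only.
interface VectorUpdater {

  // Type-specific single-element decode (the method C2 needs to inline).
  void decodeSingleDictionaryId(int i, long[] out, int[] dictIds, long[] dict);

  // Default path: shared bytecode across all implementors, delegates to
  // the static helper.
  default void decodeDictionaryIds(int total, int offset, long[] out,
                                   boolean hasNull, boolean[] isNull,
                                   int[] dictIds, long[] dict) {
    decodeBatch(total, offset, out, hasNull, isNull, dictIds, dict, this);
  }

  static void decodeBatch(int total, int offset, long[] out,
                          boolean hasNull, boolean[] isNull,
                          int[] dictIds, long[] dict, VectorUpdater updater) {
    if (!hasNull) {
      // Fast path: no per-element null check at all.
      for (int i = offset; i < offset + total; i++) {
        updater.decodeSingleDictionaryId(i, out, dictIds, dict);
      }
    } else {
      for (int i = offset; i < offset + total; i++) {
        if (!isNull[i]) {
          updater.decodeSingleDictionaryId(i, out, dictIds, dict);
        }
      }
    }
  }
}

class LongUpdater implements VectorUpdater {
  @Override
  public void decodeSingleDictionaryId(int i, long[] out, int[] dictIds, long[] dict) {
    out[i] = dict[dictIds[i]];
  }

  // Per-class override: logic identical to the default method, but the
  // override's own bytecode gives C2 a per-class profile and a
  // monomorphic call site for decodeSingleDictionaryId.
  @Override
  public void decodeDictionaryIds(int total, int offset, long[] out,
                                  boolean hasNull, boolean[] isNull,
                                  int[] dictIds, long[] dict) {
    VectorUpdater.decodeBatch(total, offset, out, hasNull, isNull, dictIds, dict, this);
  }
}
```

Note the override adds no logic; its only purpose is to duplicate the bytecode so the JIT profiles each updater class separately.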

Why are the changes needed?

The default decodeDictionaryIds method has two performance issues:

  • Unconditional isNullAt check: Even when the column has no nulls (common case), every element pays for an isNullAt(i) call. WritableColumnVector.hasNull() is an O(1) flag check that allows skipping the per-element null check entirely.

  • Megamorphic dispatch: Java interface default methods compile to a single bytecode shared by all implementors. C2 profiles one call site for decodeSingleDictionaryId across ~30 updater types → megamorphic → no inlining. Per-class overrides create per-class bytecode → per-class C2 profiles → monomorphic devirtualization → full inlining of the decode expression.

Benchmark results on AMD EPYC 9V45 (1M rows, dict size 100, Rate M/s higher is better):

Scenario                           Upstream   Optimized   Speedup
No nulls (avg across 6 updaters)   ~332 M/s   ~412 M/s    1.24x
10% nulls                          ~284 M/s   ~277 M/s    ~1.0x (neutral)
50% nulls                          ~180 M/s   ~181 M/s    ~1.0x (neutral)

The no-nulls case is the common production path and shows a clear improvement. With nulls present the isNullAt check dominates regardless, so performance is neutral.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Existing tests: ParquetVectorUpdaterSuite (30 tests), ParquetQuerySuite, ParquetIOSuite, ParquetEncodingSuite, VectorizedRleValuesReaderSuite, ParquetSchemaSuite (243 tests total) — all pass.
  • New benchmark: ParquetDictionaryDecodeBenchmark with global pre-warm that interleaves both hasNull() branches (no-null and 50%-null) across all 6 updater types before measurement, avoiding C2 uncommon-trap bias. Three benchmark groups: no nulls, 10% nulls, 50% nulls.
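The "global pre-warm" idea can be sketched in isolation. This is a hypothetical, self-contained illustration (plain arrays, a made-up `sumDecode` method — not the benchmark's actual API): by alternating no-null and 50%-null batches during warm-up, the JIT profiles both sides of the `hasNull` branch before measurement, instead of compiling an uncommon trap for the branch it never observed.

```java
import java.util.Random;

public class PreWarmSketch {

  // Two branch shapes, mirroring the decodeBatch split: a no-null fast
  // path and a per-element null-checked path.
  static long sumDecode(int[] ids, long[] dict, boolean[] isNull, boolean hasNull) {
    long sum = 0;
    if (!hasNull) {
      for (int i = 0; i < ids.length; i++) sum += dict[ids[i]];
    } else {
      for (int i = 0; i < ids.length; i++) if (!isNull[i]) sum += dict[ids[i]];
    }
    return sum;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    int n = 1024;
    long[] dict = new long[100];                 // dict size 100, as in the benchmark
    for (int i = 0; i < dict.length; i++) dict[i] = i * 7L;
    int[] ids = new int[n];
    for (int i = 0; i < n; i++) ids[i] = rnd.nextInt(dict.length);
    boolean[] noNulls = new boolean[n];          // all false
    boolean[] halfNulls = new boolean[n];
    for (int i = 0; i < n; i++) halfNulls[i] = (i % 2 == 0);

    // Global pre-warm: interleave both hasNull branches so C2 sees both
    // before any branch-specific measurement begins.
    long sink = 0;
    for (int iter = 0; iter < 10_000; iter++) {
      sink += sumDecode(ids, dict, noNulls, false);
      sink += sumDecode(ids, dict, halfNulls, true);
    }
    System.out.println("warm-up checksum: " + sink);
  }
}
```

The checksum is printed only to keep the loop from being dead-code-eliminated; in the real benchmark the interleaving would additionally run across all 6 updater types.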

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenCode (Claude claude-opus-4.6)

…tionary decoding

Add a static decodeBatch helper on ParquetVectorUpdater that splits
dictionary ID decoding into two loops: one when values.hasNull() is
false (skipping per-element isNullAt checks entirely) and one when
nulls are present.

Six hot-path updaters (IntegerUpdater, IntegerToLongUpdater,
LongUpdater, FloatUpdater, FloatToDoubleUpdater, DoubleUpdater)
override decodeDictionaryIds with a one-line delegation to
decodeBatch. Although the logic is identical to the default method,
each per-class override gives the C2 JIT compiler a monomorphic call
site for decodeSingleDictionaryId, enabling full inlining of the
type-specific decode expression. The default interface method's
bytecode is shared by all ~30 implementors, producing a megamorphic
profile that prevents inlining.

Benchmark results on AMD EPYC 9V45 (1M rows, dict size 100):
- No nulls: ~1.23-1.27x throughput improvement
- With nulls: neutral (isNullAt dominates regardless)

Also adds ParquetDictionaryDecodeBenchmark with global pre-warm that
interleaves both hasNull() branches to produce fair C2 profiles.
