[SPARK-56893][SQL] Optimize Parquet dictionary decoding with hasNull fast path and per-class updater overrides#55920

Open
iemejia wants to merge 1 commit into apache:master from iemejia:SPARK-56893-parquet-dict-decode-fast-path

Conversation


@iemejia iemejia commented May 16, 2026

What changes were proposed in this pull request?

This PR adds two optimizations to the Parquet vectorized dictionary decode path (ParquetVectorUpdater.decodeDictionaryIds):

  1. hasNull() fast path: A new static decodeBatch helper on ParquetVectorUpdater splits decoding into two loops — when values.hasNull() is false, the per-element isNullAt(i) check is skipped entirely.

  2. Per-class decodeDictionaryIds overrides in six hot-path updaters (IntegerUpdater, IntegerToLongUpdater, LongUpdater, FloatUpdater, FloatToDoubleUpdater, DoubleUpdater): each override is a one-line delegation to decodeBatch(... this). Although the logic is identical to the default method, each per-class override gives the C2 JIT compiler its own bytecode — and therefore a monomorphic call site for decodeSingleDictionaryId — enabling full inlining of the type-specific decode expression. The default interface method's bytecode is shared by all ~30 implementors, producing a megamorphic profile that prevents inlining.

Class hierarchy of the change:

ParquetVectorUpdater (interface)
├── default decodeDictionaryIds(...)  →  delegates to static decodeBatch(... this)
├── static decodeBatch(...)           →  hasNull() branch + two loops calling updater.decodeSingleDictionaryId()
│
└── Concrete updaters in ParquetVectorUpdaterFactory:
    ├── IntegerUpdater         @Override decodeDictionaryIds → decodeBatch(... this)
    ├── IntegerToLongUpdater   @Override decodeDictionaryIds → decodeBatch(... this)
    ├── LongUpdater            @Override decodeDictionaryIds → decodeBatch(... this)
    ├── FloatUpdater           @Override decodeDictionaryIds → decodeBatch(... this)
    ├── FloatToDoubleUpdater   @Override decodeDictionaryIds → decodeBatch(... this)
    └── DoubleUpdater          @Override decodeDictionaryIds → decodeBatch(... this)
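The structure above can be illustrated with a minimal, self-contained sketch. The real Spark types (`WritableColumnVector`, `Dictionary`) are replaced here with plain arrays, so all signatures are simplified stand-ins rather than the PR's actual code — only the `hasNull()` two-loop split and the per-class delegation pattern are the point.

```java
// Simplified stand-in for ParquetVectorUpdater: arrays instead of
// WritableColumnVector/Dictionary. Illustrative only.
interface VectorUpdater {

  // Type-specific single-element decode (the method C2 needs to inline).
  void decodeSingleDictionaryId(int i, long[] out, int[] dictIds, long[] dict);

  // Default path: shared bytecode across all implementors, delegates to
  // the static helper.
  default void decodeDictionaryIds(int total, int offset, long[] out,
                                   boolean hasNull, boolean[] isNull,
                                   int[] dictIds, long[] dict) {
    decodeBatch(total, offset, out, hasNull, isNull, dictIds, dict, this);
  }

  static void decodeBatch(int total, int offset, long[] out,
                          boolean hasNull, boolean[] isNull,
                          int[] dictIds, long[] dict, VectorUpdater updater) {
    if (!hasNull) {
      // Fast path: no per-element null check at all.
      for (int i = offset; i < offset + total; i++) {
        updater.decodeSingleDictionaryId(i, out, dictIds, dict);
      }
    } else {
      for (int i = offset; i < offset + total; i++) {
        if (!isNull[i]) {
          updater.decodeSingleDictionaryId(i, out, dictIds, dict);
        }
      }
    }
  }
}

class LongUpdater implements VectorUpdater {
  @Override
  public void decodeSingleDictionaryId(int i, long[] out, int[] dictIds, long[] dict) {
    out[i] = dict[dictIds[i]];
  }

  // Per-class override: logic identical to the default method, but the
  // override's own bytecode gives C2 a per-class profile and a
  // monomorphic call site for decodeSingleDictionaryId.
  @Override
  public void decodeDictionaryIds(int total, int offset, long[] out,
                                  boolean hasNull, boolean[] isNull,
                                  int[] dictIds, long[] dict) {
    VectorUpdater.decodeBatch(total, offset, out, hasNull, isNull, dictIds, dict, this);
  }
}
```

Note the override adds no logic; its only purpose is to duplicate the bytecode so the JIT profiles each updater class separately.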

Why are the changes needed?

The default decodeDictionaryIds method has two performance issues:

  • Unconditional isNullAt check: Even when the column has no nulls (common case), every element pays for an isNullAt(i) call. WritableColumnVector.hasNull() is an O(1) flag check that allows skipping the per-element null check entirely.

  • Megamorphic dispatch: Java interface default methods compile to a single bytecode shared by all implementors. C2 profiles one call site for decodeSingleDictionaryId across ~30 updater types → megamorphic → no inlining. Per-class overrides create per-class bytecode → per-class C2 profiles → monomorphic devirtualization → full inlining of the decode expression.

Benchmark results on AMD EPYC 9V45 (1M rows, dict size 100, Rate M/s higher is better):

Scenario                           Upstream   Optimized   Speedup
No nulls (avg across 6 updaters)   ~332 M/s   ~412 M/s    1.24x
10% nulls                          ~284 M/s   ~277 M/s    ~1.0x (neutral)
50% nulls                          ~180 M/s   ~181 M/s    ~1.0x (neutral)

The no-nulls case is the common production path and shows a clear improvement. With nulls present the isNullAt check dominates regardless, so performance is neutral.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Existing tests: ParquetVectorUpdaterSuite (30 tests), ParquetQuerySuite, ParquetIOSuite, ParquetEncodingSuite, VectorizedRleValuesReaderSuite, ParquetSchemaSuite (243 tests total) — all pass.
  • New benchmark: ParquetDictionaryDecodeBenchmark with global pre-warm that interleaves both hasNull() branches (no-null and 50%-null) across all 6 updater types before measurement, avoiding C2 uncommon-trap bias. Three benchmark groups: no nulls, 10% nulls, 50% nulls.
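The "global pre-warm" idea can be sketched in isolation. This is a hypothetical, self-contained illustration (plain arrays, a made-up `sumDecode` method — not the benchmark's actual API): by alternating no-null and 50%-null batches during warm-up, the JIT profiles both sides of the `hasNull` branch before measurement, instead of compiling an uncommon trap for the branch it never observed.

```java
import java.util.Random;

public class PreWarmSketch {

  // Two branch shapes, mirroring the decodeBatch split: a no-null fast
  // path and a per-element null-checked path.
  static long sumDecode(int[] ids, long[] dict, boolean[] isNull, boolean hasNull) {
    long sum = 0;
    if (!hasNull) {
      for (int i = 0; i < ids.length; i++) sum += dict[ids[i]];
    } else {
      for (int i = 0; i < ids.length; i++) if (!isNull[i]) sum += dict[ids[i]];
    }
    return sum;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    int n = 1024;
    long[] dict = new long[100];                 // dict size 100, as in the benchmark
    for (int i = 0; i < dict.length; i++) dict[i] = i * 7L;
    int[] ids = new int[n];
    for (int i = 0; i < n; i++) ids[i] = rnd.nextInt(dict.length);
    boolean[] noNulls = new boolean[n];          // all false
    boolean[] halfNulls = new boolean[n];
    for (int i = 0; i < n; i++) halfNulls[i] = (i % 2 == 0);

    // Global pre-warm: interleave both hasNull branches so C2 sees both
    // before any branch-specific measurement begins.
    long sink = 0;
    for (int iter = 0; iter < 10_000; iter++) {
      sink += sumDecode(ids, dict, noNulls, false);
      sink += sumDecode(ids, dict, halfNulls, true);
    }
    System.out.println("warm-up checksum: " + sink);
  }
}
```

The checksum is printed only to keep the loop from being dead-code-eliminated; in the real benchmark the interleaving would additionally run across all 6 updater types.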

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenCode (Claude claude-opus-4.6)

…tionary decoding

Add a static decodeBatch helper on ParquetVectorUpdater that splits
dictionary ID decoding into two loops: one when values.hasNull() is
false (skipping per-element isNullAt checks entirely) and one when
nulls are present.

Six hot-path updaters (IntegerUpdater, IntegerToLongUpdater,
LongUpdater, FloatUpdater, FloatToDoubleUpdater, DoubleUpdater)
override decodeDictionaryIds with a one-line delegation to
decodeBatch. Although the logic is identical to the default method,
each per-class override gives the C2 JIT compiler a monomorphic call
site for decodeSingleDictionaryId, enabling full inlining of the
type-specific decode expression. The default interface method's
bytecode is shared by all ~30 implementors, producing a megamorphic
profile that prevents inlining.

Benchmark results on AMD EPYC 9V45 (1M rows, dict size 100):
- No nulls: ~1.23-1.27x throughput improvement
- With nulls: neutral (isNullAt dominates regardless)

Also adds ParquetDictionaryDecodeBenchmark with global pre-warm that
interleaves both hasNull() branches to produce fair C2 profiles.
