Skip to content

Prune unused passthrough columns from UNNEST output (opt-in)#18782

Open
gortiz wants to merge 1 commit into
apache:masterfrom
gortiz:unnest-prune-passthrough
Open

Prune unused passthrough columns from UNNEST output (opt-in)#18782
gortiz wants to merge 1 commit into
apache:masterfrom
gortiz:unnest-prune-passthrough

Conversation

@gortiz

@gortiz gortiz commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds an opt-in, default-off query option unnestColumnPruning for the multi-stage engine that prunes input/passthrough columns — notably the unnested source array — from the UNNEST output when nothing downstream references them.

Today, for a query like:

SELECT e.col1, u.s FROM e CROSS JOIN UNNEST(e.mcol1) AS u(s)

the UnnestNode output schema is the full Calcite Correlate row type [col1, mcol1, s], and UnnestOperator copies the entire input row (including the source array mcol1) into every one of the N exploded rows — only for a parent Project to immediately drop mcol1. For large arrays this needlessly widens every intermediate row (and serializes the array N times when an exchange sits between the UNNEST and the projecting operator).

The array expression is only needed in the operator's input (to evaluate the explode), never in its output, unless the user also selects it. With the flag on, the source array is no longer carried.

How it works

  • UnnestNode carries a passthrough index map + prunedPassthrough flag. Legacy constructors default to not pruned (copy the whole input row), so existing behavior is unchanged.
  • RelToPlanNodeConverter fuses the pruning into convertLogicalProject: when a Project sits directly above a Correlate/Uncollect (no wrapping correlate-filter), it computes which left columns the project actually references, builds a pruned UnnestNode (smaller output schema + passthrough map + recomputed element/ordinality indexes), and remaps that one project's InputRefs. It falls back to the current behavior in every other shape, and converts the correlate at most once.
  • UnnestOperator honors the passthrough map, copying only retained columns (resolved to a primitive int[] so the per-row hot path stays allocation/box-free). When not pruned, it keeps the legacy whole-row System.arraycopy.

Backward compatibility / rolling upgrades

UnnestNode is serialized broker→server and the operator runs server-side, so this is a two-sided change. The proto fields (passthroughInputIndexes, prunedPassthrough) are additive:

  • old broker → new server: fields absent ⇒ prunedPassthrough=false ⇒ legacy "copy whole row". Safe.
  • new broker → old server: a smaller output schema would break an un-upgraded server. This is exactly why the option defaults to off — a new broker never emits the pruned schema until an operator explicitly enables it, which should only happen once the whole server fleet is upgraded. A future release can flip the default once a minimum server version is guaranteed.

Tests

  • UnnestSqlPlannerTest — flag on/off, source-array-also-selected (no-op), WITH ORDINALITY, multiple arrays, zero-passthrough.
  • PlanNodeSerDeTest — protobuf round-trip for pruned (non-sequential indexes + ordinality) and legacy UnnestNode.
  • UnnestOperatorTest — pruned single-array, zero-passthrough, WITH ORDINALITY, multiple arrays; legacy path unchanged.
  • UnnestIntegrationTest — end-to-end pruned-vs-default result equality across single/multi array, WITH ORDINALITY, zero-passthrough, and array-also-selected shapes.

Notes for reviewers

  • UNNEST is only supported on the logical planner path, so the change is confined to RelToPlanNodeConverter + the node/operator + serde; the V2 physical path is untouched.
  • Default-off; enabling it is a rolling-upgrade-ordering decision (servers before brokers).

@codecov-commenter

codecov-commenter commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 72.07792% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.82%. Comparing base (b27a3ad) to head (f21dc95).
⚠️ Report is 16 commits behind head on master.

Files with missing lines Patch % Lines
.../query/planner/logical/RelToPlanNodeConverter.java 63.71% 25 Missing and 16 partials ⚠️
...pache/pinot/query/planner/plannode/UnnestNode.java 85.71% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18782      +/-   ##
============================================
+ Coverage     64.78%   64.82%   +0.04%     
  Complexity     1309     1309              
============================================
  Files          3380     3380              
  Lines        209544   209773     +229     
  Branches      32797    32861      +64     
============================================
+ Hits         135746   135979     +233     
+ Misses        62870    62840      -30     
- Partials      10928    10954      +26     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.82% <72.07%> (+0.04%) ⬆️
temurin 64.82% <72.07%> (+0.04%) ⬆️
unittests 64.81% <72.07%> (+0.04%) ⬆️
unittests1 57.01% <72.07%> (+0.06%) ⬆️
unittests2 37.23% <4.54%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add an opt-in (default-off) query option `unnestColumnPruning` for the
multi-stage engine. When enabled, a Project sitting directly above a
CROSS JOIN UNNEST has its unreferenced input/passthrough columns -
notably the unnested source array - dropped from the UnnestNode output,
so the operator no longer copies them into every exploded row.

- UnnestNode carries a passthrough index map + prunedPassthrough flag
  (legacy constructors default to "copy whole row").
- RelToPlanNodeConverter fuses the pruning into convertLogicalProject:
  computes the referenced left columns, builds a pruned UnnestNode with
  recomputed element/ordinality indexes, and remaps the project refs.
  Falls back to current behavior in every other shape.
- UnnestOperator honors the passthrough map (primitive int[] hot path).
- Additive protobuf fields keep old-broker->new-server safe; the flag
  defaults off so a new broker never emits the smaller schema to an
  un-upgraded server (enable only after the whole fleet is upgraded).

Covered by planner, serde round-trip, operator, and integration tests
(single/multi array, WITH ORDINALITY, zero-passthrough, array-also-selected).
@gortiz gortiz force-pushed the unnest-prune-passthrough branch from 7af66f6 to f21dc95 Compare June 17, 2026 08:48
@yashmayya yashmayya added enhancement Improvement to existing functionality performance Related to performance optimization multi-stage Related to the multi-stage query engine labels Jun 24, 2026

@yashmayya yashmayya left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the correctness of the pruning and the rolling-upgrade story in depth; both hold up well. Things I specifically validated:

  • The pruned output schema and the remapped project InputRefs are built from the same leftCount + k → numRetained + k mapping, so they can't drift out of sync, and buildPrunedProject correctly falls back to the unpruned conversion for any non-standard layout (correlate-filter present, all passthrough referenced, non-trailing element region, etc.).
  • collectReferencedColumns / remapInputRefs cover every InputRef-bearing node: RexExpression is only InputRef/Literal/FunctionCall, and the project expressions always come through the scalar fromRexCall path, so the 3-arg FunctionCall rebuild is lossless (isDistinct/ignoreNulls are only ever set by the aggregate/window paths, which never feed a Project) — same idiom as the existing rewriteInputRefs helpers.
  • Operator hot path is correct and the legacy branch is byte-for-byte unchanged; the proto change is additive with the right proto3 defaults. old-broker→new-server stays on the legacy whole-row copy; new-broker(flag-on)→old-server is the only thing that breaks (old server arraycopies the full input row into the smaller pruned out), which is exactly why default-off is the right call.

No blockers from me.

One optional, non-blocking thought on enablement: pruneUnnestColumns is query-option-only, whereas the sibling useSpools just above it also honors a broker-config default (_envConfig.defaultUseSpools()). Wiring a matching defaultUnnestColumnPruning would let an operator flip this on once cluster-wide after the fleet is upgraded, instead of needing SET unnestColumnPruning=true on every query — and it'd make the eventual default flip a one-line constant change. Might also be worth a javadoc note that the option is a no-op under usePhysicalOptimizer (the v2 path doesn't go through RelToPlanNodeConverter). Neither blocks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality multi-stage Related to the multi-stage query engine performance Related to performance optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants