refactor: add VirtualColumns support for getting VirtualColumn dependency tree structure #19281

Open
clintropolis wants to merge 1 commit into apache:master from clintropolis:refactor-virtual-columns-stuff


Conversation

@clintropolis
Member

Description

Follow-up to #19262 that cleans up how VirtualColumn dependencies are looked up.

changes:

  • add class VirtualColumns.Node, which captures a VirtualColumn and its transitive VirtualColumn dependencies
  • add VirtualColumns.getNode method, which takes a virtual column name and returns a VirtualColumns.Node from a memoized map supplier
  • modify VirtualColumns.findEquivalent to take a VirtualColumns.Node as its argument, replacing the previous two-arg findEquivalent(VirtualColumns, VirtualColumn); the new version iterates node.getDependencies() directly instead of calling virtualColumn.requiredColumns() + virtualColumns.getVirtualColumn() + null-checking, which simplifies both the implementation and all call sites
  • remove ShardVirtualColumnCacheEntry from FilterSegmentPruner; the shard equivalence cache now uses VirtualColumns.Node as the key instead of allocating a new tree structure per call
  • update Projections to use getNode() + findEquivalent(Node)
  • replace SegmentGenerationStageSpec method addRequiredVirtualColumns(VirtualColumns, VirtualColumn, Map) with addRequiredFromNode(Node, Map), which walks getDependencies() of the node rather than manually calling requiredColumns() + getVirtualColumn()
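
For readers who have not looked at the diff, here is a rough, self-contained sketch of the general idea behind a memoized dependency-tree node. It is not the code in this PR: `SketchColumn`, `SketchNode`, `SketchColumns`, and `collectRequired` are hypothetical stand-ins invented for illustration, and the sketch assumes the dependency graph has no cycles and that any required column name without a matching virtual column is a physical column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a virtual column: a name plus the column names it reads.
final class SketchColumn {
  final String name;
  final List<String> requiredColumns;

  SketchColumn(String name, List<String> requiredColumns) {
    this.name = name;
    this.requiredColumns = requiredColumns;
  }
}

// Hypothetical stand-in for a dependency-tree node: a column plus the nodes
// of the virtual columns it depends on (physical columns are not included).
final class SketchNode {
  final SketchColumn column;
  final List<SketchNode> dependencies;

  SketchNode(SketchColumn column, List<SketchNode> dependencies) {
    this.column = column;
    this.dependencies = Collections.unmodifiableList(dependencies);
  }
}

final class SketchColumns {
  private final Map<String, SketchColumn> byName = new LinkedHashMap<>();
  // Memoized nodes, built lazily on first lookup.
  private final Map<String, SketchNode> nodes = new HashMap<>();

  SketchColumns(List<SketchColumn> columns) {
    for (SketchColumn c : columns) {
      byName.put(c.name, c);
    }
  }

  // Returns the memoized node for a virtual column name, or null if the name
  // does not refer to a virtual column. Assumes no dependency cycles.
  SketchNode getNode(String name) {
    SketchNode cached = nodes.get(name);
    if (cached != null) {
      return cached;
    }
    SketchColumn column = byName.get(name);
    if (column == null) {
      return null;
    }
    List<SketchNode> deps = new ArrayList<>();
    for (String required : column.requiredColumns) {
      SketchNode dep = getNode(required);
      if (dep != null) {
        deps.add(dep);
      }
    }
    SketchNode node = new SketchNode(column, deps);
    nodes.put(name, node);
    return node;
  }

  // Walks a node's dependencies and records every virtual column it needs,
  // without going back through name lookups and null checks.
  static void collectRequired(SketchNode node, Map<String, SketchColumn> required) {
    if (required.putIfAbsent(node.column.name, node.column) != null) {
      return; // already visited
    }
    for (SketchNode dep : node.dependencies) {
      collectRequired(dep, required);
    }
  }
}
```

Because repeated lookups return the same node instance in this sketch, a caller can walk `dependencies` directly and can reuse the node as a map key without rebuilding a tree on every call.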

@github-actions github-actions bot added labels on Apr 9, 2026: Area - Batch Ingestion; Area - Segment Format and Ser/De; Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262)