Skip to content

Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator #17820

Open
raghavyadav01 wants to merge 1 commit intoapache:masterfrom
raghavyadav01:json-index-distinct-operator
Open

Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator #17820
raghavyadav01 wants to merge 1 commit intoapache:masterfrom
raghavyadav01:json-index-distinct-operator

Conversation

@raghavyadav01
Copy link
Collaborator

the commit contains following changes:

  • Adding support for JsonIndexDistinctOperator and InvertedIndexDistinctOperator using the filteroiperator to avoid scan in DistinctOperator
  • Integration test to validate the changes.
  • Unit Tests
  • operators are disabled by default and can be enabled by query Options.

@raghavyadav01 raghavyadav01 force-pushed the json-index-distinct-operator branch from 9fe3f76 to a986d91 Compare March 5, 2026 05:22
@raghavyadav01 raghavyadav01 changed the title Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator [DRAFT]: Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator Mar 5, 2026
@codecov-commenter
Copy link

codecov-commenter commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 0.99010% with 200 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.17%. Comparing base (6d30490) to head (f4c8308).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...core/operator/query/JsonIndexDistinctOperator.java 0.00% 194 Missing ⚠️
...a/org/apache/pinot/core/plan/DistinctPlanNode.java 0.00% 4 Missing and 1 partial ⚠️
...inot/segment/spi/index/reader/JsonIndexReader.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17820      +/-   ##
============================================
- Coverage     63.25%   63.17%   -0.08%     
  Complexity     1481     1481              
============================================
  Files          3190     3191       +1     
  Lines        192285   192487     +202     
  Branches      29470    29511      +41     
============================================
- Hits         121630   121607      -23     
- Misses        61118    61350     +232     
+ Partials       9537     9530       -7     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 ?
java-11 63.14% <0.99%> (-0.09%) ⬇️
java-21 63.15% <0.99%> (-0.06%) ⬇️
temurin 63.17% <0.99%> (-0.08%) ⬇️
unittests 63.17% <0.99%> (-0.08%) ⬇️
unittests1 55.48% <0.99%> (-0.09%) ⬇️
unittests2 34.22% <0.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds two new index-based distinct operators (JsonIndexDistinctOperator and InvertedIndexDistinctOperator) that avoid the scan-based projection pipeline for SELECT DISTINCT queries by reading values directly from JSON or inverted indexes. Both operators are disabled by default and opt-in via query options (useJsonIndexDistinct, useInvertedIndexDistinct, or the umbrella useIndexBasedDistinctOperator).

Changes:

  • Two new operators (JsonIndexDistinctOperator, InvertedIndexDistinctOperator) and their integration into DistinctPlanNode's operator selection logic, plus query option plumbing in CommonConstants and QueryOptionsUtils.
  • Integration tests in JsonPathTest and OfflineClusterIntegrationTest validating correctness and index-only execution stats for both operators.
  • A new isPathIndexed default method on JsonIndexReader SPI interface, and an unrelated change to MultiStageWithoutStatsIntegrationTest.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
CommonConstants.java Adds three new query option keys for index-based distinct
QueryOptionsUtils.java Adds parsing methods for the new query options with umbrella fallback
JsonIndexReader.java Adds isPathIndexed default method for path indexing check
DistinctPlanNode.java Integrates new operators into the plan selection logic
JsonIndexDistinctOperator.java New operator using JSON index value→docId map for DISTINCT
InvertedIndexDistinctOperator.java New operator using inverted index dictId→docIds for DISTINCT
JsonPathTest.java Integration tests for JsonIndexDistinctOperator
OfflineClusterIntegrationTest.java Integration tests for InvertedIndexDistinctOperator
MultiStageWithoutStatsIntegrationTest.java Unrelated change replacing enum reference with string literal

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +148 to +158
private JsonIndexReader getJsonIndexReader(String columnName) {
DataSource dataSource = _indexSegment.getDataSource(columnName, _queryContext.getSchema());
JsonIndexReader reader = dataSource.getJsonIndex();
if (reader == null) {
Optional<IndexType<?, ?, ?>> compositeIndex =
IndexService.getInstance().getOptional("composite_json_index");
if (compositeIndex.isPresent()) {
reader = (JsonIndexReader) dataSource.getIndex(compositeIndex.get());
}
}
return reader;
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic to locate a JsonIndexReader (checking getJsonIndex() then falling back to composite_json_index) is duplicated between canUseJsonIndexDistinct() (lines 297-306) and getJsonIndexReader() (lines 148-158). Consider extracting this into a single shared static method to avoid the duplication diverging over time.

Copilot uses AI. Check for mistakes.
@raghavyadav01 raghavyadav01 force-pushed the json-index-distinct-operator branch from bc0d213 to 57dfc5b Compare March 12, 2026 00:58
@xiangfu0 xiangfu0 force-pushed the json-index-distinct-operator branch from 57dfc5b to a69679e Compare March 12, 2026 19:41
@raghavyadav01 raghavyadav01 force-pushed the json-index-distinct-operator branch from a69679e to 92561b5 Compare March 14, 2026 18:53
@raghavyadav01 raghavyadav01 changed the title [DRAFT]: Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator Mar 14, 2026
@xiangfu0 xiangfu0 force-pushed the json-index-distinct-operator branch from 14de178 to 9a606d4 Compare March 15, 2026 09:29
… columns

Introduce a new operator that resolves SELECT DISTINCT jsonExtractIndex(...)
queries directly from the JSON index's value-to-docId map, bypassing the
document scan and projection/transform pipeline entirely. This is opt-in
via the query option `useIndexBasedDistinctOperator=true`.

Key changes:
- New JsonIndexDistinctOperator that reads distinct values from the JSON
  index, supports type-aware distinct tables (INT, LONG, FLOAT, DOUBLE,
  BIG_DECIMAL, STRING), ORDER BY, LIMIT, and filter pushdown
- DistinctPlanNode routes to JsonIndexDistinctOperator when the query
  option is enabled and a single jsonExtractIndex expression has a
  backing JSON index
- JsonIndexReader.isPathIndexed() default method to check path
  availability (always true for OSS JSON index, selective for composite)
- New query option USE_INDEX_BASED_DISTINCT_OPERATOR in CommonConstants
  with corresponding helpers in QueryOptionsUtils
- Integration tests in JsonPathTest verifying correctness against
  baseline, filter support, and zero numEntriesScannedPostFilter in SSE

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xiangfu0 xiangfu0 force-pushed the json-index-distinct-operator branch from 9a606d4 to f4c8308 Compare March 15, 2026 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants