Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator #17820
Adding JsonIndexDistinctOperator and InvertedIndexDistinctOperator #17820raghavyadav01 wants to merge 1 commit intoapache:masterfrom
Conversation
9fe3f76 to
a986d91
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17820 +/- ##
============================================
- Coverage 63.25% 63.17% -0.08%
Complexity 1481 1481
============================================
Files 3190 3191 +1
Lines 192285 192487 +202
Branches 29470 29511 +41
============================================
- Hits 121630 121607 -23
- Misses 61118 61350 +232
+ Partials 9537 9530 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
...st/java/org/apache/pinot/integration/tests/InvertedIndexDistinctOperatorIntegrationTest.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
...st/java/org/apache/pinot/integration/tests/InvertedIndexDistinctOperatorIntegrationTest.java
Outdated
Show resolved
Hide resolved
5e9cc7e to
bc0d213
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds two new index-based distinct operators (JsonIndexDistinctOperator and InvertedIndexDistinctOperator) that avoid the scan-based projection pipeline for SELECT DISTINCT queries by reading values directly from JSON or inverted indexes. Both operators are disabled by default and opt-in via query options (useJsonIndexDistinct, useInvertedIndexDistinct, or the umbrella useIndexBasedDistinctOperator).
Changes:
- Two new operators (
JsonIndexDistinctOperator,InvertedIndexDistinctOperator) and their integration intoDistinctPlanNode's operator selection logic, plus query option plumbing inCommonConstantsandQueryOptionsUtils. - Integration tests in
JsonPathTestandOfflineClusterIntegrationTestvalidating correctness and index-only execution stats for both operators. - A new
isPathIndexeddefault method onJsonIndexReaderSPI interface, and an unrelated change toMultiStageWithoutStatsIntegrationTest.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
CommonConstants.java |
Adds three new query option keys for index-based distinct |
QueryOptionsUtils.java |
Adds parsing methods for the new query options with umbrella fallback |
JsonIndexReader.java |
Adds isPathIndexed default method for path indexing check |
DistinctPlanNode.java |
Integrates new operators into the plan selection logic |
JsonIndexDistinctOperator.java |
New operator using JSON index value→docId map for DISTINCT |
InvertedIndexDistinctOperator.java |
New operator using inverted index dictId→docIds for DISTINCT |
JsonPathTest.java |
Integration tests for JsonIndexDistinctOperator |
OfflineClusterIntegrationTest.java |
Integration tests for InvertedIndexDistinctOperator |
MultiStageWithoutStatsIntegrationTest.java |
Unrelated change replacing enum reference with string literal |
You can also share your feedback on Copilot code review. Take the survey.
...t-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java
Outdated
Show resolved
Hide resolved
.../src/test/java/org/apache/pinot/integration/tests/MultiStageWithoutStatsIntegrationTest.java
Outdated
Show resolved
Hide resolved
| private JsonIndexReader getJsonIndexReader(String columnName) { | ||
| DataSource dataSource = _indexSegment.getDataSource(columnName, _queryContext.getSchema()); | ||
| JsonIndexReader reader = dataSource.getJsonIndex(); | ||
| if (reader == null) { | ||
| Optional<IndexType<?, ?, ?>> compositeIndex = | ||
| IndexService.getInstance().getOptional("composite_json_index"); | ||
| if (compositeIndex.isPresent()) { | ||
| reader = (JsonIndexReader) dataSource.getIndex(compositeIndex.get()); | ||
| } | ||
| } | ||
| return reader; |
There was a problem hiding this comment.
The logic to locate a JsonIndexReader (checking getJsonIndex() then falling back to composite_json_index) is duplicated between canUseJsonIndexDistinct() (lines 297-306) and getJsonIndexReader() (lines 148-158). Consider extracting this into a single shared static method to avoid the duplication diverging over time.
pinot-core/src/main/java/org/apache/pinot/core/operator/query/JsonIndexDistinctOperator.java
Outdated
Show resolved
Hide resolved
...on-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/operator/query/InvertedIndexDistinctOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/query/JsonIndexDistinctOperator.java
Show resolved
Hide resolved
bc0d213 to
57dfc5b
Compare
57dfc5b to
a69679e
Compare
a69679e to
92561b5
Compare
14de178 to
9a606d4
Compare
… columns Introduce a new operator that resolves SELECT DISTINCT jsonExtractIndex(...) queries directly from the JSON index's value-to-docId map, bypassing the document scan and projection/transform pipeline entirely. This is opt-in via the query option `useIndexBasedDistinctOperator=true`. Key changes: - New JsonIndexDistinctOperator that reads distinct values from the JSON index, supports type-aware distinct tables (INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING), ORDER BY, LIMIT, and filter pushdown - DistinctPlanNode routes to JsonIndexDistinctOperator when the query option is enabled and a single jsonExtractIndex expression has a backing JSON index - JsonIndexReader.isPathIndexed() default method to check path availability (always true for OSS JSON index, selective for composite) - New query option USE_INDEX_BASED_DISTINCT_OPERATOR in CommonConstants with corresponding helpers in QueryOptionsUtils - Integration tests in JsonPathTest verifying correctness against baseline, filter support, and zero numEntriesScannedPostFilter in SSE Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9a606d4 to
f4c8308
Compare
the commit contains following changes: