Fix GROUP BY DISTINCTCOUNT ClassCastException in mergeDataTablesOnly#18842
Draft
navina wants to merge 2 commits into
Draft
Fix GROUP BY DISTINCTCOUNT ClassCastException in mergeDataTablesOnly#18842navina wants to merge 2 commits into
navina wants to merge 2 commits into
Conversation
mergeDataTablesOnly reused getIndexedTable, which unconditionally called indexedTable.finish(true, true). For a GROUP BY query with an OBJECT-typed intermediate aggregate (DISTINCTCOUNT, DISTINCTCOUNTHLL, etc.), that storeFinalResult=true call: - mutated DataSchema._columnDataTypes from OBJECT to the final-result type in place (IndexedTable.java:170-174), and - replaced each row's value with extractFinalResult(value) (Set → Integer for DISTINCTCOUNT) at IndexedTable.java:195. The subsequent buildIntermediateDataTable read DataSchema's cached _storedColumnDataTypes — populated earlier from the pre-finalize OBJECT schema and never invalidated by finish — so the OBJECT branch fired and handed the now-Integer value into BaseDistinctAggregateAggregationFunction.serializeIntermediateResult(Set), which threw ClassCastException. Fix: move finish() out of getIndexedTable and let each caller pick the right mode. reduceResult keeps finish(true, true) (downstream consumers expect final scalars). mergeDataTablesOnly calls finish(true, false) so aggregate values stay as intermediates and round-trip through buildIntermediateDataTable correctly. Adds a testGroupByDistinctCountObjectRoundTrip regression: server emits intermediate OBJECT-encoded Sets, merge produces a re-injectable intermediate DataTable, and reducing that intermediate matches a direct reduce of the same servers.
02d46d8 to
af6f235
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18842 +/- ##
=========================================
Coverage 64.77% 64.77%
Complexity 1322 1322
=========================================
Files 3393 3393
Lines 211022 211023 +1
Branches 33135 33135
=========================================
+ Hits 136687 136695 +8
+ Misses 63322 63310 -12
- Partials 11013 11018 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Previous run hit a Maven local-repo lock-contention failure in Unit Test Set 2 (annotation-processor classpath resolution timed out at 900s on hk2 / immutables artifacts). Unrelated to the code change in this PR.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
BrokerReduceServiceexposes two reduce paths:reduceResult(...)— the normal path: merge per-server results and finalize.mergeDataTablesOnly(...)— merge-only: produce an intermediateDataTablethat can be re-merged later via the normal reduce path. Used by callers that need to combine partial scatters without finalizing.The merge-only path's contract is that aggregate values stay as intermediates end-to-end.
What's wrong
OSS's single-stage broker request handler auto-sets
SERVER_RETURN_FINAL_RESULT=truewhen the routing table targets exactly one server (BaseSingleStageBrokerRequestHandler.java:844). That optimization causes servers to return finalized scalars (e.g.IntegerforDISTINCTCOUNT,LongforDISTINCTCOUNTHLL) instead of intermediate sketches — which directly contradicts the merge-only contract: the input is already final-typed, not intermediate.Two ways this manifested in
GroupByDataTableReducer.mergeDataTablesOnly:No guard for the single-server-final-result mode. When
_queryContext.isServerReturnFinalResult()is true, the input DataTables hold final values, so merge-only cannot honor its contract. The previous code silently proceeded and produced a malformed intermediate.getIndexedTableover-finalized internally. Even when servers correctly returned intermediates, the shared helper unconditionally calledindexedTable.finish(true, true). For aGROUP BYquery with an OBJECT-typed intermediate aggregate (DISTINCTCOUNT,DISTINCTCOUNTHLL, etc.),storeFinalResult=true:DataSchema._columnDataTypesfromOBJECTto the final-result type in place (IndexedTable.java:170-174), andextractFinalResult(value)(Set → IntegerforDISTINCTCOUNT) atIndexedTable.java:195.The subsequent
buildIntermediateDataTablereadDataSchema's lazily-populated_storedColumnDataTypes(populated earlier from the pre-finalize OBJECT schema and never invalidated byfinish), so the OBJECT branch fired and handed the now-Integervalue intoBaseDistinctAggregateAggregationFunction.serializeIntermediateResult(Set), which threwClassCastException.Fix
mergeDataTablesOnly: throwUnsupportedOperationExceptionwith a clear message namingSERVER_RETURN_FINAL_RESULT. Callers expecting an intermediate must either disable this option or usereduceResultinstead.finish()out ofgetIndexedTableand let each caller pick the right mode:reduceResultkeepsfinish(true, true)— downstream consumers expect final scalars.mergeDataTablesOnlycallsfinish(true, false)so aggregate values stay as intermediates and round-trip throughbuildIntermediateDataTablecorrectly.Tests
Adds
testGroupByDistinctCountObjectRoundTripinMergeDataTablesOnlyTest: servers emit intermediate OBJECT-encodedSets, merge produces a re-injectable intermediateDataTable, and reducing that intermediate matches a direct reduce of the same per-server inputs.Release Notes
Fixes a
ClassCastExceptioninmergeDataTablesOnlywhen mergingGROUP BYresults that include OBJECT-typed intermediate aggregates (e.g.DISTINCTCOUNT,DISTINCTCOUNTHLL).mergeDataTablesOnlynow also throwsUnsupportedOperationExceptionup front when called against a query whose servers will return final results (e.g. the single-serverSERVER_RETURN_FINAL_RESULTauto-set), instead of silently producing a malformed intermediate.