[multistage] add maxRowsInJoin, maxRowsInWindow, numGroups to query response#17784
[multistage] add maxRowsInJoin, maxRowsInWindow, numGroups to query response#17784dang-stripe wants to merge 2 commits intoapache:masterfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17784 +/- ##
============================================
+ Coverage 63.17% 63.27% +0.09%
- Complexity 1454 1460 +6
============================================
Files 3176 3190 +14
Lines 191025 192067 +1042
Branches 29206 29421 +215
============================================
+ Hits 120688 121530 +842
- Misses 60920 61024 +104
- Partials 9417 9513 +96
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I would like to look for other approaches instead of this one. I have the feeling we are bloating up the response with too many attributes that are either circumstancial or derivable from the query stats. I understand this means to move the processing to the client side, which may not be the best, but we can have other alternatives like some optional response decorator that takes tle stageStats and create new attributes based on that. With a system like this you can create your own decorators without increasing the already too large metadata response |
|
@gortiz replicating the logic for how these limits are enforced on the client seems like a significant burden. i believe stage stats is not a stable API yet either meaning this can break across releases. if that is the expectation, ideally there should be a canonical library for processing stage stats. i feel that the current approach supplements the existing fields in the query response ( |
I wouldn't say so. That API is stable and publicly documented (see each operator in https://docs.pinot.apache.org/users/user-guide-query/multi-stage-query/operator-types). There may be cases where the broker cannot reconstruct stageStats, and that is something we want to improve, but I don't recall any case where stats were renamed or removed. In fact, removing entries from the Java enum is explicitly forbidden.
Historically, we added a new entry in the response for whatever requirement someone has. As a result, today we have dozens of entries that are not very well-defined and verbose in code (requiring getters and setters for each new property). |
gortiz
left a comment
There was a problem hiding this comment.
Approving it as it has been proven useful for Stripe. Before merging it, @dang-stripe can you clean up the code so we don't include the getters to the interface and BrokerResponseNative (V1)?
| /** | ||
| * Returns the max number of rows seen across all join operators. | ||
| */ | ||
| long getMaxRowsInJoin(); | ||
|
|
There was a problem hiding this comment.
Do we need to add these methods to the interface? I don't see any call to this, getMaxRowsInWindow or getNumGroups and it is not actually necessary for jackson.
For example, getMaxRowsInOperator() is defined only for BrokerNativeV2.
|
Thanks @gortiz! I've updated the PR to remove the fields from the interface and V1 response. Re-tested it and confirmed no change. |
summary
this addresses #17565
maxRowsInJoin,maxRowsInWindow,numGroupsto the query response so we can monitor how close queries are to their respective limits before queries start failing (throw overflow mode) or returning partial results (break overflow mode)numGroupsfor single stage.cc @Jackie-Jiang @gortiz @suvodeep-pyne
testing
exchange_ratesis a dim table.