[WIP] HIVE-28931: Convert IN to SEARCH in RexNodeConverter#6495
[WIP] HIVE-28931: Convert IN to SEARCH in RexNodeConverter#6495rubenada wants to merge 4 commits into
Conversation
zabetak
left a comment
There was a problem hiding this comment.
Few quick comments as I skimmed through the PR.
| if (childRexNodeLst.size() == 2) { | ||
| return rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, childRexNodeLst); | ||
| } |
There was a problem hiding this comment.
Should this become part of RexBuilder#makeIn API if not already the case?
There was a problem hiding this comment.
I don't know... Calcite's code just creates an IN with one element in this case, which is not incorrect (and I guess it's the expected behavior). I've just kept this special case on Hive's side for the moment to minimize side effects by this refactoring.
| // Calcite SEARCH conversion was not possible: generate our own OR expression | ||
| List<RexNode> newInputs = RexNodeConverter.transformInToOrOperands(childRexNodeLst, rexBuilder); | ||
| if (newInputs == null) { | ||
| return null; | ||
| } | ||
| return newInputs.size() == 1 ? newInputs.get(0) : rexBuilder.makeCall(SqlStdOperatorTable.OR, newInputs); |
There was a problem hiding this comment.
The RexBuilder#makeIn method also generates an OR when it cannot create a SEARCH. Is there any reason of why we opt for the Hive specific IN to OR transformation?
There was a problem hiding this comment.
Hive's OR conversion is more thorough: it checks HiveCalciteUtil.isDeterministic (and bails out if any expression is not), and also includes a field by field comparison for ROW types. Calcite's code is simply a "blind" OR of EQUALS
| } | ||
|
|
||
| // TODO remove? | ||
| public static List<RexNode> rewriteInClauseChildren(SqlOperator op, List<RexNode> childRexNodeLst, |
There was a problem hiding this comment.
If not use then yes lets drop it.
| // the threshold configured | ||
| childRexNodeLst = rewriteInClauseChildren(calciteOp, childRexNodeLst, rexBuilder); | ||
| calciteOp = SqlStdOperatorTable.OR; | ||
| if (childRexNodeLst.size() == 2 || RexUtil.isReferenceOrAccess(childRexNodeLst.get(0), true)) { |
There was a problem hiding this comment.
Why do we limit the transformation? What are the pros/cons of doing an unconditional rewrite?
There was a problem hiding this comment.
Just keeping the original logic to minimize the side effects of the change. I'll investigate why this condition was added in the first place and whether it can be removed...
| if (rewritten != null) { | ||
| assert rewritten instanceof RexCall; | ||
| RexCall call = (RexCall) rewritten; | ||
| if (call.getKind() == SqlKind.OR && maxNodesForInToOrTransformation != 0) { | ||
| // Rewrite to OR is done only if number of operands are less than the threshold configured | ||
| if (call.getOperands().size() <= maxNodesForInToOrTransformation) { |
There was a problem hiding this comment.
It's strange to have threshold blockers after performing the transformation. I think the initial purpose of the threshold was to prevent a performance hit and the exponential explosion of disjunctions generated by the transformation.
As part of this PR we should determine what happens with hive.optimize.transform.in.maxnodes property. Is it still relevant/necessary or are we planning to deprecate and remove it eventually?
There was a problem hiding this comment.
I agree, I'll try to move this logic into RexNodeConverter (makes more sense in there), to perform the check before actually generating the OR.
I guess we should keep this "maxnodes check" since this property was introduced due to perf issues (HIVE-22074: Slow compilation due to IN to OR transformation)
|



What changes were proposed in this pull request?
In RexNodeConverter, convert IN to SEARCH operator.
Why are the changes needed?
Doing an early conversion from IN to SEARCH (instead of OR) in RexNodeConverter seems beneficial for enabling further simplifications and it could possibly reduce compilation latency since we would avoid the intermediate conversion to OR when that is not necessary. In a nutshell instead of doing IN -> OR -> SEARCH we could do IN -> SEARCH and we could possibly exploit the RexBuilder.makeIn method.
Does this PR introduce any user-facing change?
No
How was this patch tested?
tbd