Describe the bug
When common subexpression elimination deduplicates aggregations, it can generate aliases of the form __common_expr_<n> for the common expression. In the logical plan explain output, these are printed as <original expr> AS __common_expr_<n>. In the physical plan explain output, however, only __common_expr_<n> is printed, so the actual expression behind the alias is no longer visible. This makes the explain output hard to interpret.
To Reproduce
Here's an example plan constructed using the DataFrame API. The problematic line in the physical plan is
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
Logical plan
============
Projection: idx, agg, ord
  Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS agg, sum(column1) AS ord]]
    Projection: column1, column2, CASE WHEN column2 <= Int64(0) THEN Int64(0) WHEN column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) THEN Int64(3) ELSE Int64(4) END AS idx
      Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), Int64(314))
Optimized logical plan
======================
Projection: idx, __common_expr_1 AS agg, __common_expr_1 AS ord
  Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS __common_expr_1]]
    Projection: column1, CASE WHEN column2 <= Int64(0) THEN Int64(0) WHEN column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) THEN Int64(3) ELSE Int64(4) END AS idx
      Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), Int64(314))
Physical plan
=============
ProjectionExec: expr=[idx@0 as idx, __common_expr_1@1 as agg, __common_expr_1@1 as ord]
  AggregateExec: mode=FinalPartitioned, gby=[idx@0 as idx], aggr=[__common_expr_1]
    RepartitionExec: partitioning=Hash([idx@0], 10), input_partitions=1
      AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
        ProjectionExec: expr=[column1@0 as column1, CASE WHEN column2@1 <= 0 THEN 0 WHEN column2@1 <= 200 THEN 1 WHEN column2@1 <= 314 THEN 3 ELSE 4 END as idx]
          DataSourceExec: partitions=1, partition_sizes=[1]
Expected behavior
Rather than
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
the explain output should show
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[sum(column1@0) as __common_expr_1]
similar to how the group by expressions are printed.
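To make the requested formatting concrete, here is a small standalone Python sketch of the display rule. It is not DataFusion code: AggExpr, display_current, display_expected and format_aggr are hypothetical names used only for illustration, and the sketch assumes an aggregate carries both a printable expression and an output name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggExpr:
    """Hypothetical stand-in for a physical aggregate expression."""
    expr: str  # printable form of the expression, e.g. "sum(column1@0)"
    name: str  # output name or alias, e.g. "__common_expr_1"


def display_current(e: AggExpr) -> str:
    # Behaviour reported in this issue: only the output name is printed.
    return e.name


def display_expected(e: AggExpr) -> str:
    # Requested behaviour: print "<expr> as <alias>", mirroring how the
    # group by expressions are printed. Skip the alias if it adds nothing.
    if e.expr == e.name:
        return e.expr
    return f"{e.expr} as {e.name}"


def format_aggr(exprs, display) -> str:
    # Builds the aggr=[...] fragment of an AggregateExec explain line.
    return "aggr=[" + ", ".join(display(e) for e in exprs) + "]"


agg = AggExpr(expr="sum(column1@0)", name="__common_expr_1")
print(format_aggr([agg], display_current))
print(format_aggr([agg], display_expected))
```

The first print reproduces the fragment from the problematic line, the second the fragment proposed above.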
Additional context
No response