[SPARK-56942][SQL] Widen DSv2 row-id resolution to support nested columns by xupefei · Pull Request #55981 · apache/spark

xupefei · 2026-05-19T12:23:31Z

What changes were proposed in this pull request?

DSv2 connectors that implement SupportsDelta currently must use a top-level column for rowId(). If a connector returns a multi-segment field reference (e.g. a nested struct field, or _metadata.row_index on a file-source-backed table), analysis fails with a ClassCastException because Spark calls V2ExpressionUtils.resolveRefs[AttributeReference], while nested references resolve to Alias(GetStructField(...)).

Widen RewriteRowLevelCommand.resolveRowIdAttrs and WriteDelta.rowIdAttrsResolved to resolve as NamedExpression and flatten back via .toAttribute. This supports both flat and nested row-id columns; flat-column behavior is unchanged.
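The difference between the narrow and widened resolution can be sketched with a toy model. This is an illustration only, not the Spark source: the classes below are simplified stand-ins that merely share names with Catalyst's `AttributeReference`, `Alias`, `GetStructField`, and `NamedExpression`, and `resolveRef`, `resolveNarrow`, and `resolveWide` are hypothetical helpers. Naming the alias after the last path segment is also an assumption made for the sketch.

```scala
import scala.util.Try

// Toy stand-ins for Catalyst expression classes; NOT the real Spark types.
object RowIdResolutionSketch {
  sealed trait Expression

  sealed trait NamedExpression extends Expression {
    def name: String
    def toAttribute: AttributeReference
  }

  final case class AttributeReference(name: String) extends NamedExpression {
    // An attribute is already flat, so it returns itself.
    def toAttribute: AttributeReference = this
  }

  final case class GetStructField(child: Expression, field: String) extends Expression

  final case class Alias(child: Expression, name: String) extends NamedExpression {
    // Flatten the aliased expression into an attribute carrying the alias name.
    def toAttribute: AttributeReference = AttributeReference(name)
  }

  // Hypothetical resolver: a single-segment reference becomes an attribute,
  // a multi-segment reference becomes Alias(GetStructField(...)).
  // Assumption: the alias is named after the last path segment.
  def resolveRef(parts: Seq[String]): NamedExpression = parts match {
    case Seq(single) => AttributeReference(single)
    case head +: rest =>
      val extracted =
        rest.foldLeft[Expression](AttributeReference(head))(GetStructField(_, _))
      Alias(extracted, rest.last)
    case _ => throw new IllegalArgumentException("empty field reference")
  }

  // Narrow variant: casts every result to AttributeReference, which throws
  // ClassCastException as soon as a reference is nested.
  def resolveNarrow(refs: Seq[Seq[String]]): Seq[AttributeReference] =
    refs.map(r => resolveRef(r).asInstanceOf[AttributeReference])

  // Widened variant: accepts any NamedExpression and flattens via toAttribute.
  def resolveWide(refs: Seq[Seq[String]]): Seq[AttributeReference] =
    refs.map(r => resolveRef(r).toAttribute)

  def main(args: Array[String]): Unit = {
    val rowId = Seq(Seq("_metadata", "file_path"), Seq("_metadata", "row_index"))
    println(Try(resolveNarrow(rowId))) // fails on the nested references
    println(resolveWide(rowId))        // succeeds for nested references
    println(resolveWide(Seq(Seq("id")))) // flat references resolve as before
  }
}
```

In this sketch a flat reference takes the same path in both variants, which mirrors the statement that flat-column behavior is unchanged; only the nested case differs.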

Why are the changes needed?

To unblock Delta Lake DSv2 connectors that identify rows by file-source metadata such as (_metadata.file_path, _metadata.row_index).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

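New tests in V2ExpressionUtilsSuite cover the nested case and the flat case, and demonstrate that the previous [AttributeReference] cast would throw for nested references.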

Was this patch authored or co-authored using generative AI tooling?

Yes, generated by Claude.

…umns

DSv2 connectors that implement SupportsDelta currently must use a top-level column for `rowId()`. If a connector returns a multi-segment field reference (e.g. a nested struct field, or `_metadata.row_index` on a file-source-backed table), analysis fails with a ClassCastException because Spark calls `V2ExpressionUtils.resolveRefs[AttributeReference]`, while nested references resolve to `Alias(GetStructField(...))`.

Widen `RewriteRowLevelCommand.resolveRowIdAttrs` and `WriteDelta.rowIdAttrsResolved` to resolve as `NamedExpression` and flatten back via `.toAttribute`. This supports both flat and nested row-id columns; flat-column behavior is unchanged.

Tests added in `V2ExpressionUtilsSuite` cover the nested case, the flat case, and demonstrate that the previous `[AttributeReference]` cast would throw for nested references.

Co-authored-by: Isaac

xupefei added 2 commits May 18, 2026 14:10

Trigger CI

8498701

xupefei wants to merge 2 commits into apache:master from xupefei:widen-rowid-namedexpression
