[SPARK-56942][SQL] Widen DSv2 row-id resolution to support nested columns #55981

Open: xupefei wants to merge 2 commits into apache:master from xupefei:widen-rowid-namedexpression

@xupefei xupefei commented May 19, 2026

What changes were proposed in this pull request?

DSv2 connectors that implement SupportsDelta currently must use a top-level column for rowId(). If a connector returns a multi-segment field reference (e.g. a nested struct field, or _metadata.row_index on a file-source-backed table), analysis fails with a ClassCastException because Spark calls V2ExpressionUtils.resolveRefs[AttributeReference], while nested references resolve to Alias(GetStructField(...)).

Widen RewriteRowLevelCommand.resolveRowIdAttrs and WriteDelta.rowIdAttrsResolved to resolve as NamedExpression and flatten back via .toAttribute. This supports both flat and nested row-id columns; flat-column behavior is unchanged.
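
The difference between the narrow cast and the widened resolution can be illustrated with a toy model. The sketch below does not use Spark: the classes are simplified stand-ins that only borrow the Catalyst names, and `resolve`, `resolveNarrow`, and `resolveWide` are hypothetical helpers invented for illustration. It assumes, as described above, that a single-segment reference resolves to an attribute and a multi-segment reference resolves to `Alias(GetStructField(...))`.

```scala
// Simplified stand-ins for Catalyst expression classes. These are NOT
// Spark's real definitions; they only mimic the shape needed for the demo.
sealed trait Expression
sealed trait NamedExpression extends Expression {
  def name: String
  def toAttribute: AttributeReference
}
final case class AttributeReference(name: String) extends NamedExpression {
  def toAttribute: AttributeReference = this
}
final case class GetStructField(child: Expression, field: String) extends Expression
final case class Alias(child: Expression, name: String) extends NamedExpression {
  // Flatten the alias into a plain attribute carrying the alias name.
  def toAttribute: AttributeReference = AttributeReference(name)
}

object RowIdResolutionSketch {
  // Hypothetical resolver: one name part yields an attribute, several
  // name parts yield an aliased chain of struct-field extractions.
  def resolve(parts: Seq[String]): NamedExpression = parts match {
    case Seq() => throw new IllegalArgumentException("empty reference")
    case Seq(col) => AttributeReference(col)
    case root +: rest =>
      val extracted =
        rest.foldLeft[Expression](AttributeReference(root))(GetStructField(_, _))
      Alias(extracted, parts.last)
  }

  // Narrow behavior: cast every resolved reference to AttributeReference.
  def resolveNarrow(refs: Seq[Seq[String]]): Seq[AttributeReference] =
    refs.map(r => resolve(r).asInstanceOf[AttributeReference])

  // Widened behavior: accept any NamedExpression, then flatten via toAttribute.
  def resolveWide(refs: Seq[Seq[String]]): Seq[AttributeReference] =
    refs.map(r => resolve(r).toAttribute)

  def main(args: Array[String]): Unit = {
    val flat = Seq(Seq("pk"))
    val nested = Seq(Seq("_metadata", "row_index"))

    // Flat references behave identically under both strategies.
    assert(resolveWide(flat) == resolveNarrow(flat))

    // Nested references resolve under the widened strategy.
    assert(resolveWide(nested).map(_.name) == Seq("row_index"))

    // The narrow cast fails on the Alias produced for a nested reference.
    val threw =
      try { resolveNarrow(nested); false }
      catch { case _: ClassCastException => true }
    assert(threw)
  }
}
```

In this toy model the flat case takes the same path either way, which mirrors the claim that flat-column behavior is unchanged; how the real analyzer names the flattened attribute is not modeled here.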

Why are the changes needed?

This unblocks Delta Lake DSv2 connectors that identify rows by file-source metadata such as (_metadata.file_path, _metadata.row_index).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

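New tests in V2ExpressionUtilsSuite cover the nested case and the flat case, and demonstrate that the previous [AttributeReference] cast would throw for nested references.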

Was this patch authored or co-authored using generative AI tooling?

Yes, generated by Claude.

xupefei added 2 commits May 18, 2026 14:10
…umns

DSv2 connectors that implement SupportsDelta currently must use a
top-level column for `rowId()`. If a connector returns a multi-segment
field reference (e.g. a nested struct field, or `_metadata.row_index`
on a file-source-backed table), analysis fails with a ClassCastException
because Spark calls `V2ExpressionUtils.resolveRefs[AttributeReference]`,
while nested references resolve to `Alias(GetStructField(...))`.

Widen `RewriteRowLevelCommand.resolveRowIdAttrs` and
`WriteDelta.rowIdAttrsResolved` to resolve as `NamedExpression` and
flatten back via `.toAttribute`. This supports both flat and nested
row-id columns; flat-column behavior is unchanged.

Tests added in `V2ExpressionUtilsSuite` cover the nested case, the
flat case, and demonstrate that the previous `[AttributeReference]`
cast would throw for nested references.

Co-authored-by: Isaac