[SPARK-56945][SQL] Fix stale nullability in CTE references containing SQL UDFs #55986

Open
mikhailnik-db wants to merge 1 commit into apache:master from mikhailnik-db:cte-sql-udf-nullability
@mikhailnik-db
Contributor

What changes were proposed in this pull request?

Skip the non-recursive CTERelationRef schema snapshot in ResolveWithCTE while the matching CTERelationDef still contains an unresolved SQLFunctionExpression. A subsequent fixed-point iteration retries the substitution once ResolveSQLFunctions has inlined the UDF body.

Why are the changes needed?

SQLFunctionExpression hard-codes nullable = true but is resolved as soon as its inputs resolve. CTERelationRef.output is a val snapshot of cteDef.output, so capturing it before the UDF inlines freezes nullable = true.

For nested UDF calls like wrap_int(non_null_one()), the outer placeholder survives one analyzer iteration (ResolveSQLFunctions skips UDFs whose inputs themselves contain a SQLFunctionExpression). ResolveWithCTE, which runs later in the same batch, snapshots the still-incorrect output, and the !ref.resolved gate prevents a fix-up on the next iteration.

Single-level UDF calls are fully inlined in the first analyzer iteration, so they avoid the bug. Recursive CTEs are unaffected: they already force withNullability(true) by design.
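
The interaction described above can be reproduced in a small toy model. The sketch below is not Spark code: the class and function names only borrow the analyzer's terminology, the UDF bodies are hard-coded, and the assumption that the inner call is inlined during the pass in which the outer call is skipped is mine. It runs two "rules" to a fixed point and returns the nullability of the CTE definition together with the nullability captured by the reference, with and without the guard proposed in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int
    nullable: bool = False  # a constant such as 1 is never NULL


@dataclass(frozen=True)
class SQLFunctionExpression:
    name: str
    args: tuple = ()
    nullable: bool = True  # the placeholder always claims nullable = true


# Hypothetical UDF bodies mirroring the example functions in this PR.
UDF_BODIES = {
    "non_null_one": lambda: Literal(1),
    "wrap_int": lambda x: x,
}


def contains_udf(expr):
    # Literals have no children in this toy, so checking the root is enough.
    return isinstance(expr, SQLFunctionExpression)


def resolve_sql_functions(expr):
    """One pass of a toy ResolveSQLFunctions."""
    if not isinstance(expr, SQLFunctionExpression):
        return expr
    if any(contains_udf(arg) for arg in expr.args):
        # Outer call is skipped this pass; only its inputs are rewritten
        # (assumption of this sketch).
        new_args = tuple(resolve_sql_functions(arg) for arg in expr.args)
        return SQLFunctionExpression(expr.name, new_args)
    return UDF_BODIES[expr.name](*expr.args)


def resolve_with_cte(def_expr, ref_nullable, guard):
    """One pass of a toy ResolveWithCTE; None means the ref is unresolved."""
    if ref_nullable is not None:
        return ref_nullable  # already resolved: the snapshot is never redone
    if guard and contains_udf(def_expr):
        return None  # guard from this PR: defer the snapshot
    return def_expr.nullable  # snapshot the definition's current output


def analyze(expr, guard, max_iterations=10):
    """Run both rules to a fixed point; return (def nullable, ref nullable)."""
    ref_nullable = None
    for _ in range(max_iterations):
        new_expr = resolve_sql_functions(expr)
        new_ref = resolve_with_cte(new_expr, ref_nullable, guard)
        if (new_expr, new_ref) == (expr, ref_nullable):
            break
        expr, ref_nullable = new_expr, new_ref
    return expr.nullable, ref_nullable


nested = SQLFunctionExpression("wrap_int", (SQLFunctionExpression("non_null_one"),))
single = SQLFunctionExpression("non_null_one")

print(analyze(nested, guard=False))  # (False, True): the ref keeps a stale value
print(analyze(nested, guard=True))   # (False, False): the snapshot was deferred
print(analyze(single, guard=False))  # (False, False): inlined in the first pass
```

In the unguarded nested case the definition ends up non-nullable while the reference still reports nullable, which is the mismatch this PR targets. The single-level case never exposes a placeholder to the snapshot, so it is correct either way.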

Does this PR introduce any user-facing change?

Yes. CTE columns wrapping nested non-nullable SQL UDFs now report nullable = false instead of nullable = true. Row-level results are unchanged.

Before:

CREATE FUNCTION non_null_one() RETURNS INT RETURN 1;
CREATE FUNCTION wrap_int(x INT) RETURNS INT RETURN x;
WITH cte AS (SELECT wrap_int(non_null_one()) AS x) SELECT * FROM cte;
-- x: int (nullable = true)

After:

-- x: int (nullable = false)

How was this patch tested?

New regression test in SQLFunctionSuite (SPARK-56945: CTE preserves non-nullable SQL UDF body in materialized schema). Fails on master with x: integer (nullable = true), passes with this PR. No SQL UDF golden file diffs.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic, Claude Code, Opus 4.7)

Comment thread sql/core/src/test/scala/org/apache/spark/sql/execution/SQLFunctionSuite.scala Outdated
Contributor

@uros-db left a comment

Fix itself lgtm, but left a few comments.

@mikhailnik-db force-pushed the cte-sql-udf-nullability branch 3 times, most recently from c7f70eb to 6bbdde6 (May 20, 2026 16:52)
@mikhailnik-db force-pushed the cte-sql-udf-nullability branch from 6bbdde6 to a6ac515 (May 20, 2026 16:56)