
[SPARK-56883][SQL] DESCRIBE FUNCTION for SQL UDFs#55915

Open

srielau wants to merge 1 commit into apache:master from srielau:SPARK-56883-describe-sql-udf

Conversation

@srielau (Contributor) commented May 15, 2026

What changes were proposed in this pull request?

Renders a structured DESCRIBE FUNCTION [EXTENDED] output for SQL user-defined functions (temporary and persistent) in place of the generic Function / Class / Usage: <json blob> dump that DescribeFunctionCommand produces today for any function whose ExpressionInfo.className != null.

For SQL UDFs the output becomes:

  • Function: qualified name
  • Type: SCALAR or TABLE
  • Input: parameter list (name + SQL type, column-aligned; DEFAULT <expr> and 'comment' annotations are added in EXTENDED mode)
  • Returns: scalar return type, or the table return columns (column comments and defaults are added in EXTENDED mode)
  • EXTENDED only: Comment, Collation, Deterministic, Data Access (CONTAINS SQL / READS SQL DATA), Configs, Owner, Create Time, Body, and SQL Path.
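The column-aligned key/value layout described above can be sketched in plain Scala. This is a hypothetical stand-in, not the PR's actual formatter; the DescribeRows name and its padding rule are assumptions for illustration:

```scala
// Hypothetical sketch of column-aligned DESCRIBE rows; the real formatter
// lives in DescribeFunctionCommand. Keys are padded to the widest key so
// values line up, and continuation lines (e.g. extra Input parameters)
// are indented under the value column.
object DescribeRows {
  def render(rows: Seq[(String, Seq[String])]): Seq[String] = {
    val keyWidth = rows.map(_._1.length + 1).max  // +1 for the trailing ':'
    rows.flatMap { case (key, values) =>
      val padded = (key + ":").padTo(keyWidth + 1, ' ')
      values.zipWithIndex.map {
        case (v, 0) => padded + v               // first line: key + value
        case (v, _) => " " * (keyWidth + 1) + v // continuation line
      }
    }
  }
}

val out = DescribeRows.render(Seq(
  "Function" -> Seq("default.area"),
  "Type"     -> Seq("SCALAR"),
  "Input"    -> Seq("width  DOUBLE", "height DOUBLE"),
  "Returns"  -> Seq("DOUBLE")))
out.foreach(println)
```

The multi-value `Input` entry shows why continuation lines matter: each extra parameter is indented to start in the value column, matching the sample output below.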

The SQL Path row is emitted only when both spark.sql.path.enabled = true and a frozen path was persisted on the function at CREATE FUNCTION time (SPARK-56639 / SPARK-56520). The path is read from the function's function.resolutionPath property and rendered through SqlPathFormat.formatForDisplay, producing the same `catalog`.`namespace` format used elsewhere in DESCRIBE output. This shows the resolution path that the function will use during analysis: the creator's PATH frozen at CREATE time, not the invoker's current PATH.
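As a rough illustration of that display format only: the formatForDisplay and quoteIdent below are stand-ins, not Spark's SqlPathFormat, and the doubling of embedded backticks is an assumption.

```scala
// Stand-in sketch of the `catalog`.`namespace` rendering described above;
// the real SqlPathFormat.formatForDisplay lives in the Spark tree.
def quoteIdent(part: String): String =
  "`" + part.replace("`", "``") + "`" // assumed escaping: double embedded backticks

def formatForDisplay(path: Seq[Seq[String]]): String =
  path.map(_.map(quoteIdent).mkString(".")).mkString(", ")

val display = formatForDisplay(Seq(
  Seq("spark_catalog", "path_func_db_a"),
  Seq("system", "builtin")))
println(display) // `spark_catalog`.`path_func_db_a`, `system`.`builtin`
```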

Behavior for builtin functions and non-SQL UDFs is unchanged.

Class hierarchy / dispatch:

  • SQLFunction (catalyst): adds the SCALAR / TABLE constants and a new fromExpressionInfo(info, parser) constructor that reconstructs a SQLFunction from the JSON usage blob produced by toExpressionInfo. This is the same path used by both temp UDFs (which are not in the catalog) and persistent UDFs.
  • DescribeFunctionCommand (sql/core): when SQLFunction.isSQLFunction(info.getClassName) is true, dispatches to a new describeSQLFunction(info, parser) helper that emits the column-aligned key/value rows shown above. The frozen SQL PATH is rendered inline through SqlPathFormat; the temporary DescribeFunctionCommandUtils helper introduced for that purpose by SPARK-56639 is removed (its single responsibility is now absorbed by describeSQLFunction).
  • SessionCatalog.registerFunction: when a persistent SQL UDF is invoked for the first time, the function registry caches it. Previously the cached ExpressionInfo was always built via makeExprInfoForHiveFunction, which sets usage = null. That worked for the pre-existing DESCRIBE FUNCTION codepath (which doesn't read usage), but breaks the new describeSQLFunction path: after a SQL UDF has been invoked once, DESCRIBE FUNCTION reads back the cached info and SQLFunction.fromExpressionInfo cannot parse null. registerFunction now branches on funcDefinition.isUserDefinedFunction and builds the structured ExpressionInfo via UserDefinedFunction.fromCatalogFunction(funcDefinition, parser).toExpressionInfo for SQL UDFs (matching the lookup-side build in lookupPersistentFunction), so the cached info has the right usage blob for DESCRIBE.
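The dispatch shape those bullets describe can be sketched with minimal stand-ins. ExpressionInfo, SQLFunction, and the className prefix check below are simplified assumptions; the real types live in catalyst and sql/core:

```scala
// Simplified stand-ins for the real catalyst / sql-core types.
final case class ExpressionInfo(className: String, usage: String)

object SQLFunction {
  // Assumed marker check; the PR keys detection off info.getClassName.
  def isSQLFunction(className: String): Boolean =
    className != null && className.startsWith("sqlFunction")
}

// Stub for the new helper: in the PR it reconstructs a SQLFunction from
// the usage blob (fromExpressionInfo) and renders the structured rows.
def describeSQLFunction(info: ExpressionInfo): Seq[String] =
  Seq("Function: ...", "Type: SCALAR")

// DescribeFunctionCommand-style dispatch: SQL UDFs get the structured
// renderer, everything else keeps the legacy Class/Usage output.
def describeRows(info: ExpressionInfo): Seq[String] =
  if (SQLFunction.isSQLFunction(info.className)) describeSQLFunction(info)
  else Seq(s"Class: ${info.className}", s"Usage: ${info.usage}")
```

The registerFunction fix in the last bullet then amounts to making sure the cached ExpressionInfo for a SQL UDF carries a non-null usage blob, so the structured branch can always reconstruct the function.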

Why are the changes needed?

DESCRIBE FUNCTION is intended to give users a human-readable description of a routine, analogous to DESCRIBE TABLE for tables. For SQL UDFs the current output instead exposes the internal serialization format:

> DESCRIBE FUNCTION EXTENDED area;
 Function: default.area
 Class: sqlFunction.
 Usage: {"sqlFunction.inputParam":"width DOUBLE,height DOUBLE","sqlFunction.returnType":"DOUBLE","sqlFunction.expression":"width * height","sqlFunction.isTableFunc":"false",...}
 Extended Usage:

That JSON blob is not part of any public surface, and the literal string sqlFunction. for Class: is meaningless to users. All of the structured metadata we need (signature, return type, body, characteristics, frozen SQL PATH) is already serialized in ExpressionInfo; this PR just formats it.

Does this PR introduce any user-facing change?

Yes: the rows returned by DESCRIBE FUNCTION [EXTENDED] <sql_udf> change.

Before:

> DESCRIBE FUNCTION EXTENDED area;
 Function: default.area
 Class: sqlFunction.
 Usage: {"sqlFunction.inputParam":"width DOUBLE,height DOUBLE", ...}
 Extended Usage:

After (simple case):

> DESCRIBE FUNCTION EXTENDED area;
 Function:      default.area
 Type:          SCALAR
 Input:         width  DOUBLE 'width'
                height DOUBLE 'height'
 Returns:       DOUBLE
 Comment:       compute area
 Deterministic: true
 Data Access:   CONTAINS SQL
 Owner:         <owner>
 Create Time:   <timestamp>
 Body:          width * height

After (function created under spark.sql.path.enabled = true with a non-default PATH at CREATE time):

> SET spark.sql.path.enabled = true;
> SET PATH = spark_catalog.path_func_db_a, system.builtin;
> CREATE FUNCTION frozen_fn() RETURNS INT RETURN (SELECT MAX(id) FROM frozen_t);
> SET PATH = spark_catalog.path_func_db_b, system.builtin;
> DESCRIBE FUNCTION EXTENDED default.frozen_fn;
 Function:      default.frozen_fn
 Type:          SCALAR
 Input:         ()
 Returns:       INT
 ...
 Body:          (SELECT MAX(id) FROM frozen_t)
 SQL Path:      `spark_catalog`.`path_func_db_a`, `system`.`builtin`

SQL Path reflects the creator's frozen PATH, not the session's current PATH at describe time. Output for builtin functions, Hive UDFs, and other non-SQL UDFs is unchanged.

How was this patch tested?

Added three unit tests to SQLFunctionSuite (sql/core) and extended one existing test:

  • describe SQL scalar functions — temporary and persistent scalar UDFs with comments, defaults, and EXTENDED mode. Asserts Function, Type, Input (column-aligned, with DEFAULT and 'comment' in extended mode), Returns, Deterministic, Data Access, Comment, Create Time, Body.
  • describe SQL table functions — table UDFs with explicit return columns; asserts Type: TABLE, Returns columns, and the EXTENDED-only fields.
  • describe SQL functions with derived routine characteristics — checks that Deterministic and Data Access reflect derived values for functions that read tables / call non-deterministic builtins, and that user-supplied characteristics are preserved.
  • The existing SPARK-56639: SQL function uses frozen SQL path test is extended: after switching PATH to a different namespace it invokes default.frozen_fn (populating the function-registry cache) and then runs DESCRIBE FUNCTION EXTENDED default.frozen_fn, asserting the SQL Path: row shows the creator's frozen path (`spark_catalog`.`path_func_db_a`, `system`.`builtin`) and does not mention the invoker's current path namespace. This extension also exercises the SessionCatalog.registerFunction fix above: prior to the fix, the DESCRIBE after the invocation hit CORRUPTED_CATALOG_FUNCTION because the cached ExpressionInfo had usage = null.

Each describe test uses checkKeywordsExist against DESCRIBE FUNCTION [EXTENDED] <name> output.
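A minimal stand-in for that assertion style (the real checkKeywordsExist in the Spark test helpers operates on query results; this sketch just checks substrings of rendered DESCRIBE text):

```scala
// Sketch of a checkKeywordsExist-style assertion over DESCRIBE output.
def checkKeywordsExist(output: String, keywords: String*): Unit =
  keywords.foreach { kw =>
    assert(output.contains(kw), s"expected '$kw' in DESCRIBE output:\n$output")
  }

val described =
  """Function:      default.area
    |Type:          SCALAR
    |Returns:       DOUBLE""".stripMargin

checkKeywordsExist(described, "Type:          SCALAR", "Returns:       DOUBLE")
```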

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (claude-opus-4-7)

@srielau srielau force-pushed the SPARK-56883-describe-sql-udf branch from c690ccf to 4de3ec1 Compare May 16, 2026 00:56
Renders structured DESCRIBE output for SQL user-defined functions instead
of the generic Class/Usage dump: Function/Type/Input/Returns, and in
EXTENDED mode Comment/Collation/Deterministic/Data Access/Configs/Owner/
Create Time/Body/SQL Path. Ports the formatter from the Databricks
runtime.

- SQLFunction: add SCALAR/TABLE constants and fromExpressionInfo for
  reconstructing the function from its ExpressionInfo usage blob (covers
  both temp and persistent SQL UDFs).
- DescribeFunctionCommand: dispatch to describeSQLFunction when the
  className matches SQLFunction.isSQLFunction; inline the SQL PATH
  display via SqlPathFormat (replaces DescribeFunctionCommandUtils).
- SQLFunctionSuite: port describe tests for scalar/table SQL UDFs and
  derived routine characteristics.
@srielau srielau force-pushed the SPARK-56883-describe-sql-udf branch from 4de3ec1 to d8fb332 Compare May 16, 2026 02:56