feat: add with_metadata scalar UDF to attach Arrow field metadata #21509
Open
adriangb wants to merge 1 commit into apache:main from
Introduces `with_metadata(expr, 'k1', 'v1'[, 'k2', 'v2', ...])`, the inverse of `arrow_metadata`. Values pass through unchanged; the returned FieldRef has the supplied key/value pairs merged into the input field's metadata (new keys overwrite on collision). Input field name, data type, and nullability are preserved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
kosiew
approved these changes
Apr 10, 2026
    ))
    })?;

    let value = args.scalar_arguments[value_idx]
Contributor
I noticed the public contract says both keys and values must be non-empty constant strings, but right now only the keys are enforced.
Values can still be empty strings and get stored without any issue. Would you prefer to add the same non-empty check for values here, or relax the docs and tests so the behavior is consistent?
Either way works; it would just be nice for callers to have one clear rule.
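To make the two options concrete, here is a minimal std-only Rust sketch. It is not the PR's code: `validate_pairs` and its `require_non_empty_values` flag are hypothetical, and each argument after `expr` is modeled as `Some(text)` for a constant string or `None` for anything else.

```rust
/// Hypothetical validator, not the PR's implementation.
/// `kv_args` holds every argument after `expr`: Some(text) models a
/// constant string literal, None models any non-literal argument.
fn validate_pairs(
    kv_args: &[Option<&str>],
    require_non_empty_values: bool,
) -> Result<Vec<(String, String)>, String> {
    // An odd total arg count >= 3 means an even, non-zero number of key/value args.
    if kv_args.is_empty() || kv_args.len() % 2 != 0 {
        return Err("expected one or more key/value pairs".to_string());
    }
    let mut pairs = Vec::new();
    for chunk in kv_args.chunks(2) {
        let key = chunk[0].ok_or("key must be a constant string")?;
        let value = chunk[1].ok_or("value must be a constant string")?;
        if key.is_empty() {
            return Err("key must be non-empty".to_string());
        }
        // The check in question: enforce it, or document that empty values are allowed.
        if require_non_empty_values && value.is_empty() {
            return Err("value must be non-empty".to_string());
        }
        pairs.push((key.to_string(), value.to_string()));
    }
    Ok(pairs)
}

fn main() {
    // Relaxed rule: an empty value is accepted.
    assert!(validate_pairs(&[Some("unit"), Some("")], false).is_ok());
    // Strict rule: the same call is rejected.
    assert!(validate_pairs(&[Some("unit"), Some("")], true).is_err());
    // An empty key is rejected under both rules.
    assert!(validate_pairs(&[Some(""), Some("ms")], false).is_err());
}
```

Whichever branch is chosen, the docs and tests would then state the same rule the code enforces.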
Which issue does this PR close?
Rationale for this change
DataFusion already exposes `arrow_metadata(expr[, key])` for reading Arrow field metadata, but has no way to attach metadata to a column from SQL or the `Expr` DSL. Arrow field metadata is useful for propagating annotations (units, semantic types, provenance, downstream hints) through a query plan without materializing an extra value column.
This PR adds `with_metadata`, the symmetric counterpart to `arrow_metadata`.
What changes are included in this PR?
A new core scalar UDF `with_metadata(expr, 'k1', 'v1'[, 'k2', 'v2', ...])`:
- Values pass through unchanged. The returned `FieldRef` carries the input field's metadata merged with the supplied key/value pairs; new keys overwrite on collision. Input field name, data type, and nullability are preserved, so `with_metadata(col, ...)` behaves as a transparent annotation.
- Arguments are alternating keys and values, in the style of `named_struct`. This form was chosen over a list-of-pairs form because SQL lacks a tuple literal and programmatic callers can simply splat an alternating `Vec<Expr>` of literals.
- Arguments are validated in `return_field_from_args`. It requires an odd arg count ≥ 3; each key must be a non-empty constant string; each value must be a constant string.
Example usage:
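The PR's own usage snippet is not reproduced here. As a stand-in that illustrates only the merge rule described above (not the UDF's real API), here is a std-only Rust sketch. `merge_metadata` is a hypothetical helper, and a plain `HashMap<String, String>` is assumed as the model for field metadata.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not part of DataFusion: merge new key/value pairs
/// into existing field metadata, with new keys overwriting on collision.
fn merge_metadata(
    existing: &HashMap<String, String>,
    pairs: &[(&str, &str)],
) -> HashMap<String, String> {
    let mut merged = existing.clone();
    for (key, value) in pairs {
        // `insert` replaces any value already stored under the same key.
        merged.insert(key.to_string(), value.to_string());
    }
    merged
}

fn main() {
    let mut existing = HashMap::new();
    existing.insert("unit".to_string(), "ms".to_string());
    existing.insert("source".to_string(), "sensor".to_string());

    let merged = merge_metadata(&existing, &[("unit", "s"), ("owner", "team-a")]);

    assert_eq!(merged["unit"], "s"); // collision: the new value wins
    assert_eq!(merged["source"], "sensor"); // untouched keys are kept
    assert_eq!(merged["owner"], "team-a"); // new keys are added
    assert_eq!(merged.len(), 3);
}
```

Nesting calls, as in `with_metadata(with_metadata(...))`, corresponds to applying this merge twice, so the outer call's keys win on collision.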
Files touched:
- `datafusion/functions/src/core/with_metadata.rs` (new): UDF impl + unit tests
- `datafusion/functions/src/core/mod.rs`: registration in `functions()`, `make_udf_function!`, and `expr_fn`
- `datafusion/sqllogictest/test_files/metadata.slt`: SQL-level coverage (merge, overwrite, nesting, pass-through, error cases)
- `docs/source/user-guide/sql/scalar_functions.md`: regenerated via `dev/update_function_docs.sh`
Are these changes tested?
Yes:
- Unit tests (`datafusion/functions/src/core/with_metadata.rs`) covering single-key attach, merge-with-overwrite on collision, multi-pair attach, even-arity rejection, too-few-args rejection, and non-literal-key rejection.
- sqllogictest cases (`metadata.slt`) covering attach/read roundtrip, merging with pre-existing field metadata, collision overwrite, nested `with_metadata(with_metadata(...))`, value pass-through, and planning-time errors (odd arity, missing args, non-literal key, empty key).
- `cargo fmt --all` clean; `cargo clippy -p datafusion-functions --all-targets --all-features -- -D warnings` clean (the `mutable_key_type` error surfaced by `--all-targets --all-features` on the full workspace is pre-existing on `main` and unrelated to this PR).
Are there any user-facing changes?
Yes: a new built-in scalar function `with_metadata` is now available in SQL and via `datafusion_functions::expr_fn::with_metadata`. Generated docs are updated accordingly. No existing behavior changes.