Skip to content

[SPARK-57641] Add variant_insert expression#56703

Open
bojana-db wants to merge 7 commits into
apache:masterfrom
bojana-db:variant-insert
Open

[SPARK-57641] Add variant_insert expression#56703
bojana-db wants to merge 7 commits into
apache:masterfrom
bojana-db:variant-insert

Conversation

@bojana-db

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Adds the SQL function variant_insert(v, path, val), which inserts a value into a Variant value at a single JSONPath location.

Details:

  • Object path (e.g. $.b): adds a new field; errors with VARIANT_DUPLICATE_KEY
    if the key already exists;
  • Array path (e.g. $[N]): inserts at index N, shifting later elements right; an
    index past the end pads the gap with variant nulls;
  • Missing intermediate keys/indices along the path are created;
  • Any NULL argument returns NULL;
  • The value may be any expression castable to variant;
  • A new error condition VARIANT_PATH_TYPE_MISMATCH is raised when a path segment
    is applied to an incompatible value (e.g. an object key on an array); the root
    path $ is rejected with INVALID_VARIANT_PATH, and results exceeding the size
    limit raise VARIANT_SIZE_LIMIT.

Why are the changes needed?

There is no way to add a field or element to a variant; today the only option is to convert it to another datatype (e.g. map), mutate, and convert back.

Does this PR introduce any user-facing change?

Yes, a new SQL function (and Scala/Python functions API) plus a new error condition VARIANT_PATH_TYPE_MISMATCH.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code with Claude Opus 4.8

@bojana-db bojana-db changed the title init [SPARK-57417] Add variant_insert expression Jun 23, 2026
@bojana-db bojana-db changed the title [SPARK-57417] Add variant_insert expression [SPARK-57641] Add variant_insert expression Jun 23, 2026

@harshmotw-db harshmotw-db left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feature! I have gone through the scala implementation and it looks good to me from an implementation PoV. So far, my only point of contention is whether we want to implicitly cast structs/maps to variant using a to_variant_object transformation which is implicitly lossy. Tagging @srielau to think about this.
Alternatives I can think of are:

  1. Force users to pass variant as the input argument. They could cast it themselves
  2. Allow users to pass any data that doesn't contain structs/maps and if they want to use the to_variant_object transformation, they could do that manually and pass variant as the inserted argument instead.

I will go over the Python code and the tests tomorrow

@srielau srielau left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants