
fix: Fix Spark slice function Null type to GenericListArray casting issue#20469

Open
erenavsarogullari wants to merge 3 commits into apache:main from erenavsarogullari:df_spark_slice_func_fix

Conversation

@erenavsarogullari
Member

Which issue does this PR close?

Rationale for this change

Currently, the Spark slice function accepts Null arrays and returns NULL for such queries. The DataFusion-Spark slice function also needs to return NULL when a Null array is passed.
Spark Behavior (tested with latest Spark master):

> SELECT slice(NULL, 1, 2);
+-----------------+
|slice(NULL, 1, 2)|
+-----------------+
|             null|
+-----------------+

DataFusion behavior:
Current:

query error
SELECT slice(NULL, 1, 2);
----
DataFusion error: Internal error: could not cast array of type Null to arrow_array::array::list_array::GenericListArray<i32>.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues

New:

query ?
SELECT slice(NULL, 1, 2);
----
NULL
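The null-in-null-out semantics shown above can be sketched in plain Python (illustrative only; `spark_slice` is a hypothetical helper mirroring Spark's behavior, not DataFusion or Spark code):

```python
def spark_slice(arr, start, length):
    # Spark's slice(NULL, ...) returns NULL: a null input propagates
    # to a null output instead of raising an error.
    if arr is None:
        return None
    if start == 0:
        # Spark rejects start = 0: SQL array indices are 1-based.
        raise ValueError("Unexpected value for start in function slice")
    # 1-based indexing; a negative start counts from the end, as in Spark.
    idx = start - 1 if start > 0 else len(arr) + start
    if idx < 0 or idx >= len(arr):
        return []
    return arr[idx:idx + length]

print(spark_slice(None, 1, 2))          # None, mirroring NULL
print(spark_slice([1, 2, 3, 4], 2, 2))  # [2, 3]
```

With the fix, `SELECT slice(NULL, 1, 2)` follows the first branch and yields NULL rather than the `GenericListArray` cast error.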

What changes are included in this PR?

Explained under the first section.

Are these changes tested?

Added new unit test cases to both slice.rs and slice.slt.

Are there any user-facing changes?

Yes. Currently, the slice function returns an error message for Null array inputs; the expected behavior is to return NULL, so the end user gets the expected result instead of an error message.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Feb 21, 2026
@Jefffrey
Contributor

Does Spark return null (void) or an array of null (void)?

I tested on PySpark 4.1.1

>>> spark.sql("select slice(NULL, 1, 2) as a").printSchema()
root
 |-- a: array (nullable = true)
 |    |-- element: void (containsNull = true)
Seems to suggest the latter?

@erenavsarogullari
Member Author

Spark returns an array as (nullable = true/false), and there are two representations depending on the input array, so it returns either NULL or [NULL]:
Case1: input array is null (array (nullable = true)):

+-----------------+
|slice(NULL, 1, 2)|
+-----------------+
|             NULL|
+-----------------+

root
 |-- slice(NULL, 1, 2): array (nullable = true)
 |    |-- element: void (containsNull = true)

Case2: input array has null element (array (nullable = false)):

+------------------------+
|slice(array(NULL), 1, 2)|
+------------------------+
|                  [NULL]|
+------------------------+

root
 |-- slice(array(NULL), 1, 2): array (nullable = false)
 |    |-- element: void (containsNull = true)

@Jefffrey
Contributor

This seems to suggest return type should be list of nulls instead of just null

erenavsarogullari and others added 2 commits February 28, 2026 10:57
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
@erenavsarogullari
Member Author

This seems to suggest return type should be list of nulls instead of just null

Yes, the latest fix aims to address this by returning a NullArray instead of a scalar Null value.
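As a rough illustration of what that implies at the kernel level, here is a plain-Python sketch (hypothetical names, not DataFusion's actual Rust API): a Null-typed argument short-circuits to an all-null result of the same row count instead of being downcast to a list array.

```python
def slice_kernel(values, data_type, start, length):
    """Illustrative dispatch: a Null-typed argument yields one null per
    input row instead of attempting the list-array downcast that
    previously raised an internal error."""
    if data_type == "Null":
        # Fix: short-circuit with an all-null result (cf. NullArray).
        return [None] * len(values)
    # Normal path: 1-based slice per row; null rows propagate as null.
    out = []
    for row in values:
        if row is None:
            out.append(None)
            continue
        idx = start - 1 if start > 0 else len(row) + start
        out.append(row[idx:idx + length] if 0 <= idx < len(row) else [])
    return out

print(slice_kernel([None, None], "Null", 1, 2))  # [None, None]
print(slice_kernel([[1, 2, 3]], "List", 2, 2))   # [[2, 3]]
```

This mirrors the conversation above: the whole-input-is-NULL case produces null rows, while list inputs with null elements are handled row by row.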


Development

Successfully merging this pull request may close these issues.

Fix Spark slice function Null type to GenericListArray casting issue
