bug: datafusion-spark mod/pmod returns NaN instead of NULL for float division by zero #21514

@andygrove

Describe the bug

The datafusion-spark mod and pmod functions return NaN for floating-point modulo by zero, while Apache Spark returns NULL. The integer case is handled correctly (returns NULL), but the float case falls through to IEEE 754 behavior.

To Reproduce

Run the new script added in #21508 for verifying Spark compatibility.

PySpark (correct behavior):

SELECT MOD(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE));  -- NULL
SELECT MOD(CAST(10.5 AS FLOAT), CAST(0.0 AS FLOAT));    -- NULL
SELECT MOD(CAST(10 AS INT), CAST(0 AS INT));             -- NULL
SELECT pmod(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE)); -- NULL

Spark returns NULL for all types consistently.

DataFusion-spark (incorrect behavior):

SELECT MOD(10.5::DOUBLE, 0.0::DOUBLE);  -- NaN (should be NULL)
SELECT MOD(10.5::FLOAT, 0.0::FLOAT);    -- NaN (should be NULL)
SELECT MOD(10::INT, 0::INT);             -- NULL (correct)
SELECT pmod(10.5::DOUBLE, 0.0::DOUBLE); -- NaN (should be NULL)

Expected behavior

mod and pmod should return NULL for division by zero across all numeric types, matching Spark behavior.

Additional context

Root cause: In datafusion/spark/src/function/math/modulus.rs, the try_rem function (line 35) handles divide-by-zero by catching ArrowError::DivideByZero and nulling out zero divisors. However, Arrow's rem for float types never returns DivideByZero; it returns NaN per IEEE 754. So the zero-check path is never triggered for floats, and NaN passes through as the result.
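The same integer/float asymmetry shows up in plain Rust, which may help illustrate the root cause. The sketch below uses only the standard library as an analogy; it is not the Arrow kernel or the DataFusion code, and `int_rem` and `float_rem` are hypothetical names chosen for illustration.

```rust
/// Integer remainder: a zero divisor is reported explicitly, so a caller
/// can detect it and map it to NULL.
fn int_rem(a: i32, b: i32) -> Option<i32> {
    // checked_rem returns None when b == 0 (and on i32::MIN % -1 overflow).
    a.checked_rem(b)
}

/// Float remainder: IEEE 754 semantics apply, so a zero divisor produces
/// NaN and no error is signalled anywhere.
fn float_rem(a: f64, b: f64) -> f64 {
    a % b
}
```

A caller that only reacts to the explicit failure signal (`None` here, `ArrowError::DivideByZero` in the issue) never sees anything unusual in the float path.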

Fix: The function needs to explicitly check for zero divisors in float columns before or after calling rem, rather than relying on the DivideByZero error, which only fires for integer types. One approach is to always null out zero-divisor positions before calling rem, regardless of type.
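A minimal sketch of the "null out zero divisors first" approach follows, using `Vec<Option<f64>>` as a stand-in for a nullable float column. It does not use the Arrow API, and all function names are hypothetical. The `pmod` variant assumes Spark's behavior of wrapping a negative remainder by adding the divisor and taking the remainder again.

```rust
/// Hypothetical helper: apply `op` element-wise, yielding None (NULL) when
/// either input is NULL or the divisor is zero. `b != 0.0` is false for
/// both 0.0 and -0.0, so both are treated as zero divisors.
fn null_on_zero_divisor(
    left: &[Option<f64>],
    right: &[Option<f64>],
    op: fn(f64, f64) -> f64,
) -> Vec<Option<f64>> {
    left.iter()
        .copied()
        .zip(right.iter().copied())
        .map(|(l, r)| match (l, r) {
            (Some(a), Some(b)) if b != 0.0 => Some(op(a, b)),
            _ => None,
        })
        .collect()
}

/// Spark-style mod over nullable doubles (illustrative only).
fn spark_mod_f64(left: &[Option<f64>], right: &[Option<f64>]) -> Vec<Option<f64>> {
    null_on_zero_divisor(left, right, |a, b| a % b)
}

/// Spark-style pmod over nullable doubles (illustrative only; assumes the
/// negative remainder is shifted by the divisor).
fn spark_pmod_f64(left: &[Option<f64>], right: &[Option<f64>]) -> Vec<Option<f64>> {
    null_on_zero_divisor(left, right, |a, b| {
        let rem = a % b;
        if rem < 0.0 { (rem + b) % b } else { rem }
    })
}
```

Checking the divisor up front means the integer and float paths share one rule, and the result no longer depends on whether the underlying remainder kernel reports an error.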

The .slt tests in math/mod.slt and math/pmod.slt also have incorrect expected values (NaN instead of NULL) for float div-by-zero cases and will need to be updated.
