Describe the bug
The datafusion-spark mod and pmod functions return NaN for floating-point modulo by zero, while Apache Spark returns NULL. The integer case is handled correctly (returns NULL), but the float case falls through to IEEE 754 behavior.
To Reproduce
Run the new script added in #21508 for verifying Spark compatibility.
PySpark (correct behavior):
SELECT MOD(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE)); -- NULL
SELECT MOD(CAST(10.5 AS FLOAT), CAST(0.0 AS FLOAT)); -- NULL
SELECT MOD(CAST(10 AS INT), CAST(0 AS INT)); -- NULL
SELECT pmod(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE)); -- NULL
Spark returns NULL for all types consistently.
DataFusion-spark (incorrect behavior):
SELECT MOD(10.5::DOUBLE, 0.0::DOUBLE); -- NaN (should be NULL)
SELECT MOD(10.5::FLOAT, 0.0::FLOAT); -- NaN (should be NULL)
SELECT MOD(10::INT, 0::INT); -- NULL (correct)
SELECT pmod(10.5::DOUBLE, 0.0::DOUBLE); -- NaN (should be NULL)
Expected behavior
mod and pmod should return NULL for division by zero across all numeric types, matching Spark behavior.
Additional context
Root cause: In datafusion/spark/src/function/math/modulus.rs, the try_rem function (line 35) handles divide-by-zero by catching ArrowError::DivideByZero and nulling out zero divisors. However, Arrow's rem for float types does not throw DivideByZero — it returns NaN per IEEE 754. So the zero-check path is never triggered for floats, and NaN passes through as the result.
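The asymmetry is easy to see in plain Rust arithmetic (a minimal standalone sketch of the behavior described above, not the Arrow kernel itself): a float remainder with a zero divisor silently yields NaN, while an integer remainder by zero cannot produce a value.

```rust
fn main() {
    // Float remainder by zero follows IEEE 754: it produces NaN
    // rather than signaling an error, so an error-based
    // divide-by-zero check never fires for float columns.
    let float_result = 10.5_f64 % 0.0;
    assert!(float_result.is_nan());

    // Integer remainder by zero, by contrast, has no value;
    // checked_rem surfaces that as None (analogous to the
    // DivideByZero error path that the integer kernels take).
    let int_result = 10_i64.checked_rem(0);
    assert_eq!(int_result, None);
}
```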
Fix: The function needs an explicit zero-divisor check for float columns, rather than relying on the DivideByZero error, which only fires for integer types. One approach is to null out zero-divisor positions before calling rem, regardless of type.
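A hypothetical sketch of that approach over plain Rust slices (the actual fix would operate on Arrow arrays inside try_rem in modulus.rs; spark_mod is an illustrative name, not an existing function):

```rust
// Null out zero-divisor positions before taking the remainder, so
// floats and integers behave identically and match Spark's NULL
// semantics for modulo by zero.
fn spark_mod(dividends: &[Option<f64>], divisors: &[Option<f64>]) -> Vec<Option<f64>> {
    dividends
        .iter()
        .zip(divisors)
        .map(|(a, b)| match (a, b) {
            // Non-zero divisor: compute the remainder normally.
            (Some(a), Some(b)) if *b != 0.0 => Some(a % b),
            // Zero divisor, or NULL on either side: NULL, never NaN.
            _ => None,
        })
        .collect()
}

fn main() {
    let out = spark_mod(&[Some(10.5), Some(10.5)], &[Some(0.0), Some(3.0)]);
    assert_eq!(out[0], None);      // zero divisor -> NULL, not NaN
    assert_eq!(out[1], Some(1.5)); // 10.5 % 3.0
}
```

Because the check runs before the remainder is computed, the NaN never exists in the output buffer at all, and the same code path serves both float and integer columns.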
The .slt tests in math/mod.slt and math/pmod.slt also have incorrect expected values (NaN instead of NULL) for float div-by-zero cases and will need to be updated.