Describe the bug
The Spark-compatible round() function gives different results from Apache Spark when the input is a floating-point type (FloatType/DoubleType) and the value's binary representation is slightly off from its decimal literal.
Spark's RoundBase rounds a double as BigDecimal(d).setScale(scale, HALF_UP), where Scala's BigDecimal(Double) behaves like java.math.BigDecimal.valueOf(d): it parses the shortest round-trip decimal string of the double (Double.toString). DataFusion's round_float instead uses naive binary-float arithmetic, (value * 10^scale).round() / 10^scale. This operates on the already-imprecise binary value, so the result diverges from Spark when the decimal literal sits exactly on the half-way point.
To Reproduce
SELECT round(1.255::double, 2::int);
-- Spark: 1.26
-- DataFusion: 1.25
SELECT round(1.005::double, 2::int);
-- Spark: 1.01
-- DataFusion: 1.0
The cause is that 1.255 and 1.005 are stored as binary doubles a hair below the decimal value (1.2549999999999999..., 1.00499999999999989...). Spark sees the shortest decimal string ("1.255", "1.005") and applies HALF_UP, so the tie rounds away from zero. DataFusion multiplies the raw binary value by 100, which stays below the half-way point, and rounds down.
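The divergence can be reproduced outside DataFusion with a few lines of Rust. This is a minimal sketch of the formula quoted above, not the actual round_float source; the function name naive_round is made up for illustration.

```rust
/// Hypothetical stand-in for the naive formula described in this issue:
/// (value * 10^scale).round() / 10^scale.
fn naive_round(value: f64, scale: i32) -> f64 {
    let factor = 10f64.powi(scale);
    // `f64::round` rounds half-way cases away from zero, but the product
    // is already slightly below the half-way point for inputs like 1.255.
    (value * factor).round() / factor
}

/// Prints the given number of fractional digits of the stored binary value,
/// which makes the "a hair below" effect visible.
fn stored_digits(value: f64, digits: usize) -> String {
    format!("{value:.digits$}")
}
```

Calling naive_round(1.255, 2) and naive_round(1.005, 2) gives 1.25 and 1.0, matching the DataFusion output above, and stored_digits(1.255, 20) shows a value that starts with 1.2549999.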
Expected behavior
Match Spark: round via the shortest round-trip decimal representation with HALF_UP (ties away from zero), for both DoubleType and FloatType (Spark widens float to double first via f.toDouble).
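One way to get the expected behaviour is sketched below, under a few assumptions: it is not the fix from the upcoming PR, the name round_half_up_via_decimal is hypothetical, and it only handles a non-negative scale. It relies on Rust's Display for f64 printing the shortest decimal string that round-trips, which plays the role of Double.toString here, and then applies HALF_UP to the decimal digits.

```rust
/// Hypothetical helper: rounds `value` to `scale` fractional digits using the
/// shortest round-trip decimal string and HALF_UP (ties away from zero).
/// Assumption: `scale` is non-negative; negative scales are not covered.
fn round_half_up_via_decimal(value: f64, scale: usize) -> f64 {
    if !value.is_finite() {
        return value; // NaN and infinities pass through unchanged
    }
    // Shortest round-trip decimal text of the magnitude, e.g. "1.255".
    let text = format!("{}", value.abs());
    let (int_part, frac_part) = text.split_once('.').unwrap_or((text.as_str(), ""));
    if frac_part.len() <= scale {
        return value; // nothing to discard, already at or below the scale
    }
    // Keep the integer digits plus the first `scale` fractional digits.
    let mut digits: Vec<u8> = int_part
        .bytes()
        .chain(frac_part.bytes().take(scale))
        .map(|b| b - b'0')
        .collect();
    // HALF_UP: round the magnitude up when the first discarded digit is >= 5.
    if frac_part.as_bytes()[scale] >= b'5' {
        let mut i = digits.len();
        loop {
            if i == 0 {
                digits.insert(0, 1); // carry out of the leading digit
                break;
            }
            i -= 1;
            if digits[i] == 9 {
                digits[i] = 0; // propagate the carry leftwards
            } else {
                digits[i] += 1;
                break;
            }
        }
    }
    // Rebuild the decimal string with the point `scale` digits from the end.
    let point = digits.len() - scale;
    let mut out = String::with_capacity(digits.len() + 1);
    for (i, d) in digits.iter().enumerate() {
        if i == point {
            out.push('.');
        }
        out.push((b'0' + d) as char);
    }
    let magnitude: f64 = out.parse().expect("digits form a valid decimal");
    magnitude.copysign(value)
}
```

With this sketch, round_half_up_via_decimal(1.255, 2) returns 1.26 and round_half_up_via_decimal(1.005, 2) returns 1.01, the Spark results listed above. Going through a string is simple to reason about but allocates per value; a real implementation may prefer a decimal type or a digit-generation routine instead.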
Additional context
The existing doc comment on round_float already describes the intended BigDecimal / HALF_UP behaviour; the implementation simply doesn't match it. I have a fix and will open a PR referencing this issue.
datafusion/spark/src/function/math/round.rs