feat: add native support for pmod expression by andygrove · Pull Request #4277 · apache/datafusion-comet

andygrove · 2026-05-09T21:15:05Z

Which issue does this PR close?

N/A - new expression support

Rationale for this change

Spark's pmod (positive modulo) expression currently falls back to Spark when running under Comet. Adding native support avoids the fallback overhead for this commonly used function.

What changes are included in this PR?

Implements pmod end-to-end through the Comet stack:

Proto: Added MathExpr pmod = 71 to the Expr oneof in expr.proto
Scala serde: Added CometPmod object in arithmetic.scala using the MathExpr / MathBase pattern, rejecting TRY eval mode
Scala registry: Registered classOf[Pmod] -> CometPmod in QueryPlanSerde.scala
Rust scalar function: Implemented spark_pmod in modulo_expr.rs — computes ((a % b) + b) % b to ensure positive results, with proper null/error handling for division by zero
Rust expression builder: Added create_pmod_expr with null-if-zero wrapping for non-ANSI mode and Decimal256 overflow handling (mirrors create_modulo_expr)
Rust expression registry: Added PmodBuilder and ExpressionType::Pmod variant
Tests: Comet SQL Tests for int, long, float, double, decimal, short, byte types with NULL handling and division-by-zero; ANSI mode error test
Docs: Marked pmod as supported in spark_expressions_support.md

The implement-comet-expression skill was used to scaffold this implementation.

How are these changes tested?

pmod.sql: Tests all supported numeric types (int, long, float, double, decimal, short, byte) with positive/negative dividends and divisors, NULL inputs, and division by zero in non-ANSI mode (returns NULL). Uses ConfigMatrix: parquet.enable.dictionary=false,true.
pmod_ansi.sql: Tests that division by zero raises an error in ANSI mode (spark.sql.ansi.enabled=true).

./mvnw test -Pspark-3.5 -Pscala-2.12 -Dsuites="org.apache.comet.CometSqlFileTestSuite pmod" -Dtest=none

Implement Spark's pmod (positive modulo) expression natively in Comet. This avoids falling back to Spark for pmod operations on all numeric types.

TRY mode behaves identically to LEGACY (returns NULL on division by zero), so there was no reason to reject it. Also adds boundary value, NaN/Infinity, negative zero, and high-precision decimal test cases from the audit.

…e enabled When spark.comet.exec.strictFloatingPoint=true and the input type is float/double, pmod now falls back to Spark. This avoids the negative zero incompatibility where Comet returns 0.0 but Spark returns -0.0.

mbutrovich

First round of feedback, thanks @andygrove!

mbutrovich · 2026-05-19T15:12:45Z

+            .map_err(|e| arrow::error::ArrowError::ComputeError(e.to_string()))?
+            .to_array_of_size(left.len())
+            .map_err(|e| arrow::error::ArrowError::ComputeError(e.to_string()))?;
+        let result = rem(left, right)?;


Could the negative-zero incompatibility be eliminated by branching with zip instead of doing an unconditional add/rem?

The current shape

let result = rem(left, right)?; let neg = arrow::compute::kernels::cmp::lt(&result, &zero)?; let plus = arrow::compute::kernels::zip::zip(&neg, right, &zero)?; let result = arrow::compute::kernels::numeric::add(&plus, &result)?; rem(&result, right)

normalizes -0.0 through add(+0.0, -0.0) = +0.0, which is the source of the documented strict-floating-point caveat. Spark's reference implementation is if (r < 0) (r + n) % n else r, i.e. the non-negative branch returns r untouched.

A direct vectorized translation:

let result = rem(left, right)?; let neg = arrow::compute::kernels::cmp::lt(&result, &zero)?; let adjusted = rem(&arrow::compute::kernels::numeric::add(&result, right)?, right)?; arrow::compute::kernels::zip::zip(&neg, &adjusted, &result)

That preserves the original r for r >= 0, including -0.0, and matches Spark exactly. Quick sanity check on the interesting inputs:

input r neg result

pmod(-0.0, 3.0) -0.0 false -0.0 ✓

pmod(NaN, 3.0) NaN false (NaN cmp) NaN ✓

pmod(-10, 3) -1 true 2 ✓

pmod(-10, -3) -1 true -1 ✓

If this works, CometPmod can drop getIncompatibleReasons / getSupportLevel and just be Compatible().

mbutrovich · 2026-05-19T15:13:42Z

+    let left_data_type = lhs.data_type();
+    let right_data_type = rhs.data_type();
+
+    if left_data_type.is_nested() {


The nested-type branch dispatches to apply_cmp_for_nested(Operator::Modulo, ...), but two things look off:

Pmod is numeric-only in Spark, and CometPmod.supportedDataType already rejects non-numeric types. This branch should be unreachable.

Even if it were reached, it would compute Modulo, not Pmod.

Suggest dropping the whole if left_data_type.is_nested() block, or replacing the body with internal_err!("spark_pmod does not support nested types") to make the invariant explicit.

mbutrovich · 2026-05-19T15:14:29Z

+    }
+  }
+
+  override def convert(


The PR description says "rejecting TRY eval mode", but CometPmod.convert doesn't have the expr.evalMode == EvalMode.TRY guard that CometRemainder has at line 270.

There's no try_pmod in Spark's function registry, so the gap is unlikely to be reachable today. But matching CometRemainder's shape keeps the serde defensive in case Pmod ever gets constructed with a TRY context, and aligns the code with the PR description.

- Use a zip-based `if (r < 0) (r + n) % n else r` so the non-negative branch (including -0.0) is returned untouched. Mask the right-hand side to zero on non-negative rows so the `add` does not overflow on values whose adjusted branch is discarded by the zip (e.g. pmod(2L, Long.MaxValue)). - Arrow's `cmp::lt` uses total ordering and reports `lt(-0.0, 0.0)` as true, so compare against -0.0 for floats to recover IEEE semantics. - Drop the unreachable nested-type branch; Pmod is numeric-only. - Add the EvalMode.TRY guard that CometRemainder has. - Drop the strict-FP incompatibility now that -0.0 round-trips, and unignore the negative-zero SQL test.

andygrove added 3 commits May 9, 2026 15:14

feat: add native support for pmod expression

2ed72c6

Implement Spark's pmod (positive modulo) expression natively in Comet. This avoids falling back to Spark for pmod operations on all numeric types.

feat: support TRY eval mode for pmod and add edge case tests

c90162d

TRY mode behaves identically to LEGACY (returns NULL on division by zero), so there was no reason to reject it. Also adds boundary value, NaN/Infinity, negative zero, and high-precision decimal test cases from the audit.

andygrove force-pushed the feat/pmod branch from 7735176 to 5a31b2c Compare May 9, 2026 22:32

Merge remote-tracking branch 'apache/main' into feat/pmod

9c4c367

mbutrovich self-requested a review May 19, 2026 14:55

mbutrovich requested changes May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add native support for pmod expression#4277

feat: add native support for pmod expression#4277
andygrove wants to merge 5 commits into
apache:mainfrom
andygrove:feat/pmod

andygrove commented May 9, 2026

Uh oh!

mbutrovich left a comment

Uh oh!

mbutrovich May 19, 2026

Uh oh!

mbutrovich May 19, 2026

Uh oh!

mbutrovich May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

input	`r`	`neg`	result
`pmod(-0.0, 3.0)`	`-0.0`	false	`-0.0` ✓
`pmod(NaN, 3.0)`	`NaN`	false (NaN cmp)	`NaN` ✓
`pmod(-10, 3)`	`-1`	true	`2` ✓
`pmod(-10, -3)`	`-1`	true	`-1` ✓

Conversation

andygrove commented May 9, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

mbutrovich left a comment

Choose a reason for hiding this comment

Uh oh!

mbutrovich May 19, 2026

Choose a reason for hiding this comment

Uh oh!

mbutrovich May 19, 2026

Choose a reason for hiding this comment

Uh oh!

mbutrovich May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants