Skip to content

feat: add native support for pmod expression#4277

Open
andygrove wants to merge 5 commits into
apache:mainfrom
andygrove:feat/pmod
Open

feat: add native support for pmod expression#4277
andygrove wants to merge 5 commits into
apache:mainfrom
andygrove:feat/pmod

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

N/A - new expression support

Rationale for this change

Spark's pmod (positive modulo) expression currently falls back to Spark when running under Comet. Adding native support avoids the fallback overhead for this commonly used function.

What changes are included in this PR?

Implements pmod end-to-end through the Comet stack:

  • Proto: Added MathExpr pmod = 71 to the Expr oneof in expr.proto
  • Scala serde: Added CometPmod object in arithmetic.scala using the MathExpr / MathBase pattern, rejecting TRY eval mode
  • Scala registry: Registered classOf[Pmod] -> CometPmod in QueryPlanSerde.scala
  • Rust scalar function: Implemented spark_pmod in modulo_expr.rs — computes ((a % b) + b) % b to ensure positive results, with proper null/error handling for division by zero
  • Rust expression builder: Added create_pmod_expr with null-if-zero wrapping for non-ANSI mode and Decimal256 overflow handling (mirrors create_modulo_expr)
  • Rust expression registry: Added PmodBuilder and ExpressionType::Pmod variant
  • Tests: Comet SQL Tests for int, long, float, double, decimal, short, byte types with NULL handling and division-by-zero; ANSI mode error test
  • Docs: Marked pmod as supported in spark_expressions_support.md

The implement-comet-expression skill was used to scaffold this implementation.

How are these changes tested?

  • pmod.sql: Tests all supported numeric types (int, long, float, double, decimal, short, byte) with positive/negative dividends and divisors, NULL inputs, and division by zero in non-ANSI mode (returns NULL). Uses ConfigMatrix: parquet.enable.dictionary=false,true.
  • pmod_ansi.sql: Tests that division by zero raises an error in ANSI mode (spark.sql.ansi.enabled=true).
./mvnw test -Pspark-3.5 -Pscala-2.12 -Dsuites="org.apache.comet.CometSqlFileTestSuite pmod" -Dtest=none

andygrove added 3 commits May 9, 2026 15:14
Implement Spark's pmod (positive modulo) expression natively in Comet.
This avoids falling back to Spark for pmod operations on all numeric types.
TRY mode behaves identically to LEGACY (returns NULL on division by
zero), so there was no reason to reject it. Also adds boundary value,
NaN/Infinity, negative zero, and high-precision decimal test cases
from the audit.
…e enabled

When spark.comet.exec.strictFloatingPoint=true and the input type is
float/double, pmod now falls back to Spark. This avoids the negative
zero incompatibility where Comet returns 0.0 but Spark returns -0.0.
@mbutrovich mbutrovich self-requested a review May 19, 2026 14:55
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of feedback, thanks @andygrove!

.map_err(|e| arrow::error::ArrowError::ComputeError(e.to_string()))?
.to_array_of_size(left.len())
.map_err(|e| arrow::error::ArrowError::ComputeError(e.to_string()))?;
let result = rem(left, right)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the negative-zero incompatibility be eliminated by branching with zip instead of doing an unconditional add/rem?

The current shape

let result = rem(left, right)?;
let neg = arrow::compute::kernels::cmp::lt(&result, &zero)?;
let plus = arrow::compute::kernels::zip::zip(&neg, right, &zero)?;
let result = arrow::compute::kernels::numeric::add(&plus, &result)?;
rem(&result, right)

normalizes -0.0 through add(+0.0, -0.0) = +0.0, which is the source of the documented strict-floating-point caveat. Spark's reference implementation is if (r < 0) (r + n) % n else r, i.e. the non-negative branch returns r untouched.

A direct vectorized translation:

let result = rem(left, right)?;
let neg = arrow::compute::kernels::cmp::lt(&result, &zero)?;
let adjusted = rem(&arrow::compute::kernels::numeric::add(&result, right)?, right)?;
arrow::compute::kernels::zip::zip(&neg, &adjusted, &result)

That preserves the original r for r >= 0, including -0.0, and matches Spark exactly. Quick sanity check on the interesting inputs:

input r neg result
pmod(-0.0, 3.0) -0.0 false -0.0
pmod(NaN, 3.0) NaN false (NaN cmp) NaN
pmod(-10, 3) -1 true 2
pmod(-10, -3) -1 true -1

If this works, CometPmod can drop getIncompatibleReasons / getSupportLevel and just be Compatible().

let left_data_type = lhs.data_type();
let right_data_type = rhs.data_type();

if left_data_type.is_nested() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested-type branch dispatches to apply_cmp_for_nested(Operator::Modulo, ...), but two things look off:

  1. Pmod is numeric-only in Spark, and CometPmod.supportedDataType already rejects non-numeric types. This branch should be unreachable.
  2. Even if it were reached, it would compute Modulo, not Pmod.

Suggest dropping the whole if left_data_type.is_nested() block, or replacing the body with internal_err!("spark_pmod does not support nested types") to make the invariant explicit.

}
}

override def convert(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says "rejecting TRY eval mode", but CometPmod.convert doesn't have the expr.evalMode == EvalMode.TRY guard that CometRemainder has at line 270.

There's no try_pmod in Spark's function registry, so the gap is unlikely to be reachable today. But matching CometRemainder's shape keeps the serde defensive in case Pmod ever gets constructed with a TRY context, and aligns the code with the PR description.

- Use a zip-based `if (r < 0) (r + n) % n else r` so the non-negative
  branch (including -0.0) is returned untouched. Mask the right-hand
  side to zero on non-negative rows so the `add` does not overflow on
  values whose adjusted branch is discarded by the zip (e.g.
  pmod(2L, Long.MaxValue)).
- Arrow's `cmp::lt` uses total ordering and reports `lt(-0.0, 0.0)` as
  true, so compare against -0.0 for floats to recover IEEE semantics.
- Drop the unreachable nested-type branch; Pmod is numeric-only.
- Add the EvalMode.TRY guard that CometRemainder has.
- Drop the strict-FP incompatibility now that -0.0 round-trips, and
  unignore the negative-zero SQL test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants