Skip to content

feat: wire factorial and update wire skill#4349

Merged
andygrove merged 6 commits into
apache:mainfrom
comphead:factorial
May 20, 2026
Merged

feat: wire factorial and update wire skill#4349
andygrove merged 6 commits into
apache:mainfrom
comphead:factorial

Conversation

@comphead
Copy link
Copy Markdown
Contributor

@comphead comphead commented May 15, 2026

Which issue does this PR close?

Part of #4150.

  • Wire factorial math function from datafusion-spark crate
  • Adding factorial spark SQL tests
  • Minor issue addressed in the Claude skill, like find alternative spark code base, and resolve conflicts between datafusion built in functions and datafusion-spark ones

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@comphead
Copy link
Copy Markdown
Contributor Author

@andygrove looks like OOM still exists for 4.1.1 CI 🤔

### 2. Find the upstream function

Read the pinned `datafusion-spark` version from `native/Cargo.toml` rather than picking the first cached copy — `head -1` can land on an older release that lacks the function:
**Prefer a local DataFusion clone if one is available** — check `CLAUDE.md`, project memory, and the user's `~/dev` tree for an existing checkout (e.g. `~/dev/.../rust/datafusion`). The local clone is typically the latest source and avoids cargo-registry version mismatches. If no local clone is found, fall back to the cached registry crate.
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly specific to your setup unless we're saying in the contributor guide that people should be working in a ~/dev folder. I suspect users should be telling Claude on their own where their working directories are.

Also, is this general advice even true? Comet lags behind, and I could easily see an agent trying to wire something up that exists in upstream but isn't on Comet's DF version yet.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my machine it tries to identify current DF version from Cargo.toml and browse supported functions from there. However we need to specify this requirement explicitly. I can do it in follow up PR when wiring more functions

@comphead comphead requested a review from mbutrovich May 19, 2026 19:40
(cast(-1 as tinyint), cast(-1 as smallint), -1, cast(-1 as bigint), cast(-1.0 as float), -1.0, cast(-1 as decimal(38, 0)), cast(-1.50 as decimal(10, 2))),
(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL),
(cast(127 as tinyint), cast(32767 as smallint), 2147483647, 9223372036854775807, cast('Infinity' as float), cast('Infinity' as double), cast(99999999999999999999999999999999999999 as decimal(38, 0)), cast(99999999.99 as decimal(10, 2))),
(cast(-128 as tinyint),cast(-32768 as smallint),-2147483648,-9223372036854775808, cast('-Infinity' as float), cast('-Infinity' as double), cast(-99999999999999999999999999999999999999 as decimal(38, 0)),cast(-99999999.99 as decimal(10, 2))),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark wraps Factorial(child) in an implicit cast to IntegerType. In Spark 4.x ANSI mode, casting Long.MaxValue or Infinity to int would normally throw. CI is green, so either the implicit-cast path here is not tripping ANSI, or Comet is matching whatever error Spark throws.

Can you confirm the bigint/float/double rows are actually exercising the native factorial path in the 4.x ANSI runs and not silently falling back? If it is not obvious, an explicit expect_error query covering one overflow case on 4.x would document the intended behavior.

Copy link
Copy Markdown
Contributor Author

@comphead comphead May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats actually 2 good points here:

  • we need to include ansi into the test configuration matrix, for all tests in Comet to see how it works with ANSI
  • Spark 4 should be failing with ansi true by default, checking why it didn't

Copy link
Copy Markdown
Contributor Author

@comphead comphead May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ansi flag fix for sql tests - #4368

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed the skill, because the factorial accepts only INT inputs, and generating tests for FPs were not expected

@comphead comphead requested a review from mbutrovich May 20, 2026 00:28
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @comphead!

-- specific language governing permissions and limitations
-- under the License.

-- ConfigMatrix: parquet.enable.dictionary=false,true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all values in the table are distinct so there will not be any dictionary-encoding. This matrix is running the test twice for no good reason.

Suggested change
-- ConfigMatrix: parquet.enable.dictionary=false,true

Copy link
Copy Markdown
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT. Thanks @comphead

@andygrove andygrove merged commit b993c0f into apache:main May 20, 2026
154 of 155 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants