feat: wire factorial and update wire skill#4349
Conversation
|
@andygrove looks like OOM still exists for 4.1.1 CI 🤔 |
| ### 2. Find the upstream function | ||
|
|
||
| Read the pinned `datafusion-spark` version from `native/Cargo.toml` rather than picking the first cached copy — `head -1` can land on an older release that lacks the function: | ||
| **Prefer a local DataFusion clone if one is available** — check `CLAUDE.md`, project memory, and the user's `~/dev` tree for an existing checkout (e.g. `~/dev/.../rust/datafusion`). The local clone is typically the latest source and avoids cargo-registry version mismatches. If no local clone is found, fall back to the cached registry crate. |
There was a problem hiding this comment.
This seems overly specific to your setup unless we're saying in the contributor guide that people should be working in a ~/dev folder. I suspect users should be telling Claude on their own where their working directories are.
Also, is this general advice even true? Comet lags behind, and I could easily see an agent trying to wire something up that exists in upstream but isn't on Comet's DF version yet.
There was a problem hiding this comment.
On my machine it tries to identify current DF version from Cargo.toml and browse supported functions from there. However we need to specify this requirement explicitly. I can do it in follow up PR when wiring more functions
| (cast(-1 as tinyint), cast(-1 as smallint), -1, cast(-1 as bigint), cast(-1.0 as float), -1.0, cast(-1 as decimal(38, 0)), cast(-1.50 as decimal(10, 2))), | ||
| (NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), | ||
| (cast(127 as tinyint), cast(32767 as smallint), 2147483647, 9223372036854775807, cast('Infinity' as float), cast('Infinity' as double), cast(99999999999999999999999999999999999999 as decimal(38, 0)), cast(99999999.99 as decimal(10, 2))), | ||
| (cast(-128 as tinyint),cast(-32768 as smallint),-2147483648,-9223372036854775808, cast('-Infinity' as float), cast('-Infinity' as double), cast(-99999999999999999999999999999999999999 as decimal(38, 0)),cast(-99999999.99 as decimal(10, 2))), |
There was a problem hiding this comment.
Spark wraps Factorial(child) in an implicit cast to IntegerType. In Spark 4.x ANSI mode, casting Long.MaxValue or Infinity to int would normally throw. CI is green, so either the implicit-cast path here is not tripping ANSI, or Comet is matching whatever error Spark throws.
Can you confirm the bigint/float/double rows are actually exercising the native factorial path in the 4.x ANSI runs and not silently falling back? If it is not obvious, an explicit expect_error query covering one overflow case on 4.x would document the intended behavior.
There was a problem hiding this comment.
Thats actually 2 good points here:
- we need to include ansi into the test configuration matrix, for all tests in Comet to see how it works with ANSI
- Spark 4 should be failing with ansi true by default, checking why it didn't
There was a problem hiding this comment.
Also fixed the skill, because the factorial accepts only INT inputs, and generating tests for FPs were not expected
| -- specific language governing permissions and limitations | ||
| -- under the License. | ||
|
|
||
| -- ConfigMatrix: parquet.enable.dictionary=false,true |
There was a problem hiding this comment.
nit: all values in the table are distinct so there will not be any dictionary-encoding. This matrix is running the test twice for no good reason.
| -- ConfigMatrix: parquet.enable.dictionary=false,true |
Which issue does this PR close?
Part of #4150.
factorialmath function fromdatafusion-sparkcratefactorialspark SQL testsdatafusion-sparkonesRationale for this change
What changes are included in this PR?
How are these changes tested?