Missingness splits with -Infinity thresholds require special handling#124
Conversation
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (2)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe module version is bumped to ChangesRelease metadata and changelog
Model generation test coverage
Sequence Diagram(s)None. Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces support for handling non-finite float values (such as -Infinity, Infinity, and NaN) in XGBoost JSON models, specifically addressing the -Infinity threshold on numeric splits (representing missingness splits). It adds a preprocessing step (sanitizeNonFiniteNumbers) to rewrite these JSON-incompatible literals into quoted strings so they can be parsed by Go's standard JSON decoder via a custom xgbFloat type. Additionally, the code generator is updated to collapse decision nodes with a -Infinity threshold into clean missingness-only branches, and corresponding unit tests and test data are added to verify this behavior. There are no review comments, so no further feedback is provided.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
c17f095 to
cf449e9
Compare
| @@ -1,4 +1,4 @@ | |||
| # @generated - this file is auto-generated by `mise lock` https://mise.en.dev/dev-tools/mise-lock.html | |||
| # @generated - this file is auto-generated by `mise lock` https://mise.jdx.dev/dev-tools/mise-lock.html | |||
There was a problem hiding this comment.
Is there an update available for your mise? I think we're swapping back and forth with this header. If you update it I think it should revert back to the en.dev domain.
There was a problem hiding this comment.
Apologies. My mise was older. I've updated and this diff went away.
| // lowest histogram bin is -inf (NumericBinLowerBound in XGBoost's | ||
| // src/common/hist_util.h). | ||
| func checkSplitCondition(id int64, sc float64, categorical, isLeaf bool) error { | ||
| if categorical { |
There was a problem hiding this comment.
Codex complains about this. Not sure if it matters:
- gen/parse.go:383: checkSplitCondition returns immediately for categorical nodes before checking isLeaf. A malformed tree can mark a leaf as categorical and set its
leaf value to NaN/Infinity; parsing accepts it, then terminal_node.tmpl emits sum += NaN or sum += +Inf instead of returning the new clear error.
There was a problem hiding this comment.
This is worth handling. I'll validate the leaf case first so that doesn't slip through.
It's difficult to reproduce a model empirically with the -Infinity split. This appears to depend on very specific floating point conditions holding at deeper nodes. So I've included a manually constructed example in gen/testdata/neg-inf-split/model.json instead.
Summary by CodeRabbit
-Infinityinsplit_conditions.+Infinity,NaN) and non-finite leaf values now produce clearer, intentional error messages instead of failing unexpectedly.