Skip to content

Missingness splits with -Infinity thresholds require special handling#124

Merged
ndawe merged 3 commits into
maxmind:mainfrom
ndawe:ndawe/inf
Jun 26, 2026
Merged

Missingness splits with -Infinity thresholds require special handling#124
ndawe merged 3 commits into
maxmind:mainfrom
ndawe:ndawe/inf

Conversation

@ndawe

@ndawe ndawe commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

It's difficult to reproduce a model empirically with the -Infinity split. This appears to depend on very specific floating point conditions holding at deeper nodes. So I've included a manually constructed example in gen/testdata/neg-inf-split/model.json instead.

Summary by CodeRabbit

  • Bug Fixes
    • Improved parsing for XGBoost tree models by adding support for -Infinity in split_conditions.
    • Non-finite split values (+Infinity, NaN) and non-finite leaf values now produce clearer, intentional error messages instead of failing unexpectedly.
  • Tests
    • Expanded automated coverage to include an additional model scenario.
    • The test suite now validates generated code and prediction execution for the new model alongside the existing cases.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (2)
  • gen/parse.go is excluded by !**/gen/**
  • gen/parse_test.go is excluded by !**/gen/**

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: ce1e5b1a-1656-482d-880b-449251395fec

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The module version is bumped to 1.2.0, the changelog records the new tree-splitting parse behavior, and TestGenerateAndRunModels adds coverage for the neg-inf-split model.

Changes

Release metadata and changelog

Layer / File(s) Summary
Update release metadata
version.go, CHANGELOG.md
The module version is bumped to 1.2.0, and the changelog adds a 1.2.0 entry describing the new -Infinity split handling and non-finite parsing errors.

Model generation test coverage

Layer / File(s) Summary
Add model case
main_test.go
TestGenerateAndRunModels includes one additional neg-inf-split table entry.

Sequence Diagram(s)

None.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • oschwald

Poem

A bunny hopped by with a tiny cheer,
“Version one-two-zero is now here!
One more model joins the test parade,
and changelog notes the split-tree trade.
Hoppy bytes, hooray—code changes made!”

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: special handling for missingness splits with -Infinity thresholds.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for handling non-finite float values (such as -Infinity, Infinity, and NaN) in XGBoost JSON models, specifically addressing the -Infinity threshold on numeric splits (representing missingness splits). It adds a preprocessing step (sanitizeNonFiniteNumbers) to rewrite these JSON-incompatible literals into quoted strings so they can be parsed by Go's standard JSON decoder via a custom xgbFloat type. Additionally, the code generator is updated to collapse decision nodes with a -Infinity threshold into clean missingness-only branches, and corresponding unit tests and test data are added to verify this behavior. There are no review comments, so no further feedback is provided.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@ndawe ndawe force-pushed the ndawe/inf branch 2 times, most recently from c17f095 to cf449e9 Compare June 26, 2026 14:37
Comment thread mise.lock Outdated
@@ -1,4 +1,4 @@
# @generated - this file is auto-generated by `mise lock` https://mise.en.dev/dev-tools/mise-lock.html
# @generated - this file is auto-generated by `mise lock` https://mise.jdx.dev/dev-tools/mise-lock.html

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an update available for your mise? I think we're swapping back and forth with this header. If you update it I think it should revert back to the en.dev domain.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies. My mise was older. I've updated and this diff went away.

Comment thread gen/parse.go
// lowest histogram bin is -inf (NumericBinLowerBound in XGBoost's
// src/common/hist_util.h).
func checkSplitCondition(id int64, sc float64, categorical, isLeaf bool) error {
if categorical {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex complains about this. Not sure if it matters:

  - gen/parse.go:383: checkSplitCondition returns immediately for categorical nodes before checking isLeaf. A malformed tree can mark a leaf as categorical and set its
    leaf value to NaN/Infinity; parsing accepts it, then terminal_node.tmpl emits sum += NaN or sum += +Inf instead of returning the new clear error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worth handling. I'll validate the leaf case first so that doesn't slip through.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2c8add0

@ndawe ndawe requested a review from horgh June 26, 2026 16:11
@ndawe ndawe merged commit aa965c1 into maxmind:main Jun 26, 2026
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants