Add glum and scikit-survival backends and same reporting output as SEQTaRget#50
Open
remlapmot wants to merge 11 commits into
Open
Add glum and scikit-survival backends and same reporting output as SEQTaRget#50remlapmot wants to merge 11 commits into
remlapmot wants to merge 11 commits into
Conversation
When a bootstrap replicate fails to fit (e.g. a singular matrix raising LinAlgError, which the glum solver hits more readily than statsmodels' IRLS), fit() skips it, leaving outcome_model shorter than _boot_samples. hazard() previously assumed a 1:1 ordering between the two: it looped over range(len(_boot_samples)) and indexed outcome_model[boot_idx + 1], which raised IndexError once a skip had occurred and, before that, silently paired every replicate after the skipped one with the wrong resample's model — producing incorrect hazard CIs. bootstrap_loop now records _boot_sample_idx, the original _boot_samples index for each successfully fitted replicate (appended in lockstep with the models, so it is correct for both the serial and parallel paths). hazard() iterates that map, pairing outcome_model[model_pos + 1] with _boot_samples[sample_idx], which fixes both the crash and the misalignment. survival() was unaffected since it iterates outcome_model directly and never indexes _boot_samples. Adds a regression test that injects a LinAlgError on one replicate and asserts hazard() returns a finite HR with CI instead of crashing.
Add a cox_package option (default "lifelines") to fit the univariate Cox model in the hazard step with either lifelines or scikit-survival. The Cox fit is extracted into a _cox_log_hr dispatcher: the lifelines path is unchanged, while the scikit-survival path builds the structured survival array and fits CoxPHSurvivalAnalysis with ties="efron" to match lifelines - this matters because integer follow-up produces many tied event times and the default Breslow handling would diverge. With matching tie handling and a fixed seed the two backends agree to ~1e-9 on the point hazard ratio and the bootstrap CIs match. Re-adds scikit-survival as a dependency and adds tests covering validation, ITT and bootstrap equivalence with lifelines, and the subgroup path.
ryan-odea
approved these changes
May 31, 2026
Collaborator
ryan-odea
left a comment
There was a problem hiding this comment.
I've minimized some of the simple properties, but otherwise looks great!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
(Sorry this got biggish)
Currently pySEQTarget runs substantially slower on our short course practical than SEQTaRget. This adds glum and scikit-survival backends to speed up the GLM and Cox fits - I've left the default fitting packages as statsmodels and lifelines.
I've also added as similar output tables to SEQTaRget as I could manage.