Add glum and scikit-survival backends and same reporting output as SEQTaRget by remlapmot · Pull Request #50 · CausalInference/pySEQTarget

remlapmot · 2026-05-29T10:50:54Z

(Sorry this got biggish)

Currently pySEQTarget runs substantially slower than SEQTaRget on our short course practical. This PR adds glum and scikit-survival backends to speed up the GLM and Cox fits; I've left the default fitting packages as statsmodels and lifelines.

I've also added output tables that are as similar to SEQTaRget's as I could manage.

When a bootstrap replicate fails to fit (e.g. a singular matrix raising LinAlgError, which the glum solver hits more readily than statsmodels' IRLS), fit() skips it, leaving outcome_model shorter than _boot_samples. hazard() previously assumed a 1:1 ordering between the two: it looped over range(len(_boot_samples)) and indexed outcome_model[boot_idx + 1], which raised IndexError once a skip had occurred and, before that, silently paired every replicate after the skipped one with the wrong resample's model, producing incorrect hazard CIs.

bootstrap_loop now records _boot_sample_idx, the original _boot_samples index for each successfully fitted replicate (appended in lockstep with the models, so it is correct for both the serial and parallel paths). hazard() iterates that map, pairing outcome_model[model_pos + 1] with _boot_samples[sample_idx], which fixes both the crash and the misalignment. survival() was unaffected since it iterates outcome_model directly and never indexes _boot_samples.

Adds a regression test that injects a LinAlgError on one replicate and asserts hazard() returns a finite HR with CI instead of crashing.
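The pairing problem described above can be illustrated with a small standalone sketch. This is not the pySEQTarget code: `SingularFit`, `fit_one`, `pair_by_position`, `pair_by_index_map`, and the string "models" are hypothetical stand-ins, and `SingularFit` subclasses `ValueError` only to mimic `numpy.linalg.LinAlgError` without importing NumPy. The names `boot_samples`, `boot_sample_idx`, and `outcome_model` mirror the attributes named in the commit message, with slot 0 of `outcome_model` assumed to hold the full-data fit.

```python
class SingularFit(ValueError):
    """Hypothetical stand-in for numpy.linalg.LinAlgError."""


def fit_one(sample):
    # Toy "model" that just remembers which resample it was fitted on.
    if sample == "resample-1":
        raise SingularFit("singular matrix")
    return f"model({sample})"


def bootstrap_loop(boot_samples):
    # Append the model and its original sample index in lockstep,
    # so a skipped replicate never shifts the mapping.
    models, boot_sample_idx = [], []
    for idx, sample in enumerate(boot_samples):
        try:
            model = fit_one(sample)
        except SingularFit:
            continue  # skip the failed replicate
        models.append(model)
        boot_sample_idx.append(idx)
    return models, boot_sample_idx


def pair_by_position(outcome_model, boot_samples):
    # Old assumption: model i + 1 belongs to resample i.
    return [(outcome_model[i + 1], boot_samples[i])
            for i in range(len(boot_samples))]


def pair_by_index_map(outcome_model, boot_samples, boot_sample_idx):
    # New behaviour: look the resample up through the recorded index.
    return [(outcome_model[pos + 1], boot_samples[sample_idx])
            for pos, sample_idx in enumerate(boot_sample_idx)]


boot_samples = ["resample-0", "resample-1", "resample-2", "resample-3"]
models, boot_sample_idx = bootstrap_loop(boot_samples)
outcome_model = ["model(full data)"] + models  # slot 0 is the main fit

pairs = pair_by_index_map(outcome_model, boot_samples, boot_sample_idx)
for model, sample in pairs:
    print(model, "<->", sample)
```

In this toy setup, positional pairing would match `model(resample-2)` with `resample-1` before running off the end of `outcome_model`, which is the misalignment followed by the crash that the commit describes.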

Add a cox_package option (default "lifelines") to fit the univariate Cox model in the hazard step with either lifelines or scikit-survival. The Cox fit is extracted into a _cox_log_hr dispatcher: the lifelines path is unchanged, while the scikit-survival path builds the structured survival array and fits CoxPHSurvivalAnalysis with ties="efron" to match lifelines - this matters because integer follow-up produces many tied event times and the default Breslow handling would diverge. With matching tie handling and a fixed seed the two backends agree to ~1e-9 on the point hazard ratio and the bootstrap CIs match. Re-adds scikit-survival as a dependency and adds tests covering validation, ITT and bootstrap equivalence with lifelines, and the subgroup path.
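To make the tie-handling point concrete, here is a minimal pure-Python sketch of a univariate Cox log partial likelihood under the Breslow and Efron approximations. It does not call lifelines or scikit-survival, and it is not the `_cox_log_hr` dispatcher from this PR: `cox_loglik`, `fit_beta`, the ternary-search optimiser, and both toy datasets are assumptions made for illustration. Rows are `(time, event, x)` with a binary treatment indicator `x`.

```python
import math


def cox_loglik(beta, rows, ties):
    """Log partial likelihood for a single covariate; rows are (time, event, x)."""
    total = 0.0
    event_times = sorted({t for t, e, _ in rows if e == 1})
    for t in event_times:
        # Risk set: everyone still under follow-up at time t.
        s_risk = sum(math.exp(beta * x) for time, _, x in rows if time >= t)
        dead_x = [x for time, e, x in rows if time == t and e == 1]
        d = len(dead_x)
        s_dead = sum(math.exp(beta * x) for x in dead_x)
        total += beta * sum(dead_x)
        if ties == "breslow":
            # Every tied event sees the full risk set.
            total -= d * math.log(s_risk)
        elif ties == "efron":
            # Tied events are progressively down-weighted in the risk set.
            for k in range(d):
                total -= math.log(s_risk - (k / d) * s_dead)
        else:
            raise ValueError(f"unknown ties method: {ties!r}")
    return total


def fit_beta(rows, ties, lo=-5.0, hi=5.0, iters=200):
    """Maximise the concave log partial likelihood by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cox_loglik(m1, rows, ties) < cox_loglik(m2, rows, ties):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Integer follow-up with many tied event times.
tied = [(1, 1, 1), (1, 1, 0), (1, 1, 1), (2, 1, 1), (2, 1, 0), (2, 0, 0),
        (3, 1, 0), (3, 1, 1), (3, 1, 0), (4, 0, 0), (4, 1, 1), (4, 0, 0)]
# Same kind of data, but every event time is distinct.
untied = [(1, 1, 1), (2, 1, 0), (3, 1, 1), (4, 0, 0),
          (5, 1, 0), (6, 1, 1), (7, 0, 0)]

breslow_tied = fit_beta(tied, "breslow")
efron_tied = fit_beta(tied, "efron")
breslow_untied = fit_beta(untied, "breslow")
efron_untied = fit_beta(untied, "efron")

print("tied:   breslow", breslow_tied, "efron", efron_tied)
print("untied: breslow", breslow_untied, "efron", efron_untied)
```

With distinct event times the two approximations reduce to the same likelihood, so the log hazard ratios coincide; with tied integer times they give different estimates, which is why the tie method has to be aligned before comparing backends.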

ryan-odea

I've minimized some of the simple properties, but otherwise looks great!

remlapmot and others added 10 commits May 28, 2026 13:40

Add glum backend for faster GLM fitting

c68dcf9

Add tests for glum backend

f4919d1

Auto-format code

0a6d4b1

Add .summary(), .summary2(), .bse, and .cov_params() to `_Glu…

733281f

…mFit`:

Add tests for glum summary

acefaab

Add per-arm follow-up counts to diagnostics

e58bd03

Bump version

79a7e37

Add per-arm competing-event diagnostic tables

8b21411

remlapmot requested a review from ryan-odea May 29, 2026 10:50

Minimized some of the easy returned properties

9d5fef3

ryan-odea approved these changes May 31, 2026

remlapmot wants to merge 11 commits into main from devel-2026-05-19-glum
