feat(had): cluster-robust SE on the continuous paths (Phase 2a) #596
PR Review Report

Overall Assessment
⛔ Blocker: one unmitigated P0 finding.

Executive Summary

Methodology
- Finding 1. Severity: P0

Code Quality
- Severity: None

Performance
- Severity: None

Maintainability
- Severity: P2

Tech Debt
- Finding 1. Severity: P3 informational

Security
- Severity: None

Documentation / Tests
- Finding 1. Severity: P1

Path to Approval
c44cb7e to 802ff70
🔁 AI review rerun (requested by @igerber)
Head SHA:

PR Review Report

Overall Assessment
⛔ Blocker: one unmitigated P0 remains.

Executive Summary

Methodology
- Finding 1. Severity: P0
  Concrete fix: Count effective clusters after the same active-window selection used by

Code Quality
- Severity: None

Performance
- Severity: None

Maintainability
- Severity: None

Tech Debt
- Finding 1. Severity: P3 informational

Security
- Severity: None

Documentation / Tests
- Finding 1. Severity: P1

Path to Approval
Thread cluster= into bias_corrected_local_linear on the continuous designs (continuous_at_zero / continuous_near_d_lower). Previously cluster= was ignored on the continuous path with a UserWarning; the Phase-1c wrapper already supports cluster, so the estimator just needed to pass it through. The CCT-2014 robust variance becomes cluster-robust and the beta-scale SE is se_robust / |den|.

- Resolve the per-unit cluster array for the continuous designs (was mass-point only) and thread it into _fit_continuous -> bias_corrected_local_linear.
- Composes with the weights= shortcut (weighted cluster-robust).
- cluster= + survey_design= raises NotImplementedError. The combination is rejected BEFORE cluster extraction so the error is predictable even with a malformed cluster column. The Binder-TSL survey variance would override the cluster-robust SE, so route clustering through survey_design=SurveyDesign(psu=<col>).
- Cluster must be unit-constant: a nonexistent column, NaN, or within-unit-varying cluster now raises (mirrors the mass-point path) instead of being silently ignored.
- Single-cluster guard at the variance-computation site: _nprobust_port.lprobust NaNs se_rb/se_cl when fewer than two clusters fall in the ACTIVE KERNEL WINDOW (eC = cluster[ind]). This is a stricter condition than the global cluster count, since clusters can be separated from the boundary by the bandwidth. The guard NaN-couples the downstream t-stat / p-value / CI (att stays finite), matching the mass-point CR1 single-cluster contract, and also covers the direct bias_corrected_local_linear API. A window with >=2 clusters is bit-identical, so the nprobust clustered DGP-4 golden parity is preserved.
- Result metadata reports vcov_type="cr1" + cluster_name; inference_method stays "analytical_nonparametric".
- Event-study (Phase 2b) cluster threading remains a follow-up (the per-horizon cband sup-t bootstrap would mix variance families under clustering); that path still emits the "cluster ignored" warning.

Validated: the estimator clustered SE equals the direct bias_corrected_local_linear(cluster=...).se_robust / |den| to machine precision, unweighted and weighted; single/degenerate-cluster fits NaN the inference triple.

REGISTRY note + CHANGELOG + TODO (Phase 2b remains) updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
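The single-cluster guard described in the commit message can be sketched as follows. This is a minimal illustration, not the code in _nprobust_port.lprobust: the helper names (triangular_weights, guard_clustered_se), the triangular kernel, and the "positive kernel weight" definition of the active window are all assumptions made for the example. It shows why the active-window cluster count can fall below two even when the global cluster count does not.

```python
import numpy as np


def triangular_weights(x, boundary, h):
    """Triangular kernel weights; zero outside |x - boundary| < h (assumed kernel)."""
    u = (np.asarray(x, dtype=float) - boundary) / h
    return np.clip(1.0 - np.abs(u), 0.0, None) / h


def guard_clustered_se(se, x, cluster, boundary, h):
    """Hypothetical guard: return NaN when fewer than two clusters are active.

    "Active" means the observation receives positive kernel weight, which
    mirrors the eC = cluster[ind] selection named in the commit message.
    """
    active = triangular_weights(x, boundary, h) > 0
    n_active_clusters = np.unique(np.asarray(cluster)[active]).size
    return se if n_active_clusters >= 2 else float("nan")


# Three clusters globally, but only cluster "a" sits close to the boundary.
x = np.array([0.05, 0.10, 0.20, 0.90, 1.50])
cluster = np.array(["a", "a", "a", "b", "c"])

se_narrow = guard_clustered_se(0.3, x, cluster, boundary=0.0, h=0.5)  # only "a" active
se_wide = guard_clustered_se(0.3, x, cluster, boundary=0.0, h=1.0)    # "a" and "b" active
```

With the narrow bandwidth the SE is replaced by NaN even though three clusters exist in the data; with the wider bandwidth the SE passes through unchanged, which is the sense in which a window with two or more clusters leaves results untouched.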
802ff70 to f3419fb
🔁 AI review rerun (requested by @igerber)
Head SHA:

PR Review Report

Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.

Executive Summary

Methodology
- Finding 1. Severity: None
- Finding 2. Severity: None

Code Quality
- Finding 1. Severity: None

Performance
- Finding 1. Severity: None

Maintainability
- Finding 1. Severity: None

Tech Debt
- Finding 1. Severity: P3 informational

Security
- Finding 1. Severity: None

Documentation / Tests
- Finding 1. Severity: None
- Finding 2. Severity: P3 informational

Summary
- Thread cluster= into bias_corrected_local_linear on the HeterogeneousAdoptionDiD continuous designs (continuous_at_zero / continuous_near_d_lower). Previously cluster= was ignored on the continuous path with a UserWarning; the Phase-1c wrapper already supports cluster, so the estimator just needed to pass it through. The CCT-2014 robust variance becomes cluster-robust and the β̂-scale SE is se_robust / |den|.
- Composes with the weights= shortcut (weighted cluster-robust).
- cluster= + survey_design= raises NotImplementedError: the Binder (1983) TSL survey variance is composed from the per-unit influence function and would silently override the cluster-robust SE; route clustering through survey_design=SurveyDesign(psu=<cluster_col>) instead.
- Result metadata reports vcov_type="cr1" + cluster_name with inference_method="analytical_nonparametric" (distinguishing the clustered CCT variance from the mass-point 2SLS CR1 sandwich).
- Event-study (Phase 2b) still ignores cluster= with a warning (its per-horizon cband sup-t bootstrap normalizes HC-scale perturbations and would mix variance families under clustering); rescoped the TODO.md row to Phase 2b.

Methodology references
- HeterogeneousAdoptionDiD continuous-dose designs (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026), Theorem 1 / Eq 3 (Design 1') and Theorem 3 / Eq 11 (Design 1); Phase-1c bias-corrected local linear (Calonico-Cattaneo-Titiunik 2014) cluster-robust variance.
- bias_corrected_local_linear cluster path (golden-tested vs nprobust DGP 4); the β-scale rescale (se_robust / |den|) is a deterministic linear transform, so the estimator SE equals the direct call to machine precision.

Validation
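The rescale argument, that dividing by |den| is a deterministic linear transform of the SE, can be illustrated with a toy estimator. The sketch below is not the CCT-2014 variance: it uses a cluster-robust SE for a simple sample mean with a G/(G-1) small-sample factor, both of which are assumptions chosen to keep the example short, and the function name cluster_robust_se_of_mean is hypothetical. The point is only that rescaling the SE after the fact matches recomputing the SE on the rescaled outcome.

```python
import numpy as np


def cluster_robust_se_of_mean(y, cluster):
    """Toy cluster-robust SE of a sample mean (assumed G/(G-1) correction)."""
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    # Sum residuals within each cluster, then form the sandwich "meat".
    _, codes = np.unique(cluster, return_inverse=True)
    cluster_sums = np.bincount(codes, weights=resid)
    g = cluster_sums.size
    var = (g / (g - 1)) * np.sum(cluster_sums**2) / y.size**2
    return float(np.sqrt(var))


rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(8), 25)
# Outcome with a shared cluster-level shock so clustering matters.
y = rng.normal(size=cluster.size) + rng.normal(size=8)[cluster]

den = -2.5  # a negative denominator shows why the absolute value is needed
se_numerator = cluster_robust_se_of_mean(y, cluster)
se_beta_rescaled = se_numerator / abs(den)
se_beta_direct = cluster_robust_se_of_mean(y / den, cluster)
```

Here se_beta_rescaled and se_beta_direct agree up to floating-point rounding, which is the property the exact-match tolerance in the tests relies on.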
- tests/test_had.py::TestClusterHandling: new test_cluster_threaded_on_continuous_path, test_cluster_weighted_on_continuous_path, test_cluster_threaded_on_continuous_near_d_lower, test_cluster_survey_design_raises_on_continuous, test_auto_design_continuous_threads_cluster; the 4 pre-existing "cluster ignored on continuous → lax validation" tests are flipped to expect the now-correct ValueError on invalid cluster. Each SE test asserts an exact match (atol=1e-12) to bias_corrected_local_linear(cluster=...).se_robust / |den| (unweighted and weighted).
- tests/test_had.py 309 passed / 2 skipped; tests/test_methodology_had.py + tests/test_had_pretests.py 281 passed (non-cluster continuous paths byte-unchanged). black/ruff clean; mypy 0-new.

Security / privacy