feat(efficient_did): survey-weighted Silverman bandwidth in conditional Omega* #594
…al Omega*

The auto Silverman bandwidth for the kernel-smoothed conditional Omega*(X) in the EfficientDiD covariate DR path previously used the unweighted per-dimension dispersion over the positive-weight support. Under survey weights the unweighted sample dispersion does not reflect the population covariate distribution the kernel targets.

Make the per-dimension dispersion survey-weighted: median_std is now the median across covariate dimensions of the weighted std sqrt(sum_i w_i (x_i - xbar_w)^2 / sum_i w_i), with weighted mean xbar_w = sum_i w_i x_i / sum_i w_i. The rate term n stays the positive-weight support count (dispersion weighted, sample-size term not -- the TODO-scoped refinement; not Kish n_eff).

Behavior: shifts the DR point estimate and SE only in overidentified (H>1) covariate cells under non-uniform survey weights. Under uniform weights it reduces to the previous bandwidth up to floating point; the existing invariances to zero-weight (subpopulation / padded) rows and to weight rescaling are preserved. Non-survey and just-identified (H=1) paths are unchanged.

- efficient_did_covariates.py: weighted-dispersion bandwidth + docstring.
- REGISTRY.md: update the EfficientDiD covariates Note (was "dispersion stays unweighted ... deferred").
- test_methodology_efficient_did.py: TestSurveyWeightedSilvermanBandwidth (matches weighted reference formula + differs from unweighted; reduces to unweighted under uniform weights; invariant to weight scale and zero-weight padding). Existing H>1 zero-weight-invariance end-to-end test still passes.
- TODO.md: remove the actionable row; CHANGELOG.md: ### Changed entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
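The bandwidth rule described in this commit message can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in efficient_did_covariates.py: the function name `weighted_silverman_bandwidth`, its signature, and the absence of guards (empty support, zero dispersion) are assumptions made for the example. It combines the weighted-std formula from the commit message with the Silverman rule quoted in the PR description.

```python
import numpy as np


def weighted_silverman_bandwidth(X, w):
    """Hypothetical helper: Silverman bandwidth with survey-weighted dispersion."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single covariate

    # Restrict to the positive-weight support; zero-weight rows drop out.
    support = w > 0
    Xs, ws = X[support], w[support]
    n, d = Xs.shape  # n is the support count, deliberately not Kish n_eff

    # Weighted mean and weighted std per covariate dimension.
    w_sum = ws.sum()
    xbar_w = (ws[:, None] * Xs).sum(axis=0) / w_sum
    var_w = (ws[:, None] * (Xs - xbar_w) ** 2).sum(axis=0) / w_sum
    median_std = np.median(np.sqrt(var_w))

    # Silverman rule of thumb: h = (4/(d+2))^{1/(d+4)} * sigma * n^{-1/(d+4)}
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * median_std * n ** (-1.0 / (d + 4))


# Illustrative data only: 200 rows, 3 covariates, non-uniform weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w = rng.uniform(0.5, 3.0, size=200)
h = weighted_silverman_bandwidth(X, w)
```

Because the weights enter only through ratios of weighted sums, multiplying `w` by a positive constant leaves `h` unchanged, and rows with zero weight affect neither the dispersion nor `n`.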
Overall Assessment: ✅ Looks Good

Executive Summary

Methodology
Finding: Documented Rate-Term Convention

Code Quality
No findings. The helper remains narrowly scoped, and the weighted mean/variance calculation preserves the documented scale and zero-weight invariances.

Performance
No findings. The new weighted dispersion is

Maintainability
No findings. The behavior is localized to

Tech Debt
No findings. The prior TODO entry for this refinement was removed, consistent with the implementation landing in this PR.

Security
No findings. No secrets, credential handling, network behavior, or unsafe file operations are introduced.

Documentation/Tests
Finding: Minor Docstring Wording Drift
Verification note: I attempted |
Addresses the CI review's P3: the docstring referenced "the TODO-scoped
refinement" but the TODO row was removed in this PR, leaving a dangling
reference. Match the REGISTRY wording ("a deliberately scoped refinement, not
Kish n_eff"). Docstring-only; no behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber)

Head SHA:

Overall Assessment: ✅ Looks Good

Executive Summary

Methodology
Finding: Documented Rate-Term Convention

Code Quality
No findings. The change is localized to

Performance
No findings. The weighted dispersion calculation is

Maintainability
No findings. The docstring and REGISTRY now agree, including the “deliberately scoped refinement, not Kish

Tech Debt
No findings. The prior TODO row for this bandwidth refinement was removed consistently with the implementation landing.

Security
No findings. No credential, secret, network, or unsafe file-operation changes were introduced in the changed files.

Documentation/Tests
No PR findings. Added tests cover the weighted reference formula, uniform-weight reduction, weight-scale invariance, and zero-weight padding invariance at

Verification note: |
Summary
- The auto Silverman bandwidth for the kernel-smoothed conditional `Omega*(X)` previously used the unweighted per-dimension dispersion over the positive-weight support. Under survey weights that does not reflect the population covariate distribution the kernel targets.
- `median_std` is now the median across covariate dimensions of the weighted std `sqrt(sum_i w_i (x_i - xbar_w)^2 / sum_i w_i)` (weighted mean `xbar_w = sum_i w_i x_i / sum_i w_i`). The rate term `n` stays the positive-weight support count (dispersion weighted, sample-size term not; the TODO-scoped refinement, not Kish `n_eff`, an explicit design choice).
- `docs/methodology/REGISTRY.md`: updated the EfficientDiD covariates Note (was "dispersion stays unweighted ... deferred").
- `TODO.md`: removed the actionable row.
- `CHANGELOG.md`: `### Changed` entry.

Methodology references
- Kernel-smoothed conditional `Omega*(X)` (Eq 3.12) with Silverman's rule-of-thumb bandwidth.
- `h = (4/(d+2))^{1/(d+4)} sigma n^{-1/(d+4)}`; survey-weighted moments are design-consistent estimates of the population dispersion. See REGISTRY § "EfficientDiD" covariates Note.
- `n` uses the positive-weight support count rather than Kish effective `n_eff`: a deliberately scoped refinement (weight the dispersion, not the rate), documented in the REGISTRY Note and the function docstring. Preserves exact reduction to the prior bandwidth under uniform weights.

Validation
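The invariances that the validation covers follow directly from the weighted-moment definitions given in the Summary. The short derivation below spells this out for one covariate dimension. The notation is introduced here for illustration and is not taken from the PR: $S$ is the positive-weight support, $n = |S|$, and $c > 0$ is an arbitrary constant.

```latex
% S = \{ i : w_i > 0 \}, \quad n = |S|
\begin{align*}
\bar{x}_w &= \frac{\sum_{i \in S} w_i x_i}{\sum_{i \in S} w_i},
\qquad
\sigma_w^2 = \frac{\sum_{i \in S} w_i (x_i - \bar{x}_w)^2}{\sum_{i \in S} w_i}
\\
\intertext{Rescaling $w_i \mapsto c\,w_i$ cancels in both ratios, and $n$ is unchanged:}
\frac{\sum_{i \in S} c\,w_i x_i}{\sum_{i \in S} c\,w_i} &= \bar{x}_w,
\qquad
\frac{\sum_{i \in S} c\,w_i (x_i - \bar{x}_w)^2}{\sum_{i \in S} c\,w_i} = \sigma_w^2
\\
\intertext{Uniform weights $w_i = c$ on $S$ recover the unweighted moments:}
\bar{x}_w &= \frac{1}{n} \sum_{i \in S} x_i,
\qquad
\sigma_w^2 = \frac{1}{n} \sum_{i \in S} (x_i - \bar{x}_w)^2
\end{align*}
```

Rows with $w_i = 0$ lie outside $S$, so they contribute nothing to either sum and do not change $n$. Since $h = (4/(d+2))^{1/(d+4)}\, \sigma\, n^{-1/(d+4)}$ depends on the data only through the median of the per-dimension $\sigma_w$ and through $n$, the bandwidth inherits all three properties.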
- `tests/test_methodology_efficient_did.py::TestSurveyWeightedSilvermanBandwidth` (4 tests: matches the weighted reference formula and differs from unweighted under non-uniform weights; reduces to unweighted under uniform weights; invariant to weight scale; invariant to zero-weight padding). The existing overidentified H>1 end-to-end zero-weight-invariance test still passes.
- `tests/test_methodology_efficient_did.py` + `tests/test_efficient_did.py` = 207 passed, 0 failures. black/ruff clean; mypy 0-new.

Security / privacy