feat(efficient_did): survey-weighted Silverman bandwidth in conditional Omega* by igerber · Pull Request #594 · igerber/diff-diff

igerber · 2026-07-01T11:45:23Z

Summary

The EfficientDiD covariate DR path's auto Silverman bandwidth for the kernel-smoothed conditional Omega*(X) previously used the unweighted per-dimension dispersion over the positive-weight support. Under survey weights that does not reflect the population covariate distribution the kernel targets.
Make the per-dimension dispersion survey-weighted: median_std is now the median across covariate dimensions of the weighted std sqrt(sum_i w_i (x_i - xbar_w)^2 / sum_i w_i) (weighted mean xbar_w = sum_i w_i x_i / sum_i w_i). The rate term n stays the positive-weight support count (dispersion weighted, sample-size term not — the TODO-scoped refinement; not Kish n_eff, an explicit design choice).
Behavior: shifts the DR point estimate and SE only in overidentified (H>1) covariate cells under non-uniform survey weights. Under uniform weights it reduces to the previous bandwidth up to floating point; the documented invariances to zero-weight (subpopulation / padded) rows and to weight rescaling are preserved. Non-survey and just-identified (H=1) paths are byte-unchanged.
docs/methodology/REGISTRY.md: updated the EfficientDiD covariates Note (was "dispersion stays unweighted ... deferred"). TODO.md: removed the actionable row. CHANGELOG.md: ### Changed entry.

Methodology references

Method name(s): EfficientDiD (Chen, Sant'Anna & Xie 2025) covariate doubly-robust path; kernel-smoothed conditional Omega*(X) (Eq 3.12) with Silverman's rule-of-thumb bandwidth.
Paper / source link(s): Silverman (1986) multivariate normal-reference rule h = (4/(d+2))^{1/(d+4)} sigma n^{-1/(d+4)}; survey-weighted moments = design-consistent estimates of the population dispersion. REGISTRY § "EfficientDiD" covariates Note.
Any intentional deviations from the source (and why): The rate term n uses the positive-weight support count rather than Kish effective n_eff — a deliberately scoped refinement (weight the dispersion, not the rate), documented in the REGISTRY Note and the function docstring. Preserves exact reduction to the prior bandwidth under uniform weights.

Validation

Tests added/updated: tests/test_methodology_efficient_did.py::TestSurveyWeightedSilvermanBandwidth (4 tests: matches the weighted reference formula and differs from unweighted under non-uniform weights; reduces to unweighted under uniform weights; invariant to weight scale; invariant to zero-weight padding). The existing overidentified H>1 end-to-end zero-weight-invariance test still passes.
Full EfficientDiD suites: tests/test_methodology_efficient_did.py + tests/test_efficient_did.py = 207 passed, 0 failures. black/ruff clean; mypy 0-new.
Backtest / simulation / notebook evidence (if applicable): N/A.

Security / privacy

Confirm no secrets/PII in this PR: Yes

…al Omega* The auto Silverman bandwidth for the kernel-smoothed conditional Omega*(X) in the EfficientDiD covariate DR path previously used the unweighted per-dimension dispersion over the positive-weight support. Under survey weights the unweighted sample dispersion does not reflect the population covariate distribution the kernel targets. Make the per-dimension dispersion survey-weighted: median_std is now the median across covariate dimensions of the weighted std sqrt(sum_i w_i (x_i - xbar_w)^2 / sum_i w_i), with weighted mean xbar_w = sum_i w_i x_i / sum_i w_i. The rate term n stays the positive-weight support count (dispersion weighted, sample-size term not -- the TODO-scoped refinement; not Kish n_eff). Behavior: shifts the DR point estimate and SE only in overidentified (H>1) covariate cells under non-uniform survey weights. Under uniform weights it reduces to the previous bandwidth up to floating point; the existing invariances to zero-weight (subpopulation / padded) rows and to weight rescaling are preserved. Non-survey and just-identified (H=1) paths are unchanged. - efficient_did_covariates.py: weighted-dispersion bandwidth + docstring. - REGISTRY.md: update the EfficientDiD covariates Note (was "dispersion stays unweighted ... deferred"). - test_methodology_efficient_did.py: TestSurveyWeightedSilvermanBandwidth (matches weighted reference formula + differs from unweighted; reduces to unweighted under uniform weights; invariant to weight scale and zero-weight padding). Existing H>1 zero-weight-invariance end-to-end test still passes. - TODO.md: remove the actionable row; CHANGELOG.md: ### Changed entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-07-01T11:47:31Z

Overall Assessment: ✅ Looks Good

Executive Summary

No unmitigated P0/P1 findings.
Affected method: EfficientDiD covariate doubly robust path, specifically conditional Omega*(X) auto Silverman bandwidth.
The implementation matches the updated REGISTRY note: weighted dispersion over positive-weight support, with rate term n kept as positive-weight count.
The non-Kish rate-term choice is documented in docs/methodology/REGISTRY.md:L1140, so it is P3 informational, not a blocker.
Focused tests could not be run in this environment because pytest and numpy are not installed.

Methodology

Finding: Documented Rate-Term Convention

Severity: P3 informational
Location: docs/methodology/REGISTRY.md:L1140, diff_diff/efficient_did_covariates.py:L807-L849
Impact: The bandwidth uses survey-weighted dispersion but keeps Silverman’s rate term as the positive-weight support count rather than Kish effective n. This can differ under highly variable weights, but the REGISTRY explicitly documents it as a scoped convention.
Concrete fix: No action required under the review rules. The documented implementation matches the code.

Code Quality

No findings. The helper remains narrowly scoped, and the weighted mean/variance calculation preserves the documented scale and zero-weight invariances.

Performance

No findings. The new weighted dispersion is O(n*d) and negligible relative to the existing conditional Omega*(X) kernel work.

Maintainability

No findings. The behavior is localized to _silverman_bandwidth() and covered by direct reference-formula tests.

Tech Debt

No findings. The prior TODO entry for this refinement was removed, consistent with the implementation landing in this PR.

Security

No findings. No secrets, credential handling, network behavior, or unsafe file operations are introduced.

Documentation/Tests

Finding: Minor Docstring Wording Drift

Severity: P3
Location: diff_diff/efficient_did_covariates.py:L814-L816
Impact: The docstring says “TODO-scoped refinement” even though the TODO row was removed. The REGISTRY wording is clearer: “deliberately scoped refinement, not Kish n_eff.”
Concrete fix: Optional: reword that parenthetical to match the REGISTRY.

Verification note: I attempted pytest -q tests/test_methodology_efficient_did.py::TestSurveyWeightedSilvermanBandwidth, but this runner has no pytest; a direct import check also failed because numpy is unavailable.

Addresses the CI review's P3: the docstring referenced "the TODO-scoped refinement" but the TODO row was removed in this PR, leaving a dangling reference. Match the REGISTRY wording ("a deliberately scoped refinement, not Kish n_eff"). Docstring-only; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-07-01T11:51:16Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 56b13458ab7f7d2c40dded04c3af6d27c3ed4596

Overall Assessment: ✅ Looks Good

Executive Summary

No unmitigated P0/P1 findings.
Affected method: EfficientDiD covariate doubly robust path, specifically auto Silverman bandwidth for conditional Omega*(X).
The implementation matches the updated REGISTRY note: survey-weighted dispersion over positive-weight support, with rate term n kept as the positive-weight count.
The prior P3 docstring wording drift is resolved.
I could not run tests here because pytest is unavailable and numpy is not installed.

Methodology

Finding: Documented Rate-Term Convention

Severity: P3 informational
Location: diff_diff/efficient_did_covariates.py:L802-L850, docs/methodology/REGISTRY.md:L1135-L1140, docs/methodology/papers/chen-santanna-xie-2025-review.md:L130-L192
Impact: The code now uses survey-weighted per-dimension dispersion and preserves the documented non-Kish rate term convention. The paper review says the source paper specifies Nadaraya-Watson kernel smoothing with bandwidth h but does not mandate Gaussian/Silverman, so this is a documented implementation choice, not a methodology defect.
Concrete fix: No action required.

Code Quality

No findings. The change is localized to _silverman_bandwidth() and does not introduce inline inference, parameter propagation, or NaN-guard anti-patterns.

Performance

No findings. The weighted dispersion calculation is O(n*d), negligible relative to the existing conditional Omega*(X) kernel path.

Maintainability

No findings. The docstring and REGISTRY now agree, including the “deliberately scoped refinement, not Kish n_eff” wording.

Tech Debt

No findings. The prior TODO row for this bandwidth refinement was removed consistently with the implementation landing.

Security

No findings. No credential, secret, network, or unsafe file-operation changes were introduced in the changed files.

Documentation/Tests

No PR findings. Added tests cover the weighted reference formula, uniform-weight reduction, weight-scale invariance, and zero-weight padding invariance at tests/test_methodology_efficient_did.py:L773-L837.

Verification note: pytest -q tests/test_methodology_efficient_did.py::TestSurveyWeightedSilvermanBandwidth could not run because pytest is not installed; a direct import check also failed because numpy is unavailable.

igerber added the ready-for-ci Triggers CI test workflows label Jul 1, 2026

igerber merged commit d572925 into main Jul 1, 2026
33 of 34 checks passed

igerber deleted the feature/efficient-did-survey-silverman-bw branch July 1, 2026 15:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(efficient_did): survey-weighted Silverman bandwidth in conditional Omega*#594

feat(efficient_did): survey-weighted Silverman bandwidth in conditional Omega*#594
igerber merged 2 commits into
mainfrom
feature/efficient-did-survey-silverman-bw

igerber commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jul 1, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jul 1, 2026

Overall Assessment: ✅ Looks Good

Executive Summary

Methodology

Finding: Documented Rate-Term Convention

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Finding: Minor Docstring Wording Drift

Uh oh!

github-actions Bot commented Jul 1, 2026

Overall Assessment: ✅ Looks Good

Executive Summary

Methodology

Finding: Documented Rate-Term Convention

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant