Skip to content

feat(efficient_did): survey-weighted Silverman bandwidth in conditional Omega*#594

Merged
igerber merged 2 commits into
mainfrom
feature/efficient-did-survey-silverman-bw
Jul 1, 2026
Merged

feat(efficient_did): survey-weighted Silverman bandwidth in conditional Omega*#594
igerber merged 2 commits into
mainfrom
feature/efficient-did-survey-silverman-bw

Conversation

@igerber

@igerber igerber commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Summary

  • The EfficientDiD covariate DR path's auto Silverman bandwidth for the kernel-smoothed conditional Omega*(X) previously used the unweighted per-dimension dispersion over the positive-weight support. Under survey weights that does not reflect the population covariate distribution the kernel targets.
  • Make the per-dimension dispersion survey-weighted: median_std is now the median across covariate dimensions of the weighted std sqrt(sum_i w_i (x_i - xbar_w)^2 / sum_i w_i) (weighted mean xbar_w = sum_i w_i x_i / sum_i w_i). The rate term n stays the positive-weight support count (dispersion weighted, sample-size term not — the TODO-scoped refinement; not Kish n_eff, an explicit design choice).
  • Behavior: shifts the DR point estimate and SE only in overidentified (H>1) covariate cells under non-uniform survey weights. Under uniform weights it reduces to the previous bandwidth up to floating point; the documented invariances to zero-weight (subpopulation / padded) rows and to weight rescaling are preserved. Non-survey and just-identified (H=1) paths are byte-unchanged.
  • docs/methodology/REGISTRY.md: updated the EfficientDiD covariates Note (was "dispersion stays unweighted ... deferred"). TODO.md: removed the actionable row. CHANGELOG.md: ### Changed entry.

Methodology references

  • Method name(s): EfficientDiD (Chen, Sant'Anna & Xie 2025) covariate doubly-robust path; kernel-smoothed conditional Omega*(X) (Eq 3.12) with Silverman's rule-of-thumb bandwidth.
  • Paper / source link(s): Silverman (1986) multivariate normal-reference rule h = (4/(d+2))^{1/(d+4)} sigma n^{-1/(d+4)}; survey-weighted moments = design-consistent estimates of the population dispersion. REGISTRY § "EfficientDiD" covariates Note.
  • Any intentional deviations from the source (and why): The rate term n uses the positive-weight support count rather than Kish effective n_eff — a deliberately scoped refinement (weight the dispersion, not the rate), documented in the REGISTRY Note and the function docstring. Preserves exact reduction to the prior bandwidth under uniform weights.

Validation

  • Tests added/updated: tests/test_methodology_efficient_did.py::TestSurveyWeightedSilvermanBandwidth (4 tests: matches the weighted reference formula and differs from unweighted under non-uniform weights; reduces to unweighted under uniform weights; invariant to weight scale; invariant to zero-weight padding). The existing overidentified H>1 end-to-end zero-weight-invariance test still passes.
  • Full EfficientDiD suites: tests/test_methodology_efficient_did.py + tests/test_efficient_did.py = 207 passed, 0 failures. black/ruff clean; mypy 0-new.
  • Backtest / simulation / notebook evidence (if applicable): N/A.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

…al Omega*

The auto Silverman bandwidth for the kernel-smoothed conditional Omega*(X) in
the EfficientDiD covariate DR path previously used the unweighted per-dimension
dispersion over the positive-weight support. Under survey weights the unweighted
sample dispersion does not reflect the population covariate distribution the
kernel targets.

Make the per-dimension dispersion survey-weighted: median_std is now the median
across covariate dimensions of the weighted std
sqrt(sum_i w_i (x_i - xbar_w)^2 / sum_i w_i), with weighted mean
xbar_w = sum_i w_i x_i / sum_i w_i. The rate term n stays the positive-weight
support count (dispersion weighted, sample-size term not -- the TODO-scoped
refinement; not Kish n_eff).

Behavior: shifts the DR point estimate and SE only in overidentified (H>1)
covariate cells under non-uniform survey weights. Under uniform weights it
reduces to the previous bandwidth up to floating point; the existing invariances
to zero-weight (subpopulation / padded) rows and to weight rescaling are
preserved. Non-survey and just-identified (H=1) paths are unchanged.

- efficient_did_covariates.py: weighted-dispersion bandwidth + docstring.
- REGISTRY.md: update the EfficientDiD covariates Note (was "dispersion stays
  unweighted ... deferred").
- test_methodology_efficient_did.py: TestSurveyWeightedSilvermanBandwidth
  (matches weighted reference formula + differs from unweighted; reduces to
  unweighted under uniform weights; invariant to weight scale and zero-weight
  padding). Existing H>1 zero-weight-invariance end-to-end test still passes.
- TODO.md: remove the actionable row; CHANGELOG.md: ### Changed entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

Overall Assessment: ✅ Looks Good

Executive Summary

  • No unmitigated P0/P1 findings.
  • Affected method: EfficientDiD covariate doubly robust path, specifically conditional Omega*(X) auto Silverman bandwidth.
  • The implementation matches the updated REGISTRY note: weighted dispersion over positive-weight support, with rate term n kept as positive-weight count.
  • The non-Kish rate-term choice is documented in docs/methodology/REGISTRY.md:L1140, so it is P3 informational, not a blocker.
  • Focused tests could not be run in this environment because pytest and numpy are not installed.

Methodology

Finding: Documented Rate-Term Convention

  • Severity: P3 informational
  • Location: docs/methodology/REGISTRY.md:L1140, diff_diff/efficient_did_covariates.py:L807-L849
  • Impact: The bandwidth uses survey-weighted dispersion but keeps Silverman’s rate term as the positive-weight support count rather than Kish effective n. This can differ under highly variable weights, but the REGISTRY explicitly documents it as a scoped convention.
  • Concrete fix: No action required under the review rules. The documented implementation matches the code.

Code Quality

No findings. The helper remains narrowly scoped, and the weighted mean/variance calculation preserves the documented scale and zero-weight invariances.

Performance

No findings. The new weighted dispersion is O(n*d) and negligible relative to the existing conditional Omega*(X) kernel work.

Maintainability

No findings. The behavior is localized to _silverman_bandwidth() and covered by direct reference-formula tests.

Tech Debt

No findings. The prior TODO entry for this refinement was removed, consistent with the implementation landing in this PR.

Security

No findings. No secrets, credential handling, network behavior, or unsafe file operations are introduced.

Documentation/Tests

Finding: Minor Docstring Wording Drift

  • Severity: P3
  • Location: diff_diff/efficient_did_covariates.py:L814-L816
  • Impact: The docstring says “TODO-scoped refinement” even though the TODO row was removed. The REGISTRY wording is clearer: “deliberately scoped refinement, not Kish n_eff.”
  • Concrete fix: Optional: reword that parenthetical to match the REGISTRY.

Verification note: I attempted pytest -q tests/test_methodology_efficient_did.py::TestSurveyWeightedSilvermanBandwidth, but this runner has no pytest; a direct import check also failed because numpy is unavailable.

Addresses the CI review's P3: the docstring referenced "the TODO-scoped
refinement" but the TODO row was removed in this PR, leaving a dangling
reference. Match the REGISTRY wording ("a deliberately scoped refinement, not
Kish n_eff"). Docstring-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 56b13458ab7f7d2c40dded04c3af6d27c3ed4596


Overall Assessment: ✅ Looks Good

Executive Summary

  • No unmitigated P0/P1 findings.
  • Affected method: EfficientDiD covariate doubly robust path, specifically auto Silverman bandwidth for conditional Omega*(X).
  • The implementation matches the updated REGISTRY note: survey-weighted dispersion over positive-weight support, with rate term n kept as the positive-weight count.
  • The prior P3 docstring wording drift is resolved.
  • I could not run tests here because pytest is unavailable and numpy is not installed.

Methodology

Finding: Documented Rate-Term Convention

  • Severity: P3 informational
  • Location: diff_diff/efficient_did_covariates.py:L802-L850, docs/methodology/REGISTRY.md:L1135-L1140, docs/methodology/papers/chen-santanna-xie-2025-review.md:L130-L192
  • Impact: The code now uses survey-weighted per-dimension dispersion and preserves the documented non-Kish rate term convention. The paper review says the source paper specifies Nadaraya-Watson kernel smoothing with bandwidth h but does not mandate Gaussian/Silverman, so this is a documented implementation choice, not a methodology defect.
  • Concrete fix: No action required.

Code Quality

No findings. The change is localized to _silverman_bandwidth() and does not introduce inline inference, parameter propagation, or NaN-guard anti-patterns.

Performance

No findings. The weighted dispersion calculation is O(n*d), negligible relative to the existing conditional Omega*(X) kernel path.

Maintainability

No findings. The docstring and REGISTRY now agree, including the “deliberately scoped refinement, not Kish n_eff” wording.

Tech Debt

No findings. The prior TODO row for this bandwidth refinement was removed consistently with the implementation landing.

Security

No findings. No credential, secret, network, or unsafe file-operation changes were introduced in the changed files.

Documentation/Tests

No PR findings. Added tests cover the weighted reference formula, uniform-weight reduction, weight-scale invariance, and zero-weight padding invariance at tests/test_methodology_efficient_did.py:L773-L837.

Verification note: pytest -q tests/test_methodology_efficient_did.py::TestSurveyWeightedSilvermanBandwidth could not run because pytest is not installed; a direct import check also failed because numpy is unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jul 1, 2026
@igerber igerber merged commit d572925 into main Jul 1, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/efficient-did-survey-silverman-bw branch July 1, 2026 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant