diff --git a/CHANGELOG.md b/CHANGELOG.md index a980cddc..4fa7d9de 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -22,6 +22,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 `"survey_tsl"` `vcov_type`, and a Survey Design block in `summary()`. The non-survey path is byte-for-byte unchanged. Validated against `survey::svyglm` on the stacked long difference (numeric golden parity is the D2 follow-up). +- **`TROP` non-absorbing (on/off) treatment support** (Athey, Imbens, Qu & Viviano 2025, + §2.1 / Eq. 12 / Algorithm 2). New `non_absorbing` parameter (default `False`). The paper + supports general assignment patterns ("units moving into and out of treatment"), not only + absorbing/staggered adoption; `TROP(non_absorbing=True)` (`method='local'` only) now + accepts treatment that switches on and off, imputing each treated cell's counterfactual via + the paper's `(1-W)` masking. The default `non_absorbing=False` is unchanged and still + rejects non-monotonic D with a `ValueError` (now also pointing to the opt-in), guarding + against the common mistake of encoding absorbing treatment as an event-style spike. This + *removes a prior implementation over-restriction* (the estimator was stricter than the + paper) rather than adding a deviation. `method='global'` keeps its block-assignment + requirement and rejects `non_absorbing=True`. A one-time `UserWarning` is emitted noting + that validity relies on the no-dynamic-effects assumption and that the triple-robustness + guarantee (Theorem 5.1) is proven only under block assignment. The Rust local LOOCV and + point-estimate paths were already mask-driven and unchanged (Rust/Python ATT parity is + regression-tested); the non-absorbing **bootstrap** is routed to the Python path, because + the Rust resampler lacks the no-weighted-control-support guard and can return a degenerate + ~0 SE on an empty control stratum. Treated cells with no weighted control support (e.g. 
an + always-treated unit under `lambda_unit>0`) are materialized as NaN and excluded from the + ATT (the library non-estimable->NaN convention), with a `UserWarning`. - **`LPDiD` non-absorbing R-parity validation** (Phase C2). Pins both non-absorbing modes against an independent `fixest::feols` reconstruction of the paper's Eq. 12 (`first_entry`) and Eq. 13 (`effect_stabilization`) clean-sample restrictions: variance-weighted point and diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md index 08b1fc2a..0dc56f17 100644 --- a/METHODOLOGY_REVIEW.md +++ b/METHODOLOGY_REVIEW.md @@ -880,7 +880,7 @@ These three are feature deferrals (paper-supported extensions that the library h | Status | **Complete** (paper `method="local"`, version-pinned to arXiv v2 — see Version Pinning below) | | Last Review | 2026-05-24 | -**Version Pinning:** This methodology promotion is anchored on **arXiv:2508.21536v2** (the version covered by the paper review on file at `docs/methodology/papers/athey-2025-review.md`). The current arXiv version is **v3** (submitted 2026-02-09). A formal v2→v3 source delta-check against the v3 PDF has **NOT** been performed for any of the sections this PR promotes (Eqs. 2-3, Algorithms 1-3, Section 2.2, Section 5.2-5.3, Section 6.1-6.2, Theorem 5.1, Corollary 1, Appendix Theorem 8.1). **Action item:** before the next paper-author reference implementation or substantive v3 release, refresh the paper review against the most recent arXiv version and re-validate the verified-component checklist; until then the promotion stays v2-anchored. +**Version Pinning:** This methodology promotion is anchored on **arXiv:2508.21536v2** (the version covered by the paper review on file at `docs/methodology/papers/athey-2025-review.md`). The current arXiv version is **v3** (submitted 2026-02-09). The **v3 PDF was consulted for the treatment-assignment-pattern sections** during the non-absorbing support work (§2.1, §2.2 Eq. 2, §6.1 Eq. 
12 / Algorithm 2, Assumption 1(i), Theorem 5.1), confirming the general-assignment scope behind `TROP(non_absorbing=True)`. A formal v2→v3 source delta-check across the remaining promoted sections (Eqs. 2-3, Algorithms 1-3, Section 2.2, Section 5.2-5.3, Section 6.1-6.2, Theorem 5.1, Corollary 1, Appendix Theorem 8.1) has **NOT** been performed in full. **Action item:** before the next paper-author reference implementation or substantive v3 release, refresh the paper review against the most recent arXiv version and re-validate the verified-component checklist; until then the promotion stays v2-anchored. **Scope:** This methodology promotion covers the paper-aligned `method="local"` path (paper Algorithm 2: per-(i, t) estimation with observation-specific weights). The library also exposes `method="global"`, documented in `REGISTRY.md` as a "computationally efficient adaptation using the (1-W) masking principle from Eq. 2" — a library-side adaptation, NOT the paper's full Algorithm 2 estimator. Defensive coverage of the global method lives in `tests/test_trop.py::TestTROPGlobalMethod` (704 lines, ~30 tests for the global-method-specific surface) and is not duplicated in the methodology walk-through. Methodology promotion of `method="global"` as a primary surface would require either (a) a paper-side derivation of the global adaptation's equivalence to Algorithm 2 under specific conditions, or (b) a separate library-extension methodology review; both are deferred. @@ -892,15 +892,15 @@ These three are feature deferrals (paper-supported extensions that the library h - [x] Corollary 1 (paper p. 23) — **single-draw sanity checks consistent with the three unbiasedness conditions, not a repeated-MC mean-bias study**: each of the three balance conditions (a) unit balance, (b) time balance, (c) ``B = 0`` is exercised on a targeted DGP that makes one condition trivially hold while keeping the others sub-optimal. 
The assertion in each case is a single-realisation ``|att - τ| < 3 * se`` band using the estimator's own bootstrap SE — this is a smoke check, NOT a repeated-draw Monte Carlo bias study of the paper's conditional-unbiasedness statement under fixed weights. A stronger MC bias study at fixed λ values is deferred (would multiply test runtime by ~30x for marginal additional evidence given the existing 3-σ band already catches order-of-magnitude bias regressions). - [x] Theorem 5.1 (paper p. 23) — **simulation sanity check, not a direct theorem lock**: the paper's bias bound ``|E[τ_hat - τ | L]| <= ||Δ_u|| · ||Δ_t|| · ||B||_*`` is stated for FIXED, non-data-dependent weights. The library's TROP fit uses data-dependent LOOCV-tuned λ values, so the direct conditional bias bound is not tested here. Instead, the methodology test verifies the bound's empirical realisation: TROP RMSE strictly below DID RMSE under a confounded factor DGP with ``true τ = 0`` (calibration measurement: TROP/DID RMSE ratio ≈ 0.34 at ``factor_strength = 1.0``). The direct fixed-weight bound test is deferred — would require exposing oracle Γ / Λ / B from a paper-aligned DGP and computing each component of the bound from instrumented internals. - [x] Section 2.2 special-case reductions: **DiD benchmark sanity check** (not a direct algebraic-equivalence proof) — on a no-interactive-FE multi-period panel (additive unit + time effects only, no factor structure), TROP with ``λ_nn = ∞`` + uniform weights produces an ATT within 0.5 of `DifferenceInDifferences` fitted as `outcome ~ treat * post_flag` (basic 2×2 design with `[const, D, T, D×T]`, extended to repeated observations within each treat×post cell). This is **empirical numerical agreement on a friendly DGP**, NOT a proof of the paper Section 2.2 algebraic reduction (which would require either a true 2-period block-assignment panel where the basic-DiD comparator is the algebraic target, or a comparison against `TwoWayFixedEffects` — both deferred). 
**Matrix Completion code path exercised, not equivalence-checked** — TROP with uniform weights + finite ``λ_nn`` engages the nuclear-norm prox solver (effective_rank > 0) and recovers ATT better than the DiD-style baseline on a factor-confounded DGP; this verifies the code path activates but does NOT prove equivalence with an independent MC reference implementation (which would require either an external MC port or a hand-written reference solver). SC / SDID reductions deferred — see "Outstanding Concerns". -- [x] Eq. 13 + Algorithm 2 per-(i, t) estimation: ``treatment_effects`` dict contains one finite ``τ_hat_it`` per treated cell; the aggregate ATT equals the unweighted mean of per-cell effects (Eq. 1). **Tests cover block adoption with a constant treatment effect**; **absorbing-state staggered adoption** and **heterogeneous per-cell effects** (paper Remark 6.1) are SUPPORTED by the code path but not directly verified in this methodology surface. **Section 6.1 non-absorbing / on-off / switching assignment patterns are explicitly OUT OF SCOPE** — the implementation rejects non-absorbing D-matrices via `trop_local.py` absorbing-state validation, and the methodology test enforces the rejection contract via `TestTROPDeviations::test_event_style_d_rejected_with_value_error` (event-style D being one specific non-absorbing pattern; the same absorbing-state validator catches all 1→0 transitions). Cross-coverage of the staggered-cohort fit path is `tests/test_methodology_trop.py::TestTROPAlgorithm1LOOCV::test_control_set_includes_pretreat_of_eventually_treated`. +- [x] Eq. 13 + Algorithm 2 per-(i, t) estimation: ``treatment_effects`` dict contains one ``τ_hat_it`` entry per treated cell (finite for estimable cells; NaN for a missing outcome or for a cell whose unit/time fixed effect ``alpha_i + beta_t`` is unidentified by the two-way-FE control fit — i.e. 
the target unit and target period are not in the same connected component of the observed-control graph (an always-treated unit for any ``lambda_unit``, a fully-treated period, or disconnected control support under ``non_absorbing``; or an unbalanced absorbing panel with entirely-missing unit/period controls; the guard is applied to all local fits, not only non_absorbing, and the bootstrap is forced onto the guarded Python path when trimming occurs)); the aggregate ATT equals the unweighted mean of the finite per-cell effects (Eq. 1). Trimming non-estimable cells to NaN matches the library-wide non-estimable→NaN convention and is documented in REGISTRY ## TROP "non-absorbing non-estimable-cell trimming" Note; locked by `TestTROPDeviations::test_non_absorbing_always_treated_unit_not_raw_outcome` and `test_non_absorbing_fully_treated_period_not_estimable`. **Tests cover block adoption with a constant treatment effect**; **absorbing-state staggered adoption** and **heterogeneous per-cell effects** (paper Remark 6.1) are SUPPORTED by the code path but not directly verified for those specific patterns. **Section 6.1 non-absorbing / on-off / switching assignment patterns are SUPPORTED via the opt-in `TROP(non_absorbing=True)` (`method='local'` only)**, matching the paper's general-assignment scope (§2.1; Eq. 12 / Algorithm 2). This *removes* a prior implementation over-restriction (the shipped estimator was stricter than the paper) rather than adding a deviation. The default `non_absorbing=False` still rejects non-monotonic D as a defensive guard; recovery on a no-dynamic-effects toggling DGP + the caveat warning are locked by `TestTROPDeviations::test_non_absorbing_general_assignment_supported`, and the default-mode rejection contract by `TestTROPDeviations::test_event_style_d_rejected_with_value_error`. Inference caveat: Theorem 5.1's triple-robustness guarantee is proven under Assumption 1(i) block assignment only (see REGISTRY ## TROP Notes). 
Cross-coverage of the staggered-cohort fit path is `tests/test_methodology_trop.py::TestTROPAlgorithm1LOOCV::test_control_set_includes_pretreat_of_eventually_treated`. - [x] Algorithm 3 stratified pairs bootstrap: under an unbalanced (3 treated, 17 control) panel, the stratified sampler reliably produces ≥ 67% successful bootstrap draws and a positive finite SE. - [x] Section 3 / Eq. 6 semi-synthetic factor DGP: five recovery tests verify limiting-case uniform weights, unit-weight bias reduction, time-weight bias reduction, factor-model bias reduction with effective_rank > 0, and null-DGP recovery centred near zero. - [x] safe_inference contract: confidence interval uses the t-distribution with df = max(1, n_treated_obs - 1), consistent with p_value (matches REGISTRY `## TROP` "Inference CI distribution" note, post safe_inference migration). **Test Coverage:** -- 36 methodology tests (10 classes) in `tests/test_methodology_trop.py`. -- Defensive guards (107 tests in `tests/test_trop.py`): D-matrix absorbing-state validation, silent-warning audit, FISTA convergence warnings, bootstrap-failure-rate proportional warning, bootstrap NaN-SE propagation, module-split smoke tests. +- 39 methodology tests (10 classes) in `tests/test_methodology_trop.py` (includes non-absorbing opt-in recovery + caveat-warning + default-mode no-warning + unbalanced×non-absorbing). +- Defensive guards (117 tests in `tests/test_trop.py`): D-matrix absorbing-state validation, non-absorbing opt-in acceptance / local-only guard / params round-trip / Rust-Python parity, silent-warning audit, FISTA convergence warnings, bootstrap-failure-rate proportional warning, bootstrap NaN-SE propagation, module-split smoke tests. **Deviations from paper:** diff --git a/README.md b/README.md index d237b050..651f0edb 100644 --- a/README.md +++ b/README.md @@ -102,7 +102,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`. 
- [TwoWayFixedEffects](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - panel data DiD with unit and time fixed effects via within-transformation or dummies - [MultiPeriodDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - event study design with period-specific treatment effects for dynamic analysis - [CallawaySantAnna](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Callaway & Sant'Anna (2021) group-time ATT estimator for staggered adoption -- [ChaisemartinDHaultfoeuille](https://diff-diff.readthedocs.io/en/stable/api/chaisemartin_dhaultfoeuille.html) - de Chaisemartin & D'Haultfœuille (2020/2022) for **reversible (non-absorbing) treatments** with multi-horizon event study, normalized effects, cost-benefit delta, sup-t bands, and dynamic placebos. The only library option for treatments that switch on AND off. Alias `DCDH`. +- [ChaisemartinDHaultfoeuille](https://diff-diff.readthedocs.io/en/stable/api/chaisemartin_dhaultfoeuille.html) - de Chaisemartin & D'Haultfœuille (2020/2022) for **reversible (non-absorbing) treatments** with multi-horizon event study, normalized effects, cost-benefit delta, sup-t bands, and dynamic placebos. The most general option for treatments that switch on AND off (see also `LPDiD`/`TROP` `non_absorbing`). Alias `DCDH`. 
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py index 59d28ce8..1c675c69 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille.py +++ b/diff_diff/chaisemartin_dhaultfoeuille.py @@ -1,9 +1,14 @@ """ de Chaisemartin-D'Haultfoeuille (dCDH) estimator for reversible-treatment DiD. -The dCDH estimator is the only modern DiD estimator in the diff-diff library -that handles **non-absorbing (reversible) treatments** — treatment can switch -on AND off over time. All other staggered estimators in the library +The dCDH estimator is the most general DiD estimator in the diff-diff library +for **non-absorbing (reversible) treatments** — treatment can switch on AND off +over time, switcher vs non-switcher comparisons are its primitive object, and it +allows dynamic (carryover) effects with explicit joiner/leaver (``DID_+`` / +``DID_-``) decomposition. ``LPDiD`` (``non_absorbing="first_entry"`` / +``"effect_stabilization"``) and ``TROP`` (``non_absorbing=True``, under a +no-dynamic-effects assumption) also accept non-absorbing treatment under stronger +assumptions. The remaining staggered estimators in the library (``CallawaySantAnna``, ``SunAbraham``, ``ImputationDiD``, ``TwoStageDiD``, ``EfficientDiD``, ``WooldridgeDiD``) assume treatment is absorbing. @@ -354,9 +359,11 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin): """ de Chaisemartin-D'Haultfoeuille (dCDH) estimator. 
- The only modern DiD estimator in the library that handles **reversible - (non-absorbing) treatments** - treatment may switch on AND off over - time. Computes the contemporaneous-switch DiD ``DID_M`` from the + The most general library estimator for **reversible (non-absorbing) + treatments** - treatment may switch on AND off over time, with explicit + joiner/leaver (``DID_+`` / ``DID_-``) decomposition (``LPDiD`` and ``TROP`` + also support non-absorbing treatment under stronger assumptions; see their + ``non_absorbing`` parameters). Computes the contemporaneous-switch DiD ``DID_M`` from the AER 2020 paper (equivalently ``DID_1`` at horizon ``l = 1`` of the dynamic companion paper, NBER WP 29873) plus the full multi-horizon event study ``DID_l`` for ``l = 1..L_max`` via the ``L_max`` parameter diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py index eaccbda7..b9372f7d 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille_results.py +++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py @@ -4,9 +4,11 @@ This module contains ``ChaisemartinDHaultfoeuilleResults`` and ``DCDHBootstrapResults`` dataclasses produced by the ``ChaisemartinDHaultfoeuille`` (alias ``DCDH``) estimator. The dCDH -estimator is the only modern DiD estimator in the library that handles -non-absorbing (reversible) treatments. Phase 1 ships the contemporaneous- -switch case ``DID_M`` (= ``DID_1`` of the dynamic companion paper). +estimator is the most general library estimator for non-absorbing +(reversible) treatments (``LPDiD`` and ``TROP`` also support non-absorbing +treatment under stronger assumptions; see their ``non_absorbing`` parameters). +Phase 1 ships the contemporaneous-switch case ``DID_M`` (= ``DID_1`` of the +dynamic companion paper). 
References ---------- diff --git a/diff_diff/guides/llms-autonomous.txt b/diff_diff/guides/llms-autonomous.txt index 197dd827..54fb7d2e 100644 --- a/diff_diff/guides/llms-autonomous.txt +++ b/diff_diff/guides/llms-autonomous.txt @@ -531,12 +531,21 @@ When `has_never_treated == False`: When `treatment_type == "binary_non_absorbing"`: -- `ChaisemartinDHaultfoeuille` is the only estimator in the library - that treats this natively. Switcher / non-switcher comparisons are - its primitive object. -- Other estimators assume absorbing treatment and will produce - estimates whose interpretation is unclear. Do not use them without - a well-argued reason. +- `ChaisemartinDHaultfoeuille` is the most general / default choice and + treats this natively. Switcher / non-switcher comparisons are its + primitive object; it allows dynamic (carryover) effects and reports + joiner/leaver (`DID_+` / `DID_-`) views. Prefer it when effects may + persist after treatment turns off. +- `LPDiD(non_absorbing="first_entry")` or `"effect_stabilization"` + (entry-effect estimands) and `TROP(non_absorbing=True, method="local")` + (valid under a no-dynamic-effects / no-carryover assumption) also handle + non-absorbing treatment, under stronger assumptions. Use TROP's option + only when effects are contemporaneous (no carryover). +- The remaining estimators (`CallawaySantAnna`, `SunAbraham`, + `ImputationDiD`, `TwoStageDiD`, `EfficientDiD`, `WooldridgeDiD`) assume + absorbing treatment and will produce estimates whose interpretation is + unclear on non-absorbing data. Do not use them without a well-argued + reason. 
### §4.6 Triple-difference design (DDD) diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index 825f466e..6de24848 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -231,7 +231,7 @@ plot_event_study(results) ### ChaisemartinDHaultfoeuille -de Chaisemartin & D'Haultfœuille (2020/2022) estimator for **non-absorbing (reversible) treatments**. The only library estimator that handles treatments which can switch on AND off over time. Ships `DID_M` (= `DID_1` at horizon `l = 1`) plus the full multi-horizon event study `DID_l` for `l = 1..L_max` from the dynamic companion paper (NBER WP 29873). Includes normalized estimator `DID^n_l`, cost-benefit aggregate `delta`, dynamic placebos `DID^{pl}_l`, and sup-t simultaneous confidence bands. +de Chaisemartin & D'Haultfœuille (2020/2022) estimator for **non-absorbing (reversible) treatments**. The most general library estimator for treatments that switch on AND off over time (allows dynamic/carryover effects + joiner/leaver decomposition); `LPDiD` (`non_absorbing="first_entry"`/`"effect_stabilization"`) and `TROP` (`non_absorbing=True`, no-dynamic-effects) also handle non-absorbing treatment under stronger assumptions. Ships `DID_M` (= `DID_1` at horizon `l = 1`) plus the full multi-horizon event study `DID_l` for `l = 1..L_max` from the dynamic companion paper (NBER WP 29873). Includes normalized estimator `DID^n_l`, cost-benefit aggregate `delta`, dynamic placebos `DID^{pl}_l`, and sup-t simultaneous confidence bands. ```python ChaisemartinDHaultfoeuille( @@ -963,6 +963,7 @@ TROP( alpha: float = 0.05, n_bootstrap: int = 200, seed: int | None = None, + non_absorbing: bool = False, # False: require absorbing D (reject non-monotonic). True: allow on/off treatment (Eq. 12/Alg. 2), method='local' only; emits a caveat warning (Thm 5.1 is block-only). 
) ``` @@ -972,7 +973,7 @@ TROP( trop.fit( data: pd.DataFrame, outcome: str, - treatment: str, # Absorbing-state treatment indicator (0/1). Must be 0 for all pre-treatment periods and 1 for treatment and post-treatment periods. + treatment: str, # Treatment indicator (0/1). Default (non_absorbing=False): absorbing state -- 0 for all pre-treatment periods, 1 for treatment and post-treatment; non-monotonic D raises ValueError. With non_absorbing=True: any on/off pattern (general assignment). unit: str, time: str, ) -> TROPResults diff --git a/diff_diff/guides/llms-practitioner.txt b/diff_diff/guides/llms-practitioner.txt index 274cbe04..2088f6c4 100644 --- a/diff_diff/guides/llms-practitioner.txt +++ b/diff_diff/guides/llms-practitioner.txt @@ -213,9 +213,12 @@ Is treatment adoption staggered (multiple cohorts, different timing)? | |-- Treatment switches ON and OFF (reversible / non-absorbing)? | \-- ChaisemartinDHaultfoeuille (dCDH / alias `DCDH`) -| -- Only library estimator for non-absorbing treatments; supports -| L_max multi-horizon, dynamic placebos, cost-benefit delta, -| HonestDiD, and `survey_design=` (pweight + strata/PSU/FPC via TSL) +| -- Most general option for non-absorbing treatments (allows dynamic +| effects + joiner/leaver views); supports L_max multi-horizon, +| dynamic placebos, cost-benefit delta, HonestDiD, and +| `survey_design=` (pweight + strata/PSU/FPC via TSL) +| -- Also: LPDiD(non_absorbing="first_entry"/"effect_stabilization") +| and TROP(non_absorbing=True, no-dynamic-effects) under stronger assumptions | |-- Few treated units (< 20)? 
| \-- SyntheticDiD (SDiD) -- synthetic control + DiD hybrid diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index 5d81d8f7..f61f5f3d 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -54,7 +54,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` - [TwoWayFixedEffects](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Panel data DiD with unit and time fixed effects via within-transformation or dummies - [MultiPeriodDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Event study design with period-specific treatment effects for dynamic analysis - [CallawaySantAnna](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Callaway & Sant'Anna (2021) group-time ATT estimator for staggered adoption with aggregation -- [ChaisemartinDHaultfoeuille](https://diff-diff.readthedocs.io/en/stable/api/chaisemartin_dhaultfoeuille.html): de Chaisemartin & D'Haultfœuille (2020/2022) estimator for **reversible (non-absorbing) treatments** with multi-horizon event study (`L_max`), normalized effects, cost-benefit delta, sup-t bands, and dynamic placebos. The only library option for treatments that switch on AND off. Alias `DCDH`. +- [ChaisemartinDHaultfoeuille](https://diff-diff.readthedocs.io/en/stable/api/chaisemartin_dhaultfoeuille.html): de Chaisemartin & D'Haultfœuille (2020/2022) estimator for **reversible (non-absorbing) treatments** with multi-horizon event study (`L_max`), normalized effects, cost-benefit delta, sup-t bands, and dynamic placebos. The most general option for treatments that switch on AND off (LPDiD/TROP `non_absorbing` also handle non-absorbing treatment under stronger assumptions). Alias `DCDH`. 
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance @@ -66,7 +66,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` - [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html): de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲=0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲>0`, continuous-near-d̲ or mass-point), with multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release (repeated cross-sections rejected by the validator). Alias `HAD`. - [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html): Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments; optional covariate balancing (`balance="entropy"`, Ustyuzhanin 2026) - [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs -- [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html): Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment +- [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html): Triply Robust Panel estimator (Athey et al. 
2025) with nuclear norm factor adjustment (absorbing by default; `non_absorbing=True` for on/off treatment, method='local') - [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered.html#staggeredtripledifference): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT - [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — saturated OLS, logit/Poisson QMLE (ASF-based ATT). Alias: ETWFE - [LPDiD](https://diff-diff.readthedocs.io/en/stable/api/lpdid.html): Dube, Girardi, Jorda & Taylor (2025) Local Projections DiD: per-horizon long-difference event study on clean controls (no negative weighting); variance- or equally-weighted ATT, premean differencing, pooled pre/post, fast. Absorbing by default; non-absorbing (reversible) treatment via `non_absorbing="first_entry"` (Eq. 12) or `"effect_stabilization"` (Eq. 13, window `L`). Complex-survey designs (pweight + stratified-PSU TSL SEs) on the default path via `fit(survey_design=...)`. diff --git a/diff_diff/trop.py b/diff_diff/trop.py index 5c60dc15..467e8878 100644 --- a/diff_diff/trop.py +++ b/diff_diff/trop.py @@ -31,7 +31,11 @@ _rust_loocv_grid_search, ) from diff_diff.trop_global import TROPGlobalMixin -from diff_diff.trop_local import TROPLocalMixin, _setup_trop_data +from diff_diff.trop_local import ( + TROPLocalMixin, + _setup_trop_data, + _treated_cell_is_estimable, +) from diff_diff.trop_results import ( _LAMBDA_INF, _PrecomputedStructures, @@ -96,6 +100,28 @@ class TROP(TROPLocalMixin, TROPGlobalMixin): Number of bootstrap replications for variance estimation. Must be >= 2. seed : int, optional Random seed for reproducibility. + non_absorbing : bool, default=False + Treatment-assignment scope for the treatment indicator. + + - ``False`` (default): require an ABSORBING STATE indicator (once + treated, always treated). A non-monotonic indicator raises + ``ValueError``. 
This guards against the common mistake of encoding + absorbing treatment as an event-style spike (a single D=1 period), + which would silently bias the ATT. + - ``True``: accept general (on/off) assignment patterns, where treatment + may switch on and off, per Athey et al. (2025) Eq. 12 / Algorithm 2. + Supported for ``method='local'`` only (``method='global'`` raises). + Relies on the paper's no-dynamic-effects (no carryover) assumption; the + triple-robustness guarantee (Theorem 5.1) is proven only under block + assignment, so a ``UserWarning`` is emitted on fit. The estimand + averages the per-cell effects over the **estimable** treated (D=1) + cells (Eq. 1): a cell is non-estimable (NaN, excluded) when its unit/time + fixed effect ``alpha_i + beta_t`` is unidentified by the control fit -- + i.e. the target unit and target period are not in the same connected + component of the observed-control graph (an always-treated unit, a + fully-treated period, or disconnected control support). This matches the + library non-estimable->NaN convention (see REGISTRY ## TROP + "non-absorbing non-estimable-cell trimming"). Attributes ---------- @@ -134,6 +160,7 @@ def __init__( alpha: float = 0.05, n_bootstrap: int = 200, seed: Optional[int] = None, + non_absorbing: bool = False, ): # Validate method parameter valid_methods = ("local", "global") @@ -141,6 +168,14 @@ def __init__( raise ValueError(f"method must be one of {valid_methods}, got '{method}'") self.method = method + # Validate non_absorbing flag (must be a plain bool, not a truthy value). + # When False (default) TROP requires an absorbing-state treatment indicator; + # when True it accepts general (on/off) assignment patterns per Athey et al. + # (2025) Eq. 12 / Algorithm 2 -- local method only (see fit()). 
+ if not isinstance(non_absorbing, bool): + raise ValueError(f"non_absorbing must be a bool, got {type(non_absorbing).__name__}") + self.non_absorbing = non_absorbing + # Default grids from paper self.lambda_time_grid = lambda_time_grid or [0.0, 0.1, 0.5, 1.0, 2.0, 5.0] self.lambda_unit_grid = lambda_unit_grid or [0.0, 0.1, 0.5, 1.0, 2.0, 5.0] @@ -389,17 +424,21 @@ def fit( treatment : str Name of the treatment indicator column (0/1). - IMPORTANT: This should be an ABSORBING STATE indicator, not a - treatment timing indicator. For each unit, D=1 for ALL periods - during and after treatment: + By default (``non_absorbing=False``) this must be an ABSORBING STATE + indicator, not a treatment timing indicator. For each unit, D=1 for + ALL periods during and after treatment: - D[t, i] = 0 for all t < g_i (pre-treatment periods) - D[t, i] = 1 for all t >= g_i (treatment and post-treatment) where g_i is the treatment start time for unit i. - For staggered adoption, different units can have different g_i. - The ATT averages over ALL D=1 cells per Equation 1 of the paper. + For staggered adoption, different units can have different g_i (this + is still absorbing). Set ``non_absorbing=True`` to allow treatment to + switch on and off (general assignment, ``method='local'`` only). The + ATT averages over the **estimable** D=1 cells per Equation 1 (a cell + whose unit/time fixed effect is unidentified by the control fit is + NaN and excluded; see ``non_absorbing`` and ``TROPResults``). unit : str Name of the unit identifier column. time : str @@ -470,8 +509,34 @@ def fit( # Below is the local method (default) _ctx = _setup_trop_data( - data, outcome, treatment, unit, time, resolved_survey, survey_design + data, + outcome, + treatment, + unit, + time, + resolved_survey, + survey_design, + non_absorbing=self.non_absorbing, ) + + # Non-absorbing (general assignment) is a paper-supported point estimator + # (Athey et al. 2025 Eq. 
12 / Algorithm 2) but the formal triple-robustness + # guarantee (Theorem 5.1) is proven only under block assignment, and the + # bootstrap's validity (Algorithm 3) requires a growing number of treated + # units. Surface that caveat once per fit so users do not over-read the SE. + if self.non_absorbing: + warnings.warn( + "TROP(non_absorbing=True): treating the panel as a general " + "(on/off) assignment pattern per Athey et al. (2025) Eq. 12 / " + "Algorithm 2. This relies on the no-dynamic-effects (no carryover) " + "assumption. The triple-robustness guarantee (Theorem 5.1) is " + "proven only under block assignment, and bootstrap-SE validity " + "requires a growing number of treated units -- interpret standard " + "errors with care.", + UserWarning, + stacklevel=2, + ) + n_units = _ctx["n_units"] n_periods = _ctx["n_periods"] idx_to_unit = _ctx["idx_to_unit"] @@ -479,6 +544,7 @@ def fit( unit_weight_arr = _ctx["unit_weight_arr"] Y = _ctx["Y"] D = _ctx["D"] + missing_mask = _ctx["missing_mask"] n_treated_obs = _ctx["n_treated_obs"] treated_unit_idx = _ctx["treated_unit_idx"] control_unit_idx = _ctx["control_unit_idx"] @@ -676,6 +742,7 @@ def fit( treated_observations = self._precomputed["treated_observations"] nonconverg_tracker: list = [] n_fits_attempted = 0 + n_no_support = 0 for t, i in treated_observations: unit_id = idx_to_unit[i] @@ -692,6 +759,25 @@ def fit( Y, D, i, t, lambda_time, lambda_unit, control_unit_idx, n_units, n_periods ) + # Guard against a treated cell with no positively-weighted, observed + # control support. Under non_absorbing with lambda_unit>0, a unit that + # is never observed untreated has inf distance to every donor, so all + # unit weights collapse to 0; the model then fits nothing and tau would + # silently equal the raw outcome Y_it. Mark such cells non-estimable + # (NaN) -- consistent with the missing-outcome NaN convention above -- + # rather than report a wrong effect. 
The cell-specific check also + # covers lambda_unit=0 (uniform weights still leave an always-treated + # unit's alpha_i unidentified) and a fully-treated period (beta_t + # unidentified). It is a general correctness guard applied to every + # local fit: a no-op when each treated cell's unit and period have an + # observed control cell (always so on balanced panels, and in + # absorbing mode unless an unbalanced panel leaves a unit's pre-period + # controls or a period's controls entirely missing). + if not _treated_cell_is_estimable(control_mask, Y, weight_matrix, i, t): + treatment_effects[(unit_id, time_id)] = np.nan + n_no_support += 1 + continue + # Fit model with these weights n_fits_attempted += 1 alpha_hat, beta_hat, L_hat = self._estimate_model( @@ -717,6 +803,19 @@ def fit( beta_estimates.append(beta_hat) L_estimates.append(L_hat) + if n_no_support > 0: + warnings.warn( + f"{n_no_support} of {n_treated_obs} treated cell(s) are not " + f"estimable: the target unit and target period are not connected " + f"in the observed-control graph, so the cell's unit/time fixed " + f"effect (alpha_i + beta_t) is unidentified (e.g. an always-treated " + f"unit, a period in which every unit is treated, or disconnected " + f"control support). Their treatment effects are NaN and are " + f"excluded from the ATT.", + UserWarning, + stacklevel=2, + ) + if nonconverg_tracker: warn_if_not_converged( False, @@ -727,25 +826,42 @@ def fit( self.tol, ) - # Count valid treated observations + # Count valid (estimable) treated observations. A cell is excluded when + # its outcome is NaN/missing or it has no weighted control support (the + # latter is additionally surfaced by the no-support warning above). n_valid_treated = len(tau_values) if n_valid_treated == 0: - warnings.warn( - "All treated outcomes are NaN/missing. 
Cannot estimate ATT.", - UserWarning, - ) + if n_no_support > 0: + warnings.warn( + "No treated cells were estimable (for every treated cell the " + "target unit and target period are not connected in the " + "observed-control graph, leaving alpha_i + beta_t " + "unidentified). Cannot estimate ATT.", + UserWarning, + ) + else: + warnings.warn( + "All treated outcomes are NaN/missing. Cannot estimate ATT.", + UserWarning, + ) elif n_valid_treated < n_treated_obs: warnings.warn( - f"Only {n_valid_treated} of {n_treated_obs} treated outcomes are finite. " - "df and n_treated_obs reflect valid observations only.", + f"Only {n_valid_treated} of {n_treated_obs} treated cells were " + "estimable (finite outcome with weighted control support). " + "df and n_treated_obs reflect estimable observations only.", UserWarning, ) - # Average ATT (survey-weighted when applicable) - if unit_weight_arr is not None and tau_values: + # Average ATT (survey-weighted when applicable). Guard the weighted path + # against a zero total weight (e.g. the only estimable treated cells all + # carry zero survey weight after non-estimable cells are excluded), which + # would make np.average raise; fall back to NaN per the inference contract. 
+ if unit_weight_arr is not None and tau_values and float(np.sum(tau_weights)) > 0.0: att = float(np.average(tau_values, weights=tau_weights)) + elif tau_values and unit_weight_arr is None: + att = float(np.mean(tau_values)) else: - att = np.mean(tau_values) if tau_values else np.nan + att = np.nan # Average parameter estimates for output (representative) alpha_hat = np.mean(alpha_estimates, axis=0) if alpha_estimates else np.zeros(n_units) @@ -775,6 +891,15 @@ def fit( survey_design=survey_design, unit_weight_arr=unit_weight_arr, resolved_survey=resolved_survey, + # Force the guarded Python bootstrap (the Rust per-cell tau path lacks + # the estimability guard) whenever a resample could need it: (a) the + # point fit already trimmed a cell (n_no_support>0); or (b) the panel + # is unbalanced (has missing cells) -- a bootstrap resample can then + # lose a cell's only control support even if the original fit was + # fully estimable, and Rust would contaminate that draw's SE. Balanced + # panels keep the Rust happy path: the stratified resample always + # re-draws the control stratum, so support is preserved. 
+ force_python=bool(n_no_support > 0 or np.any(missing_mask)), ) # Compute test statistics @@ -811,6 +936,7 @@ def fit( n_bootstrap=self.n_bootstrap, bootstrap_distribution=bootstrap_dist if len(bootstrap_dist) > 0 else None, survey_metadata=survey_metadata, + non_absorbing=self.non_absorbing, ) self.is_fitted_ = True @@ -832,6 +958,7 @@ def get_params(self) -> Dict[str, Any]: "alpha": self.alpha, "n_bootstrap": self.n_bootstrap, "seed": self.seed, + "non_absorbing": self.non_absorbing, } def set_params(self, **params) -> "TROP": @@ -839,6 +966,8 @@ def set_params(self, **params) -> "TROP": for key, value in params.items(): if key == "method" and value not in ("local", "global"): raise ValueError(f"method must be one of ('local', 'global'), got '{value}'") + if key == "non_absorbing" and not isinstance(value, bool): + raise ValueError(f"non_absorbing must be a bool, got {type(value).__name__}") if hasattr(self, key): setattr(self, key, value) else: @@ -867,10 +996,12 @@ def trop( treatment : str Treatment indicator column name (0/1). - IMPORTANT: This should be an ABSORBING STATE indicator, not a treatment - timing indicator. For each unit, D=1 for ALL periods during and after - treatment (D[t,i]=0 for t < g_i, D[t,i]=1 for t >= g_i where g_i is - the treatment start time for unit i). + By default (``non_absorbing=False``) this must be an ABSORBING STATE + indicator, not a treatment timing indicator: for each unit, D=1 for ALL + periods during and after treatment (D[t,i]=0 for t < g_i, D[t,i]=1 for + t >= g_i where g_i is the treatment start time for unit i). Pass + ``non_absorbing=True`` (via ``**kwargs``) to accept general on/off + assignment patterns (``method='local'`` only); see ``TROP``. unit : str Unit identifier column name. time : str @@ -878,7 +1009,7 @@ def trop( survey_design : SurveyDesign, optional Survey design specification. Supports pweight, strata, PSU, and FPC. **kwargs - Additional arguments passed to TROP constructor. 
+ Additional arguments passed to TROP constructor (e.g. ``non_absorbing``). Returns ------- diff --git a/diff_diff/trop_global.py b/diff_diff/trop_global.py index 4b8b4182..bc2d8d5f 100644 --- a/diff_diff/trop_global.py +++ b/diff_diff/trop_global.py @@ -588,7 +588,20 @@ def _fit_global( across units, use `method="local"` which computes observation-specific weights that naturally handle heterogeneous timing. """ - # Data setup (shared with local method via _setup_trop_data helper). + # The global method's post-hoc weighting and bootstrap bake in a + # contiguous, simultaneous treated block (see Notes above), which is + # incompatible with general on/off assignment. Non-absorbing support is + # local-method only (Athey et al. 2025 Eq. 12 / Algorithm 2). + if getattr(self, "non_absorbing", False): + raise ValueError( + "non_absorbing=True requires method='local'; the global method " + "requires block (simultaneous) treatment assignment. Use " + "TROP(method='local', non_absorbing=True) for on/off treatment." + ) + + # Data setup (shared with local method via _setup_trop_data helper). The + # global path always validates absorbing-state (non_absorbing=False); it + # additionally requires simultaneous block adoption (checked below). _ctx = _setup_trop_data( data, outcome, treatment, unit, time, resolved_survey, survey_design ) @@ -835,6 +848,9 @@ def _fit_global( n_bootstrap=self.n_bootstrap, bootstrap_distribution=bootstrap_dist if len(bootstrap_dist) > 0 else None, survey_metadata=survey_metadata, + # Global method requires block assignment (non_absorbing=True is + # rejected at the top of _fit_global), so this is always absorbing. 
+ non_absorbing=False, ) self.is_fitted_ = True diff --git a/diff_diff/trop_local.py b/diff_diff/trop_local.py index 04c13f8e..263401db 100644 --- a/diff_diff/trop_local.py +++ b/diff_diff/trop_local.py @@ -32,6 +32,72 @@ from diff_diff.utils import warn_if_not_converged +def _treated_cell_is_estimable( + control_mask: np.ndarray, + Y: np.ndarray, + weight_matrix: np.ndarray, + i: int, + t: int, +) -> bool: + """True iff treated cell (i, t)'s counterfactual is identified by the control fit. + + The working model fits unregularized unit and time fixed effects + ``alpha_j`` / ``beta_s`` on the weighted observed control cells, then sets + ``tau_it = Y_it - alpha_i - beta_t - L_it``. For that difference to be a valid + counterfactual rather than a fixed-effect-contaminated raw outcome, the sum + ``alpha_i + beta_t`` must be identified by the two-way-FE control fit. + + In a two-way fixed-effect model the effects are pinned only **within each + connected component** of the bipartite graph whose nodes are units and + periods and whose edges are the positively-weighted observed control cells + (``usable = (D==0) & finite(Y) & weight>0``); across components there is a + free per-component offset. Hence ``alpha_i + beta_t`` is identified iff the + **target unit node and target period node lie in the same component**. + + A marginal "the target unit has some usable control AND the target period has + some usable control" test is necessary but NOT sufficient: e.g. usable cells + at ``(unitA, t0)`` and ``(unitB, t1)`` with target ``(unitA, t1)`` pass it, + yet ``alpha_A + beta_1`` spans two disconnected components and is unidentified. + This connected-component check subsumes the simpler degeneracies it replaced: + an always-treated unit (empty unit column) or a fully-treated period (empty + period row) leaves the corresponding node isolated, hence non-estimable. 
+ + This is a **general correctness guard applied to every local fit** (absorbing + and non-absorbing): it NaNs exactly the cells whose ``alpha_i + beta_t`` is + unidentified. On balanced panels (and absorbing panels with an observed + never-treated unit, which connects every period to every unit) the whole + control graph is one component, so the predicate is a no-op (no behavior + change). Shared by the final point fit and the bootstrap fixed-lambda refit. + + Cost: a bipartite BFS bounded by the usable-cell count, run per treated cell + only when both the unit column and period row are non-empty (the cheap + fast-path rejects the common degeneracies first). non_absorbing is opt-in and + correctness-first, so the extra work is acceptable. + """ + usable = control_mask & np.isfinite(Y) & (weight_matrix > 0) + # Fast path: an empty target column (alpha_i) or row (beta_t) is isolated. + if not bool(np.any(usable[:, i])) or not bool(np.any(usable[t, :])): + return False + # Bipartite reachability from period-node t; estimable iff unit-node i reached. + reached_periods = np.zeros(usable.shape[0], dtype=bool) + reached_units = np.zeros(usable.shape[1], dtype=bool) + reached_periods[t] = True + while True: + # Units adjacent to any reached period, then periods adjacent to any + # reached unit; iterate the bipartite expansion to a fixpoint. + new_units = reached_units | np.any(usable[reached_periods, :], axis=0) + if np.any(new_units): + new_periods = reached_periods | np.any(usable[:, new_units], axis=1) + else: + new_periods = reached_periods + if np.array_equal(new_units, reached_units) and np.array_equal( + new_periods, reached_periods + ): + break + reached_units, reached_periods = new_units, new_periods + return bool(reached_units[i]) + + def _validate_and_pivot_treatment(data, time, unit, treatment, all_periods, all_units): """Validate treatment column and create D matrix with missing mask. 
@@ -71,7 +137,16 @@ def _validate_and_pivot_treatment(data, time, unit, treatment, all_periods, all_ return D, missing_mask -def _setup_trop_data(data, outcome, treatment, unit, time, resolved_survey, survey_design): +def _setup_trop_data( + data, + outcome, + treatment, + unit, + time, + resolved_survey, + survey_design, + non_absorbing: bool = False, +): """Shared data setup for TROP local and global fit paths. Performs panel pivoting (long → wide), absorbing-state validation, @@ -79,6 +154,18 @@ def _setup_trop_data(data, outcome, treatment, unit, time, resolved_survey, surv and pre/post period counting. Returns a dict so both callers can unpack only the fields they need. + When ``non_absorbing`` is False (default) the treatment indicator must be an + absorbing state (monotonic non-decreasing per unit) and a non-monotonic + indicator raises ``ValueError``; there must be at least one never-treated + unit and at least 2 leading pre-treatment periods. When ``non_absorbing`` is + True these absorbing-specific guards are relaxed to support general (on/off) + assignment patterns (Athey et al. 2025 Eq. 12 / Algorithm 2): the + monotonicity check is skipped, identification falls back to untreated *cells* + (rather than requiring whole never-treated units), and the pre-period guard + becomes a weaker "at least 2 periods contain untreated cells" check. The + global fit path always calls with ``non_absorbing=False`` (it additionally + requires simultaneous block adoption). 
+ The global-method-specific staggered-adoption check stays in `_fit_global` as a post-helper validation because it depends on estimator semantics (global method requires simultaneous treatment), @@ -128,20 +215,25 @@ def _setup_trop_data(data, outcome, treatment, unit, time, resolved_survey, surv data, time, unit, treatment, all_periods, all_units ) - violating_units = [] - for unit_idx in range(n_units): - observed_mask = ~missing_mask[:, unit_idx] - observed_d = D[observed_mask, unit_idx] - if len(observed_d) > 1 and np.any(np.diff(observed_d) < 0): - violating_units.append(all_units[unit_idx]) - - if violating_units: - raise ValueError( - f"Treatment indicator is not an absorbing state for units: {violating_units}. " - f"D[t, unit] must be monotonic non-decreasing (once treated, always treated). " - f"If this is event-study style data, convert to absorbing state: " - f"D[t, i] = 1 for all t >= first treatment period." - ) + # Absorbing-state (monotonic non-decreasing) validation. Skipped when the + # caller opts into general (on/off) assignment via non_absorbing=True. + if not non_absorbing: + violating_units = [] + for unit_idx in range(n_units): + observed_mask = ~missing_mask[:, unit_idx] + observed_d = D[observed_mask, unit_idx] + if len(observed_d) > 1 and np.any(np.diff(observed_d) < 0): + violating_units.append(all_units[unit_idx]) + + if violating_units: + raise ValueError( + f"Treatment indicator is not an absorbing state for units: {violating_units}. " + f"D[t, unit] must be monotonic non-decreasing (once treated, always treated). " + f"If this is event-study style data with absorbing treatment, convert to " + f"absorbing state: D[t, i] = 1 for all t >= first treatment period. " + f"If treatment genuinely turns on and off (non-absorbing), pass " + f"non_absorbing=True (method='local' only; assumes no dynamic effects)." 
+ ) treated_mask = D == 1 n_treated_obs = int(np.sum(treated_mask)) @@ -153,8 +245,28 @@ def _setup_trop_data(data, outcome, treatment, unit, time, resolved_survey, surv treated_unit_idx = np.where(unit_ever_treated)[0] control_unit_idx = np.where(~unit_ever_treated)[0] - if len(control_unit_idx) == 0: - raise ValueError("No control units found") + # Observed untreated cells. Structural panel gaps are filled with D=0 + # (_validate_and_pivot_treatment), so identification checks under + # non_absorbing must exclude those filled cells (and non-finite outcomes): + # only an OBSERVED D=0 cell can serve as a control for the (1-W) + # counterfactual fit. A raw `D == 0` count would let an all-observed-treated + # unbalanced panel pass with no real control outcomes. + valid_control_mask = (D == 0) & (~missing_mask) & np.isfinite(Y) + + if non_absorbing: + # General assignment identifies off untreated *cells* (the per-(i,t) + # estimator masks treated cells via (1-W) and fits the rest), so a fully + # toggling panel with no never-treated unit is still identified. Require + # at least one observed untreated cell. + if not np.any(valid_control_mask): + raise ValueError( + "No observed untreated (control) observations found; non_absorbing " + "TROP needs observed cells with D=0 (not structural panel gaps) to " + "impute the counterfactual." + ) + else: + if len(control_unit_idx) == 0: + raise ValueError("No control units found") first_treat_period = None for t in range(n_periods): @@ -168,7 +280,20 @@ def _setup_trop_data(data, outcome, treatment, unit, time, resolved_survey, surv n_pre_periods = first_treat_period n_post_periods = int(np.sum(np.any(D[first_treat_period:, :] == 1, axis=1))) - if n_pre_periods < 2: + if non_absorbing: + # "Leading all-control block" is ill-defined when treatment toggles, so + # the absorbing n_pre_periods>=2 guard does not apply. 
Require instead + # that at least 2 periods contain an OBSERVED untreated cell (a weak + # factor-model identifiability floor); finer donor-pool degeneracy is + # handled downstream by the LOOCV empty-control (Q=inf) and inf-distance + # guards. + n_periods_with_controls = int(np.sum(np.any(valid_control_mask, axis=1))) + if n_periods_with_controls < 2: + raise ValueError( + "Need at least 2 periods containing observed untreated " + "observations for non_absorbing TROP." + ) + elif n_pre_periods < 2: raise ValueError("Need at least 2 pre-treatment periods") return { @@ -1031,10 +1156,17 @@ def _bootstrap_variance( survey_design=None, unit_weight_arr: Optional[np.ndarray] = None, resolved_survey=None, + force_python: bool = False, ) -> Tuple[float, np.ndarray]: """ Compute bootstrap standard error using unit-level block bootstrap. + ``force_python=True`` skips the Rust happy path so the cell-specific + estimability guard in ``_fit_with_fixed_lambda`` is applied per draw. The + point fit sets this whenever it trimmed a non-estimable treated cell or the + panel has missing cells (the Rust per-cell tau path lacks the guard), keeping + the bootstrap SE and the point ATT on the same estimable-cell set. + When the optional Rust backend is available and the matrix parameters (Y, D, control_unit_idx) are provided, uses parallelized Rust implementation for 5-15x speedup. Falls back to Python implementation @@ -1132,13 +1264,25 @@ def _bootstrap_variance( ) # Try Rust backend for parallel bootstrap (5-15x speedup) - # Only used for pweight-only designs (no strata/PSU/FPC) + # Only used for pweight-only designs (no strata/PSU/FPC).
+ # Routed to the Python loop when the cell-specific estimability contract + # could diverge from Rust: (a) non_absorbing fits -- a fully non-absorbing + # panel can have zero never-treated units (empty control stratum -> Rust + # can return a degenerate ~0 SE), and the Rust per-cell tau path lacks the + # estimability guard; (b) force_python -- the point fit trimmed at least + # one non-estimable treated cell or the panel has missing cells, so + # the Rust path (no guard) could compute SE over a different cell set than + # the point ATT. The Python `_fit_with_fixed_lambda` enforces the guard + # per draw in both cases. if ( HAS_RUST_BACKEND and _rust_bootstrap_trop_variance is not None and self._precomputed is not None and Y is not None and D is not None + and n_control_units > 0 + and not getattr(self, "non_absorbing", False) + and not force_python ): try: control_mask = self._precomputed["control_mask"] @@ -1479,6 +1623,14 @@ def _fit_with_fixed_lambda( Y, D, i, t, lambda_time, lambda_unit, control_unit_idx, n_units, n_periods ) + # Skip non-estimable cells (same predicate as the main fit): if the + # target unit and target period are not connected through weighted + # observed control cells, alpha_i + beta_t is unidentified and tau + # leaks the fixed effect, silently biasing the draw's ATT. A draw with + # no estimable cell returns NaN and is counted as a failed replicate. + if not _treated_cell_is_estimable(control_mask, Y, weight_matrix, i, t): + continue + # Fit model with these weights alpha, beta, L = self._estimate_model( Y, @@ -1499,5 +1651,13 @@ def _fit_with_fixed_lambda( if not tau_values: return float("nan") if local_weight_arr is not None: + # Guard against a degenerate weighted draw: after non-estimable cells + # are skipped, the remaining estimable cells can all carry zero + # (rescaled survey / Rao-Wu) weight, which would make np.average raise + # ZeroDivisionError. Treat such a draw as failed (NaN) per the + # bootstrap NaN-on-degenerate contract.
+ weight_sum = float(np.sum(tau_weights)) + if not np.isfinite(weight_sum) or weight_sum <= 0.0: + return float("nan") return float(np.average(tau_values, weights=tau_weights)) return float(np.mean(tau_values)) diff --git a/diff_diff/trop_results.py b/diff_diff/trop_results.py index 45e54bd1..d95f09ce 100644 --- a/diff_diff/trop_results.py +++ b/diff_diff/trop_results.py @@ -96,7 +96,15 @@ class TROPResults: time_effects : dict Estimated time fixed effects (beta_t). treatment_effects : dict - Individual treatment effects for each treated (unit, time) pair. + Individual treatment effects for each treated (unit, time) pair. The + value is NaN for a cell that is not estimable -- a missing outcome, or a + cell whose unit/time fixed effect ``alpha_i + beta_t`` is unidentified by + the control fit (the target unit and target period are not in the same + connected component of the observed-control graph: an always-treated unit, + a fully-treated period, or disconnected control support). This applies to + all local TROP fits; it is reachable mainly under ``non_absorbing=True`` + but also on unbalanced absorbing panels. The reported ATT is the mean over + the finite (estimable) cells. lambda_time : float Selected time weight decay parameter from grid. 0.0 = uniform time weights (disabled) per Eq. 3. @@ -122,6 +130,12 @@ class TROPResults: Number of bootstrap replications (if bootstrap variance). bootstrap_distribution : np.ndarray, optional Bootstrap distribution of estimates. + non_absorbing : bool, default=False + Treatment-assignment scope used for the fit. False = absorbing-state + treatment (default); True = general on/off assignment (``method='local'`` + only). Recorded so a persisted result retains the assignment-scope and + inference-caveat context (Theorem 5.1 is block-only) after the fit-time + ``UserWarning`` is gone. 
""" att: float @@ -149,6 +163,11 @@ class TROPResults: bootstrap_distribution: Optional[np.ndarray] = field(default=None, repr=False) # Survey design metadata (SurveyMetadata instance from diff_diff.survey) survey_metadata: Optional[Any] = field(default=None) + # Treatment-assignment scope used for the fit: False = absorbing (default), + # True = general on/off assignment (method='local'; Athey et al. 2025 Eq. 12). + # Recorded so a persisted result retains the assignment-scope / inference + # caveat context after the fit-time UserWarning is gone. + non_absorbing: bool = False def __repr__(self) -> str: """Concise string representation.""" @@ -195,10 +214,21 @@ def summary(self, alpha: Optional[float] = None) -> str: "", f"{'Observations:':<25} {self.n_obs:>10}", f"{'Treated units:':<25} {self.n_treated:>10}", - f"{'Control units:':<25} {self.n_control:>10}", + # Under non-absorbing assignment a unit can be treated in some periods + # and untreated in others, so n_control (never-treated units) may be 0 + # even though many untreated control *cells* exist; label accordingly. 
+ ( + f"{'Never-treated units:':<25} {self.n_control:>10}" + if self.non_absorbing + else f"{'Control units:':<25} {self.n_control:>10}" + ), f"{'Treated observations:':<25} {self.n_treated_obs:>10}", f"{'Pre-treatment periods:':<25} {self.n_pre_periods:>10}", f"{'Post-treatment periods:':<25} {self.n_post_periods:>10}", + ] + if self.non_absorbing: + lines.append(f"{'Assignment scope:':<25} {'non-absorbing (on/off)':>20}") + lines += [ "", "-" * 75, "Tuning Parameters (selected via LOOCV)".center(75), @@ -280,6 +310,7 @@ def to_dict(self) -> Dict[str, Any]: "lambda_nn": self.lambda_nn, "effective_rank": self.effective_rank, "loocv_score": self.loocv_score, + "non_absorbing": self.non_absorbing, } if self.survey_metadata is not None: sm = self.survey_metadata diff --git a/docs/api/chaisemartin_dhaultfoeuille.rst b/docs/api/chaisemartin_dhaultfoeuille.rst index a28e128a..1ee5634d 100644 --- a/docs/api/chaisemartin_dhaultfoeuille.rst +++ b/docs/api/chaisemartin_dhaultfoeuille.rst @@ -1,9 +1,11 @@ de Chaisemartin-D'Haultfœuille (dCDH) DiD ============================================ -The only modern staggered DiD estimator in diff-diff that handles -**non-absorbing (reversible) treatments** — treatment may switch on AND -off over time. +The most general estimator in diff-diff for **non-absorbing (reversible) +treatments** — treatment may switch on AND off over time, with explicit +joiner/leaver decomposition and multi-horizon dynamics. (:class:`~diff_diff.LPDiD` +and :class:`~diff_diff.TROP` also support non-absorbing treatment under stronger +assumptions; see their ``non_absorbing`` parameters.) This module implements the methodology from de Chaisemartin & D'Haultfœuille (2020/2022). 
The estimator ships the contemporaneous-switch path ``DID_M`` @@ -79,12 +81,15 @@ The estimator: ``l = 1`` on cell-aggregated input (see REGISTRY.md for documented deviations on individual-level inputs with uneven cell sizes) -All other staggered estimators in diff-diff (:class:`~diff_diff.CallawaySantAnna`, +The remaining staggered estimators in diff-diff (:class:`~diff_diff.CallawaySantAnna`, :class:`~diff_diff.SunAbraham`, :class:`~diff_diff.ImputationDiD`, :class:`~diff_diff.TwoStageDiD`, :class:`~diff_diff.EfficientDiD`, :class:`~diff_diff.WooldridgeDiD`) assume treatment is **absorbing** — -once treated, stays treated. ``ChaisemartinDHaultfoeuille`` is the only -library option for non-absorbing treatments. +once treated, stays treated. ``ChaisemartinDHaultfoeuille`` is the most general +option for non-absorbing treatments; :class:`~diff_diff.LPDiD` +(``non_absorbing="first_entry"`` / ``"effect_stabilization"``) and +:class:`~diff_diff.TROP` (``non_absorbing=True``, under a no-dynamic-effects +assumption) also support non-absorbing treatment. **Panel requirements (deviation from R DIDmultiplegtDYN):** diff --git a/docs/api/trop.rst b/docs/api/trop.rst index 743992a3..0d66e37e 100644 --- a/docs/api/trop.rst +++ b/docs/api/trop.rst @@ -168,6 +168,36 @@ Treatment effects are **heterogeneous** per-observation residuals; ATT is their Use ``method='local'`` for observation-specific weight optimization. Use ``method='global'`` for faster estimation with global weights. +Non-absorbing (on/off) treatment +-------------------------------- + +By default TROP requires an **absorbing-state** treatment indicator (once treated, +always treated) and rejects a non-monotonic indicator with a ``ValueError``. This +guards against the common mistake of encoding absorbing treatment as an event-style +spike (a single ``D=1`` period), which would silently bias the ATT. 
+ +The paper, however, supports **general assignment patterns** including treatment that +switches on and off (§2.1: "units moving into and out of treatment"; Eq. 12 / +Algorithm 2). Enable this with the opt-in ``non_absorbing=True`` (``method='local'`` +only):: + + from diff_diff import TROP + + trop = TROP(method='local', non_absorbing=True) + results = trop.fit(data, outcome='y', treatment='treated', + unit='unit_id', time='period') + +Caveats (a ``UserWarning`` is emitted on fit): + +- Validity relies on the paper's **no-spillover / no-dynamic-effects (no carryover)** + assumption. +- The point estimator (Eq. 12) is general, but the formal **triple-robustness + guarantee (Theorem 5.1) is proven only under block assignment**; the bootstrap is + offered generally but its validity requires a growing number of treated units, so + interpret standard errors with care. +- ``non_absorbing=True`` is supported for ``method='local'`` only; + ``TROP(method='global', non_absorbing=True)`` raises a ``ValueError``. + Example Usage ------------- @@ -184,8 +214,11 @@ Basic usage:: ) # Note: TROP infers treatment periods from the treatment indicator column. - # The treatment column should be an absorbing state (D=1 for all periods - # during and after treatment starts). + # By default the treatment column must be an absorbing state (D=1 for all + # periods during and after treatment starts); a non-monotonic indicator + # raises ValueError. For treatment that genuinely switches on and off, + # pass non_absorbing=True (method='local' only) -- see "Non-absorbing + # (on/off) treatment" above. results = trop.fit( data, outcome='y', diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst index 69f437b3..c3edcc31 100644 --- a/docs/choosing_estimator.rst +++ b/docs/choosing_estimator.rst @@ -26,7 +26,7 @@ Start here and follow the questions: 2.
**Can treatment switch on AND off?** (Reversible / non-absorbing treatment — e.g., marketing campaigns, seasonal promotions, on/off policy cycles) - **No (treatment is absorbing — once treated, stays treated)** → Go to question 3 - - **Yes** → Use :class:`~diff_diff.ChaisemartinDHaultfoeuille` — the only library estimator that handles non-absorbing treatments + - **Yes** → Use :class:`~diff_diff.ChaisemartinDHaultfoeuille` — the most general option (allows dynamic/carryover effects, with joiner/leaver views). :class:`~diff_diff.LPDiD` (``non_absorbing="first_entry"`` / ``"effect_stabilization"``) and :class:`~diff_diff.TROP` (``non_absorbing=True``, under a no-dynamic-effects assumption) also handle non-absorbing treatment under stronger assumptions 3. **Is treatment staggered?** (Different units treated at different times) @@ -78,7 +78,7 @@ Quick Reference - Conditional parallel trends - Group-time ATT(g,t), aggregations * - ``ChaisemartinDHaultfoeuille`` - - Reversible / non-absorbing treatments (only library option) + - Reversible / non-absorbing treatments (most general; allows dynamic effects) - Parallel trends + A5 (no crossing) + A11 (stable controls) - DID_l event study (L_max), normalized DID^n_l, cost-benefit delta, placebos, sup-t bands, TWFE diagnostic * - ``SyntheticDiD`` @@ -250,8 +250,13 @@ Use :class:`~diff_diff.ChaisemartinDHaultfoeuille` (alias :class:`~diff_diff.DCD normalized effects, cost-benefit aggregation, dynamic placebos, and sup-t simultaneous confidence bands -This is **the only library estimator that handles non-absorbing treatments**. -All other staggered estimators +This is the **most general** library estimator for non-absorbing treatment: it +allows dynamic (carryover) effects and reports separate joiner/leaver views. 
+Two other estimators also accept non-absorbing treatment under stronger +assumptions: :class:`~diff_diff.LPDiD` (``non_absorbing="first_entry"`` / +``"effect_stabilization"`` — entry-effect estimands) and :class:`~diff_diff.TROP` +(``non_absorbing=True``, ``method='local'`` — valid under the paper's +no-dynamic-effects / no-carryover assumption). The remaining staggered estimators (:class:`~diff_diff.CallawaySantAnna`, :class:`~diff_diff.SunAbraham`, :class:`~diff_diff.ImputationDiD`, :class:`~diff_diff.TwoStageDiD`, :class:`~diff_diff.EfficientDiD`, :class:`~diff_diff.WooldridgeDiD`) assume diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index da3f06c8..378b3aa2 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -648,7 +648,7 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1: - [de Chaisemartin, C. & D'Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. *American Economic Review*, 110(9), 2964-2996.](https://doi.org/10.1257/aer.20181169) - [de Chaisemartin, C. & D'Haultfœuille, X. (2022, revised July 2023). Difference-in-Differences Estimators of Intertemporal Treatment Effects. NBER Working Paper 29873.](https://www.nber.org/papers/w29873) — Web Appendix Section 3.7.3 contains the cohort-recentered plug-in variance formula implemented here. -**Phase 1-2 scope:** Ships the contemporaneous-switch estimator `DID_M` (= `DID_1` at horizon `l = 1`) from the AER 2020 paper **plus** the full multi-horizon event study `DID_l` for `l = 1..L_max` from the dynamic companion paper. Phase 2 adds: per-group `DID_{g,l}` building block (Equation 3), dynamic placebos `DID^{pl}_l`, normalized estimator `DID^n_l`, cost-benefit aggregate `delta`, sup-t simultaneous confidence bands, and `plot_event_study()` integration. Phase 3 adds covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, and HonestDiD integration. 
Survey design supports pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) or replicate-weight variance (BRR/Fay/JK1/JKn/SDR) across all IF sites, plus opt-in PSU-level Hall-Mammen wild bootstrap via `n_bootstrap > 0` (see the full checklist + Notes below for the contract). **This is the only modern staggered estimator in the library that handles non-absorbing (reversible) treatments** - treatment can switch on AND off over time, making it the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles. +**Phase 1-2 scope:** Ships the contemporaneous-switch estimator `DID_M` (= `DID_1` at horizon `l = 1`) from the AER 2020 paper **plus** the full multi-horizon event study `DID_l` for `l = 1..L_max` from the dynamic companion paper. Phase 2 adds: per-group `DID_{g,l}` building block (Equation 3), dynamic placebos `DID^{pl}_l`, normalized estimator `DID^n_l`, cost-benefit aggregate `delta`, sup-t simultaneous confidence bands, and `plot_event_study()` integration. Phase 3 adds covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, and HonestDiD integration. Survey design supports pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) or replicate-weight variance (BRR/Fay/JK1/JKn/SDR) across all IF sites, plus opt-in PSU-level Hall-Mammen wild bootstrap via `n_bootstrap > 0` (see the full checklist + Notes below for the contract). **This is the most general library estimator for non-absorbing (reversible) treatments** - treatment can switch on AND off over time, switcher vs non-switcher is its primitive object, and it allows dynamic (carryover) effects with explicit joiner/leaver (`DID_+` / `DID_-`) decomposition - making it the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles. 
(`LPDiD` with `non_absorbing="first_entry"` / `"effect_stabilization"` and `TROP` with `non_absorbing=True` under a no-dynamic-effects assumption also accept non-absorbing treatment under stronger assumptions.) **Key implementation requirements:** @@ -2593,7 +2593,7 @@ confidence bands (sup-t) for event study. **Primary source:** [Athey, S., Imbens, G.W., Qu, Z., & Viviano, D. (2025). Triply Robust Panel Estimators. arXiv:2508.21536.](https://arxiv.org/abs/2508.21536) -**Note (version pinning):** the methodology promotion (`METHODOLOGY_REVIEW.md` `#### TROP` → **Complete** as of 2026-05-24) is anchored on **arXiv:2508.21536v2**; the current arXiv version is **v3**. A formal v2→v3 source delta-check against the v3 PDF has NOT been performed for any of the sections covered by the promotion (Eqs. 2-3, Algorithms 1-3, Section 2.2, Section 5.2-5.3, Section 6.1-6.2, Theorem 5.1, Corollary 1, Appendix Theorem 8.1). See `docs/methodology/papers/athey-2025-review.md` "Version-pinning note" for the deferred action item. +**Note (version pinning):** the methodology promotion (`METHODOLOGY_REVIEW.md` `#### TROP` → **Complete** as of 2026-05-24) is anchored on **arXiv:2508.21536v2**; the current arXiv version is **v3**. The **v3 PDF was consulted for the treatment-assignment-pattern sections** as part of the non-absorbing support work (§2.1 general assignment / "units moving into and out of treatment"; §2.2 Eq. 2 masking; §6.1 Eq. 12 / Algorithm 2; Assumption 1(i); Theorem 5.1) and confirms the general-assignment scope used here. A full v2→v3 source delta-check across all promoted sections (Eqs. 2-3, Algorithms 1-3, Section 2.2, Section 5.2-5.3, Section 6.1-6.2, Theorem 5.1, Corollary 1, Appendix Theorem 8.1) is still **deferred**. See `docs/methodology/papers/athey-2025-review.md` "Version-pinning note" for the deferred action item. **Key implementation requirements:** @@ -2604,26 +2604,47 @@ confidence bands (sup-t) for event study. 
*Treatment indicator (D matrix) semantics:* -D must be an **ABSORBING STATE** indicator, not a treatment timing indicator: +By default (`non_absorbing=False`) D must be an **ABSORBING STATE** indicator, not a +treatment timing indicator: - D[t, i] = 0 for all t < g_i (pre-treatment periods for unit i) - D[t, i] = 1 for all t >= g_i (during and after treatment for unit i) where g_i is the treatment start time for unit i. -For staggered adoption, different units have different treatment start times g_i. -The D matrix naturally handles this - distances use periods where BOTH units +For **staggered adoption** (different units treated at different times, but still +absorbing) the D matrix naturally handles this - distances use periods where BOTH units have D=0, matching the paper's (1 - W_iu)(1 - W_ju) formula in Equation 3. -**Wrong D specification**: If user provides event-style D (only first treatment period -has D=1), ATT will be incorrect - document this clearly. +**True non-absorbing assignment** (treatment switches on *and* off) is a distinct case +from staggered adoption. The paper (§2.1: "units moving into and out of treatment") +supports it via the same Eq. 12 / Algorithm 2 masking, and the library exposes it through +the opt-in `TROP(non_absorbing=True)` (`method='local'` only). See the requirements +checklist below and the `**Note:**` entries on the no-dynamic-effects requirement and the +block-only inference theory. + +**Wrong D specification**: With the default `non_absorbing=False`, an event-style D (only +the first treatment period has D=1, then back to 0) is a non-monotonic indicator and is +**rejected** with a `ValueError` (see "D matrix validation" below). This guards against the +common mistake of encoding absorbing treatment as an event spike, which would silently bias +the ATT. A user with genuinely non-absorbing treatment passes `non_absorbing=True`. 
*ATT definition (Equation 1, Section 6.1):* ``` τ̂ = (1 / Σ_i Σ_t W_{it}) Σ_{i=1}^N Σ_{t=1}^T W_{it} τ̂_{it}(λ̂) ``` -- ATT averages over ALL cells where D_it=1 (treatment indicator) +- ATT averages over all cells where D_it=1 (treatment indicator) that are + **estimable**. On balanced / support-complete absorbing panels every treated + cell is estimable, so this is all D=1 cells. A cell is non-estimable (NaN, + excluded) when `alpha_i + beta_t` is unidentified — its target unit and period + are not in the same connected component of the observed-control graph; this is + reachable under `non_absorbing=True` (always-treated unit, fully-treated period, + disconnected support) and on unbalanced absorbing panels (entirely-missing + unit/period controls). See the non-estimable-cell `**Note:**` below — matching + the library-wide non-estimable→NaN convention (cf. CallawaySantAnna group-time + cells). - No separate "post_periods" concept - D matrix is the sole input for treatment timing -- Supports general assignment patterns including staggered adoption +- Supports general assignment patterns including staggered adoption and (with + `non_absorbing=True`) on/off switching *Estimator equation (as implemented, Section 2.2):* @@ -2707,16 +2728,16 @@ Q(λ) = Σ_{j,s: D_js=0} [τ̂_js^loocv(λ)]² - **Results storage**: `TROPResults` stores *original* λ_nn value (inf), while computations use 1e10. λ_time and λ_unit store their selected values directly (0.0 = uniform). - **Empty control observations**: If no valid control observations exist, returns Q(λ) = ∞ with warning. A score of 0.0 would incorrectly "win" over legitimate parameters. 
- **Infinite LOOCV score handling**: If best LOOCV score is infinite, `best_lambda` is set to None, triggering defaults fallback -- Validation: requires at least 2 periods before first treatment -- **D matrix validation**: Treatment indicator must be an absorbing state (monotonic non-decreasing per unit) +- Validation: by default requires at least 2 periods before first treatment; with `non_absorbing=True` this becomes "at least 2 periods contain untreated cells" (the leading all-control block is ill-defined when treatment toggles) +- **D matrix validation** (default `non_absorbing=False`): Treatment indicator must be an absorbing state (monotonic non-decreasing per unit) - Detection: `np.diff(D, axis=0) < 0` for any column indicates violation - Handling: Raises `ValueError` with list of violating unit IDs and remediation guidance - - Error message includes: "convert to absorbing state: D[t, i] = 1 for all t >= first treatment period" - - **Rationale**: Event-style D (0→1→0) silently biases ATT; runtime validation prevents misuse - - **Unbalanced panels**: Missing unit-period observations are allowed. Monotonicity validation checks each unit's *observed* D sequence for monotonicity, which correctly catches 1→0 violations that span missing period gaps (e.g., D[2]=1, missing [3,4], D[5]=0 is detected as a violation even though the gap hides the transition in adjacent-period checks). + - Error message includes: "convert to absorbing state: D[t, i] = 1 for all t >= first treatment period" AND the opt-in pointer ("if treatment genuinely turns on and off, pass `non_absorbing=True`") + - **Rationale**: Event-style D (0→1→0) silently biases ATT when the user *meant* absorbing treatment; runtime validation prevents that misuse while the opt-in serves genuine on/off designs + - **`non_absorbing=True`**: the monotonicity check is skipped entirely, so on/off (and event-style) D matrices are accepted. 
Identification falls back to untreated *cells* (the per-(i,t) estimator masks treated cells via (1-W) and fits the rest), so even a fully toggling panel with no never-treated unit is admitted; only "no D=0 cells at all" is rejected. See the requirements checklist + Notes for the no-dynamic-effects requirement and the block-only inference caveat. + - **Unbalanced panels**: Missing unit-period observations are allowed. Monotonicity validation (default mode) checks each unit's *observed* D sequence for monotonicity, which correctly catches 1→0 violations that span missing period gaps (e.g., D[2]=1, missing [3,4], D[5]=0 is detected as a violation even though the gap hides the transition in adjacent-period checks). - **n_post_periods metadata**: Counts periods where D=1 is actually observed (at least one unit has D=1), not calendar periods from first treatment. In unbalanced panels where treated units are missing in some post-treatment periods, only periods with observed D=1 values are counted. -- Wrong D specification: if user provides event-style D (only first treatment period), - the absorbing-state validation will raise ValueError with helpful guidance +- Wrong D specification: with the default `non_absorbing=False`, an event-style D (only first treatment period) is rejected with a `ValueError` carrying both the convert-to-absorbing guidance and the `non_absorbing=True` opt-in pointer - **Bootstrap minimum**: `n_bootstrap` must be >= 2 (enforced via `ValueError`). TROP uses bootstrap for all variance estimation — there is no analytical SE formula. - **Note:** TROP bootstrap loops (`_bootstrap_variance`, `_bootstrap_rao_wu`, and their global counterparts, including both Rust happy paths — local and global) emit a proportional `UserWarning` via `diff_diff.bootstrap_utils.warn_bootstrap_failure_rate` when the replicate failure rate exceeds 5%. The previous hard-coded `< 10 successes` threshold let high-failure runs (e.g. 
11 of 200) pass silently; this was classified as a silent failure under the Phase 2 audit (axis D — degenerate-replicate handling). The 5% threshold matches the existing SyntheticDiD bootstrap and placebo guards. When zero replicates succeed, SE is set to `NaN` (unchanged). The local Rust path previously also used `len >= 10` as a Python-fallback trigger; it now accepts any non-zero Rust result and emits the proportional warning instead of path-switching silently. - **LOOCV failure metadata**: When LOOCV fits fail in the Rust backend, the first failed observation coordinates (t, i) are returned to Python for informative warning messages @@ -2733,16 +2754,21 @@ Q(λ) = Σ_{j,s: D_js=0} [τ̂_js^loocv(λ)]² - [x] LOOCV uses SUM of squared errors per Equation 5 - [x] Rank selection implicit via nuclear-norm soft-thresholding (paper Section 5.3 + Appendix); `TROPResults.effective_rank` reports the diagnostic. No discrete `rank_selection` constructor parameter is exposed — earlier mention of "cv / ic / elbow" methods in this checklist was an overclaim, corrected in the methodology-promotion PR. Locked by `tests/test_methodology_trop.py::TestTROPDeviations::test_rank_selection_is_implicit_via_nuclear_norm`. - [x] Returns the fitted factor matrix and an effective-rank diagnostic (`TROPResults.factor_matrix` and `TROPResults.effective_rank`). The library does NOT expose separate factor-loading / factor-score outputs — earlier prose claiming "factor loadings and scores" was an overclaim corrected in the 2026-05-24 methodology-promotion PR (TROP's nuclear-norm soft-thresholded L is delivered as a single (n_periods × n_units) matrix, not decomposed into loading / score components on Results). -- [x] ATT averages over all D==1 cells (general assignment patterns) +- [x] ATT averages over all **estimable** D==1 cells (staggered adoption by default; on/off switching with `non_absorbing=True`). 
All D==1 cells are estimable on balanced / support-complete panels; cells whose `alpha_i + beta_t` is unidentified (target unit and period in different connected components of the observed-control graph) are NaN and excluded (see the non-estimable-cell `**Note:**`). - [x] No post_periods parameter (D matrix determines treatment timing) - [x] D matrix semantics documented (absorbing state, not event indicator) - [x] Unbalanced panels supported — missing control / pre-treatment cells don't trigger false absorbing-state violations. Locked by `tests/test_methodology_trop.py::TestTROPDeviations::test_unbalanced_panels_supported` (10% random drops on control + pre-treatment subset). Three additional unbalanced-panel regressions live in `tests/test_trop.py::TestPR110FeedbackRound8` (`test_unbalanced_panel_d_matrix_validation`, `test_unbalanced_panel_real_violation_still_caught`, `test_unbalanced_panel_multiple_missing_periods`). Absorbing-state monotonicity validation (which fires on unbalanced cases too) is covered by `tests/test_trop.py::TestDMatrixValidation`. -- [x] Per-observation treatment-effect estimation (Eq. 13 / Algorithm 2) — `treatment_effects` dict contains one finite `τ_hat_it` per treated cell, and the aggregate ATT equals the unweighted mean of per-cell effects (Eq. 1). **The methodology test exercises block adoption with a constant treatment effect**; **absorbing-state staggered adoption** and **heterogeneous per-cell effects** (paper Remark 6.1) are SUPPORTED by the code path (the implementation does not gate on cohort or effect-magnitude pattern), but are not directly verified in the methodology test surface in this PR. 
**Section 6.1 non-absorbing / on-off / switching assignment patterns are explicitly OUT OF SCOPE** — the absorbing-state validator at `trop_local.py` rejects non-monotonic D matrices with a `ValueError`, and `TestTROPDeviations::test_event_style_d_rejected_with_value_error` enforces the rejection contract (event-style D being one specific non-absorbing pattern; the same validator catches all 1→0 transitions). Cross-coverage of the staggered-cohort fit path is `tests/test_methodology_trop.py::TestTROPAlgorithm1LOOCV::test_control_set_includes_pretreat_of_eventually_treated` (two-cohort early-/late-treated panel under LOOCV-tuned `λ_unit`); absorbing-state structural validation is `tests/test_trop.py::TestDMatrixValidation`. +- [x] Per-observation treatment-effect estimation (Eq. 13 / Algorithm 2) — `treatment_effects` dict contains one `τ_hat_it` entry per treated cell (finite for estimable cells; NaN for a missing outcome or, under `non_absorbing`, a cell with no weighted control support — see the no-support `**Note:**`), and the aggregate ATT equals the unweighted mean of the finite per-cell effects (Eq. 1). **The methodology test exercises block adoption with a constant treatment effect**; **absorbing-state staggered adoption** and **heterogeneous per-cell effects** (paper Remark 6.1) are SUPPORTED by the code path (the implementation does not gate on cohort or effect-magnitude pattern), but are not directly verified in the methodology test surface for those specific patterns. Cross-coverage of the staggered-cohort fit path is `tests/test_methodology_trop.py::TestTROPAlgorithm1LOOCV::test_control_set_includes_pretreat_of_eventually_treated` (two-cohort early-/late-treated panel under LOOCV-tuned `λ_unit`); absorbing-state structural validation is `tests/test_trop.py::TestDMatrixValidation`. 
+- [x] **Section 6.1 non-absorbing / on-off / switching assignment patterns are SUPPORTED via the opt-in `TROP(non_absorbing=True)` (`method='local'` only)** — matching the paper's general-assignment scope (§2.1 "units moving into and out of treatment"; Eq. 12 / Algorithm 2 mask treated cells per (i,t) with no monotonicity requirement). The default (`non_absorbing=False`) still rejects non-monotonic D as a defensive guard (see the `**Note:**` entries below). Adding this opt-in *removes* a prior implementation over-restriction (the shipped estimator was stricter than the paper); it is **not** a new methodology deviation. Recovery on a no-dynamic-effects toggling DGP, the per-cell effect count, and the caveat warning are locked by `tests/test_methodology_trop.py::TestTROPDeviations::test_non_absorbing_general_assignment_supported`; the default-mode rejection contract by `TestTROPDeviations::test_event_style_d_rejected_with_value_error`; opt-in acceptance, the local-only guard, params round-trip, and Rust/Python parity by `tests/test_trop.py::TestDMatrixValidation`. - [x] Special-case reductions (paper Section 2.2): **DiD benchmark sanity check** (NOT a direct algebraic-equivalence proof) — TROP with `λ_nn=∞` + uniform weights produces an ATT within 0.5 of `DifferenceInDifferences` fitted as a basic 2×2 design on a TWFE-clean multi-period panel. This is empirical numerical agreement on a friendly DGP. A direct Section 2.2 reduction lock (true 2-period block-assignment panel where basic DiD is the algebraic target, or a comparison against `TwoWayFixedEffects` with explicit unit FE) is deferred. **Matrix Completion code path exercised** — TROP with uniform weights + finite `λ_nn` engages the nuclear-norm prox solver (effective_rank > 0) and beats the DiD-style baseline on a factor-confounded DGP; not an equivalence check against an independent MC reference.
SC and SDID reductions are paper-claimed under "specific (omega, theta) weight choices" not provided in the paper text; cross-language anchor deferred until paper-author reference implementation clarifies the weight map. See `tests/test_methodology_trop.py::TestTROPSpecialCases`. - **Note:** The balancing representation / decomposition (paper Eq. 10, Section 5.2) is a paper-side identity. Direct numerical reconstruction of the four-term sum requires the internal `θ_s^{i,t}` / `ω_j^{i,t}` weight vectors, which are not exposed on the public TROP API; numerical Eq. 10 verification is therefore out of scope. The test `tests/test_methodology_trop.py::TestTROPNuclearNormProx::test_factor_matrix_consistent_with_treatment_effects` is a structural pointer only — it checks `factor_matrix` shape + finiteness + that `treatment_effects` is populated with finite entries, but does NOT lock the magnitude of `L_hat`. (The test DGP uses additive unit + time effects only; on a no-interactive-FE panel, the paper's framework absorbs the additive surfaces into `α_i` / `β_t`, so a near-zero `L_hat` is methodologically correct. An `effective_rank > 0` assertion would lock a solver artifact, not the intended low-rank behavior.) This is NOT a full Eq. 10 lock. The Eq. 2 ingredients (soft-threshold SVD, **plain prox-gradient monotonicity** — NOT the shipped accelerated FISTA outer loop, which uses Nesterov momentum and does not guarantee per-step monotonicity, see `TestTROPNuclearNormProx` class docstring — weighted-prox) that the Eq. 10 derivation relies on are independently verified in the same class. - **Note (library-side choice):** Weight normalization (Gap #5 in `docs/methodology/papers/athey-2025-review.md`): paper Section 5 (p. 20) states weights sum to one (`1ᵀω = 1ᵀθ = 1`), but Eq. 3 (p. 7) writes unnormalized exponential weights. **The paper-side ambiguity remains open**; the library resolves it as a documented deviation — the shipped implementation matches Eq. 2 (unnormalized). 
Verified by `tests/test_methodology_trop.py::TestTROPDeviations::test_unnormalized_weights_match_eq2`. Will be revisited once paper-author reference implementation lands. - **Note (deferral):** Equation 14 covariate extension (`Y_it = α_i + β_t + X_it·β_coef + R_it` with R low-rank, paper Section 6.2) is **not implemented**. `TROP.fit()` does not accept a `covariates` keyword argument. The corresponding Theorem 8.1 covariate-triple-robustness result is correspondingly out of scope. The non-support is locked by `tests/test_methodology_trop.py::TestTROPDeviations::test_covariates_not_supported`, which uses `inspect.signature` to guard against future `**kwargs` silently breaking the contract. Deferred until use cases motivate the X threading through `trop_local.py` / `trop_global.py` / LOOCV / bootstrap. - **Note:** Survey support: weights, strata, PSU, and FPC are all supported via Rao-Wu rescaled bootstrap with cross-classified pseudo-strata (Phase 6). Rust backend remains pweight-only; full-design surveys fall back to the Python bootstrap path. Survey weights enter ATT aggregation only — population-weighted average of per-observation treatment effects. Model fitting (kernel weights, LOOCV, nuclear norm regularization) stays unchanged. Rust and Python bootstrap paths both support survey-weighted ATT in each iteration. +- **Note (defensive default):** `non_absorbing` defaults to `False`, retaining the absorbing-state monotonicity gate. This is an implementation choice, not a paper requirement: the gate's primary value is catching the common mistake of encoding *absorbing* treatment as an event-style spike (a single D=1 period), which silently biases the ATT. Genuine on/off designs opt in with `non_absorbing=True`. The default-mode rejection message carries both the convert-to-absorbing guidance and the opt-in pointer. +- **Note (scope — local only):** `non_absorbing=True` is supported only for `method='local'`. 
The `global` method's post-hoc weighting and bootstrap bake in a contiguous, simultaneous treated block (it already rejects staggered adoption), so `TROP(method='global', non_absorbing=True)` raises a `ValueError`. The Rust local LOOCV and point-estimate paths are already mask-driven (`D==0`/`D==1`) and required no change; Rust/Python ATT parity on a non-absorbing panel is locked by `tests/test_trop.py::TestDMatrixValidation::test_non_absorbing_rust_python_parity`. Under `non_absorbing=True` the local Rust bootstrap is bypassed in favour of the Python loop (the Rust stratified resampler lacks the no-weighted-control-support guard and can return a degenerate ~0 SE on an empty control stratum, e.g. on a fully toggling panel with no never-treated unit). +- **Note (inference caveat for non-absorbing):** The paper's *point estimator* (Eq. 12 / Algorithm 2) supports general assignment, but the formal **triple-robustness guarantee (Theorem 5.1) is proven only under Assumption 1(i) block assignment** `W_it = 1{i>N0}·1{t>T0}`; the paper does not extend that guarantee to general/non-absorbing patterns (cf. `docs/methodology/papers/athey-2025-review.md`). The non-parametric bootstrap (Algorithm 3) is offered generally but "its validity requires a growing number of treated units." Non-absorbing validity additionally relies on the paper's **no-spillover / no-dynamic-effects (no carryover)** assumption (paper §2.1). `TROP.fit()` emits a one-time `UserWarning` carrying these caveats whenever `non_absorbing=True`; the warning is locked by `tests/test_methodology_trop.py::TestTROPDeviations::test_non_absorbing_general_assignment_supported` and its absence in default mode by `test_non_absorbing_no_caveat_in_default_mode`. +- **Note (non-absorbing non-estimable-cell trimming → estimable-cell ATT):** The working model fits unregularized unit/time fixed effects `alpha_j` / `beta_s` on the weighted observed control cells, then sets `tau_it = Y_it - alpha_i - beta_t - L_it`.
A treated cell (i,t) is **estimable** only if the sum `alpha_i + beta_t` is identified by that two-way-FE fit. In a two-way FE model the effects are pinned only **within each connected component** of the bipartite graph whose nodes are units and periods and whose edges are the positively-weighted **observed** control cells (`usable = (D==0) & ~missing & isfinite(Y) & ω>0`); across components there is a free per-component offset. So estimability requires the **target unit node and target period node to lie in the same connected component** of that graph (predicate `diff_diff.trop_local._treated_cell_is_estimable`, a bipartite BFS run per treated cell with a cheap empty-row/empty-column fast-path). A marginal "the target unit has some usable control AND the target period has some usable control" test is **necessary but not sufficient** — e.g. usable cells at `(unitA,t0)` and `(unitB,t1)` with target `(unitA,t1)` pass it yet span two disconnected components, leaving `alpha_A + beta_1` unidentified. The connected-component check subsumes the simpler degeneracies: under `non_absorbing=True` (1) an **always-treated unit** has an empty control column (isolated unit node) — true even with `lambda_unit=0`; and (2) a **fully-treated period** has an empty control row (isolated period node). In all these cases tau would silently leak the fixed effect. Non-estimable cells are materialized as `NaN` in `treatment_effects` and **excluded from the ATT**, which is therefore the mean over **estimable** treated cells — NOT all D=1 cells. This matches the library-wide non-estimable→NaN convention (the per-named-cell analogue of CallawaySantAnna materializing non-estimable (g,t) as NaN); it is a **defensive choice for a degeneracy the paper does not cover** (the paper assumes enough overlap), not a deviation from Eq. 1 on the cells it covers. There is no `λ` that restores identification for these cells (the missing control row/column is structural), so the warning does not suggest one. 
**The predicate is applied to every local fit (absorbing and non-absorbing) as a general correctness guard** — it NaNs exactly the cells whose FE is genuinely unidentified. It is a **no-op whenever every treated cell's target unit and period have an OBSERVED control cell**: always true on a balanced panel, and in **absorbing** mode also true on unbalanced panels (a never-treated unit is a control at every observed period and each treated unit's pre-treatment controls are observed) — *unless* an unbalanced absorbing panel happens to leave a treated unit's pre-period controls or a period's controls entirely missing, in which case NaN-ing those cells is the correct fix to the identical latent FE leak (the prior behavior silently reported a contaminated tau). So estimable-cell trimming is the contract for **all** local TROP fits on unbalanced panels, not only non-absorbing ones. The point fit and the bootstrap refit apply the identical predicate; a draw with no estimable cell returns NaN and counts as a failed bootstrap replicate. **Rust/bootstrap parity:** the Rust per-cell bootstrap lacks the estimability guard, so whenever the point fit trims any cell (`force_python=True`, set from `n_no_support>0`) — or under `non_absorbing` generally — the bootstrap is routed to the guarded Python `_fit_with_fixed_lambda`, keeping the SE and the point ATT on the same estimable-cell set. (Rust remains the happy path for clean fits with no trimming.) **LOOCV is support-agnostic** by design: a degenerate pseudo-control cell yields a large raw-outcome pseudo-effect that inflates `Q(λ)`, so support-destroying `λ_unit` values are *naturally disfavored* (a soft penalty) rather than hard-rejected — hard-rejecting (`Q=∞`) would over-restrict. `TROP.fit()` emits a `UserWarning` naming the count of non-estimable cells. 
Locked by `tests/test_methodology_trop.py::TestTROPDeviations`: `test_non_absorbing_always_treated_unit_not_raw_outcome` (always-treated unit, `lambda_unit>0` and `lambda_unit=0`), `test_non_absorbing_fully_treated_period_not_estimable` (fully-treated period), `test_non_absorbing_disconnected_support_not_estimable` (disconnected bipartite control graph), and `test_unbalanced_absorbing_unidentified_unit_not_estimable` (the guard + `force_python` bootstrap parity in default absorbing mode). ### TROP Global Estimation Method diff --git a/docs/methodology/papers/athey-2025-review.md b/docs/methodology/papers/athey-2025-review.md index c2e253c7..a4148167 100644 --- a/docs/methodology/papers/athey-2025-review.md +++ b/docs/methodology/papers/athey-2025-review.md @@ -5,7 +5,7 @@ **PDF reviewed:** https://arxiv.org/abs/2508.21536v2 (version-pinned arXiv abstract for v2) **Review date:** 2026-02-08 -**Version-pinning note (2026-05-25):** The current arXiv version of arXiv:2508.21536 is **v3** (submitted 2026-02-09). The 2026-05-24 methodology promotion ships against this v2-pinned review; a formal v2-vs-v3 delta-check against the v3 PDF for TROP-relevant methodology changes (Eqs. 2-3, Algorithms 1-3, Section 2.2, Section 5.2-5.3, Section 6.1-6.2, Theorem 5.1, Corollary 1, Appendix Theorem 8.1) has **NOT** been performed. +**Version-pinning note (2026-05-25):** The current arXiv version of arXiv:2508.21536 is **v3** (submitted 2026-02-09). The 2026-05-24 methodology promotion ships against this v2-pinned review; a formal v2-vs-v3 delta-check against the v3 PDF for TROP-relevant methodology changes (Eqs. 2-3, Algorithms 1-3, Section 2.2, Section 5.2-5.3, Section 6.1-6.2, Theorem 5.1, Corollary 1, Appendix Theorem 8.1) has **NOT** been performed in full. **Update (non-absorbing support work):** the v3 PDF was consulted for the treatment-assignment-pattern sections (§2.1 general assignment, §2.2 Eq. 2 masking, §6.1 Eq. 
12 / Algorithm 2, Assumption 1(i), Theorem 5.1) and confirms the general-assignment scope on which `TROP(non_absorbing=True)` is built; the remaining sections of the delta-check stay deferred. **Action item**: before the next paper-author reference implementation or substantive v3 release, refresh this review against the most recent arXiv version, perform a real v2→v3 PDF delta audit, and re-validate that the verified-component checklist still maps cleanly. Pending that refresh, the methodology promotion is anchored on v2 as documented here. @@ -283,9 +283,9 @@ Note: Stratified bootstrap -- control and treated units resampled separately. Pr - **Outcome matrix**: Y (N x T), observed outcomes - **Treatment matrix**: W (N x T), binary treatment assignments where `W_it in {0, 1}` - **Covariates** (optional): X_it, observed covariates for each unit-period pair -- Treatment must be an absorbing state for standard block assignment (W_it = 1{i > N_0} * 1{t > T_0}) -- **Paper scope (Equation 13):** the paper extends TROP to general assignment patterns including treatment switching on/off. -- **Shipped implementation:** the current `diff_diff/trop.py` requires an absorbing-state treatment indicator and rejects non-absorbing/event-style inputs (gate in `diff_diff/trop.py:505-525`, also documented in `docs/methodology/REGISTRY.md` under TROP). Generalization to non-absorbing patterns is not in scope for the current implementation. +- Treatment is an absorbing state for standard block assignment (W_it = 1{i > N_0} * 1{t > T_0}); this is the default mode. +- **Paper scope (Equation 13 / Section 6.1):** the paper extends TROP to general assignment patterns including treatment switching on/off (§2.1: "units moving into and out of treatment"). +- **Shipped implementation:** `diff_diff/trop.py` accepts general (on/off) assignment via the opt-in `TROP(non_absorbing=True)` (`method='local'` only), matching the paper's scope. 
The default `non_absorbing=False` retains the absorbing-state monotonicity gate (in `diff_diff/trop_local.py::_setup_trop_data`, around `trop_local.py:131-144`) as a defensive guard against event-style mis-encoding; it rejects non-monotonic D with a `ValueError` that also points to the opt-in. See `docs/methodology/REGISTRY.md` under TROP for the no-dynamic-effects requirement and the block-only inference caveat (Theorem 5.1 is proven under Assumption 1(i) block assignment only). Adding the opt-in *removes* a prior implementation over-restriction (the estimator was stricter than the paper); the global method still requires block assignment and rejects `non_absorbing=True`. ### Computational Considerations - **Main bottleneck**: LOOCV grid search -- for each grid point, every control observation requires a separate nuclear-norm penalized weighted least squares solve diff --git a/docs/performance-scenarios.md b/docs/performance-scenarios.md index 38d41a48..b0a42ea9 100644 --- a/docs/performance-scenarios.md +++ b/docs/performance-scenarios.md @@ -259,8 +259,9 @@ serves a different purpose: R-parity accuracy). They complement it. - **Persona / domain.** Marketing analyst measuring an always-on-with- dark-periods campaign, or a health-policy researcher studying a policy - that switches on and off. Reversible treatment breaks every other - staggered estimator; dCDH is the only option. + that switches on and off. Reversible treatment breaks the absorbing-only + staggered estimators; dCDH is the most general fit (LPDiD/TROP `non_absorbing` + also handle it under stronger assumptions). - **Data shape.** 120 groups x 10 periods, single-switch pattern per group, ~40% always-control, survey-weighted with 8 strata and 24 PSUs. 
Larger than the Tutorial's 80 x 6 demo to expose the `L_max` multi-horizon diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst index 72f33ed6..1dd6e5b5 100644 --- a/docs/practitioner_decision_tree.rst +++ b/docs/practitioner_decision_tree.rst @@ -157,11 +157,14 @@ market is treated it stays treated. **Recommended method:** :class:`~diff_diff.ChaisemartinDHaultfoeuille` (alias :class:`~diff_diff.DCDH`) -This is the **only library estimator** that handles non-absorbing (reversible) +This is the **most general library estimator** for non-absorbing (reversible) treatments. It compares period-to-period outcome changes in markets that switch into treatment ("joiners") and markets that switch out ("leavers"), against simultaneously-stable controls. You get three numbers: the overall lift `DID_M`, -a joiners-only view `DID_+`, and a leavers-only view `DID_-`. +a joiners-only view `DID_+`, and a leavers-only view `DID_-`. (:class:`~diff_diff.LPDiD` +with ``non_absorbing="first_entry"`` / ``"effect_stabilization"`` and +:class:`~diff_diff.TROP` with ``non_absorbing=True`` also handle non-absorbing +treatment, but under stronger assumptions +(for TROP, a no-dynamic-effects assumption).) .. 
code-block:: python @@ -442,7 +445,7 @@ At a Glance - Handles different launch dates correctly * - On/off cycles (reversible treatment) - ``ChaisemartinDHaultfoeuille`` - - Only library option for non-absorbing treatments + - Most general option for non-absorbing treatments (see also LPDiD/TROP ``non_absorbing``) * - Varied spending levels - ``ContinuousDiD`` - Dose-response curve diff --git a/docs/tutorials/19_dcdh_marketing_pulse.ipynb b/docs/tutorials/19_dcdh_marketing_pulse.ipynb index 51ed310d..0e6cd652 100644 --- a/docs/tutorials/19_dcdh_marketing_pulse.ipynb +++ b/docs/tutorials/19_dcdh_marketing_pulse.ipynb @@ -4,11 +4,7 @@ "cell_type": "markdown", "id": "t19-cell-001", "metadata": {}, - "source": [ - "# Tutorial 19: dCDH for Marketing Pulse Campaigns\n", - "\n", - "A practitioner walkthrough for measuring lift from promotional campaigns that turn on AND off across markets at staggered times. The tutorial uses the `ChaisemartinDHaultfoeuille` estimator (alias `DCDH`) - diff-diff's only estimator built for reversible (non-absorbing) treatment, where every other modern staggered estimator in the library assumes treatment is absorbing." - ] + "source": "# Tutorial 19: dCDH for Marketing Pulse Campaigns\n\nA practitioner walkthrough for measuring lift from promotional campaigns that turn on AND off across markets at staggered times. The tutorial uses the `ChaisemartinDHaultfoeuille` estimator (alias `DCDH`) - diff-diff's estimator purpose-built for reversible (non-absorbing) treatment: the most general non-absorbing option in the library, with explicit joiner/leaver decomposition. (`LPDiD` and `TROP` also support non-absorbing treatment via their `non_absorbing` parameters, under stronger assumptions.)" }, { "cell_type": "markdown", @@ -124,16 +120,7 @@ "cell_type": "markdown", "id": "t19-cell-008", "metadata": {}, - "source": [ - "## 3. 
Fitting dCDH\n", - "\n", - "`DID_M` is the headline dCDH estimator: the average across periods of two pieces:\n", - "\n", - "- **`DID_+`** (joiners): markets switching `0 → 1` between consecutive periods, compared to *contemporaneously untreated* control cells.\n", - "- **`DID_-`** (leavers): markets switching `1 → 0`, compared to *contemporaneously treated* control cells.\n", - "\n", - "Both pieces use only cells whose treatment status was stable across the two periods being compared - so no treated unit is ever used as a control for another treated unit. The library reports `DID_+`, `DID_-`, and their average `DID_M` separately, so you can see if the two halves agree." - ] + "source": "## 3. Fitting dCDH\n\n`DID_M` is the headline dCDH estimator: the average across periods of two pieces:\n\n- **`DID_+`** (joiners): markets switching `0 → 1` between consecutive periods, compared to *contemporaneously untreated* control cells.\n- **`DID_-`** (leavers): markets switching `1 → 0`, compared to *contemporaneously treated* control cells.\n\nBoth pieces use only cells whose treatment status was stable across the two periods being compared. No *switching* cell is used as a control; stable-untreated cells serve as controls for joiners, and stable-treated cells serve as controls for leavers. The library reports `DID_+`, `DID_-`, and their average `DID_M` separately, so you can see if the two halves agree." }, { "cell_type": "markdown", @@ -313,21 +300,7 @@ "cell_type": "markdown", "id": "t19-cell-019", "metadata": {}, - "source": [ - "## 5. Communicating Results to Leadership\n", - "\n", - "A stakeholder-ready summary of the analysis above:\n", - "\n", - "> **Headline.** The pulse campaign lifted weekly checkout sessions by approximately **12 sessions per market per week** while the promo was on (95% CI: 11.3 to 12.8). On a baseline of about 110 weekly sessions per market, that's roughly an **11% lift**. 
*[Source: `results.overall_att` from Section 3.]*\n", - ">\n", - "> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 38 markets started untreated and switched the promo on at some point during the quarter (joiners), and 22 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020) - diff-diff's only estimator built for treatment that can switch on AND off in the same panel. *[Source: switcher counts and panel shape from Section 2.]*\n", - ">\n", - "> **Validity evidence.** Two checks supported the result. (a) The joiners-vs-leavers split agreed: joiners produced a +12.1 lift, leavers a +11.9 lift, well within sampling uncertainty of each other and of the headline. (b) The multi-horizon placebos at l = -2 and l = -1 both sat on zero with bootstrap CIs comfortably covering it - parallel pre-trends look credible. *[Sources: joiners/leavers from Section 3, multi-horizon placebos from Section 4.]*\n", - ">\n", - "> **What \"+12 sessions per market per week\" means in business terms.** Across 60 markets and the weeks each one had the promo on, that's the per-market-week lift attributable to the campaign. Translate to your own revenue-per-session to compare against campaign spend, then use the per-market lift estimate to project what scaling the promo to additional markets would deliver.\n", - ">\n", - "> **Practical significance caveat.** The 11% lift is statistically significant (bootstrap p < 0.01 at both post-treatment horizons), and the on-impact effect persists at the second horizon - the pulse worked while it was on. Whether 11% justifies the campaign cost is a business judgment, not a statistical one. *[Sources: dynamic horizons from Section 4.]*" - ] + "source": "## 5. 
Communicating Results to Leadership\n\nA stakeholder-ready summary of the analysis above:\n\n> **Headline.** The pulse campaign lifted weekly checkout sessions by approximately **12 sessions per market per week** while the promo was on (95% CI: 11.3 to 12.8). On a baseline of about 110 weekly sessions per market, that's roughly an **11% lift**. *[Source: `results.overall_att` from Section 3.]*\n>\n> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 38 markets started untreated and switched the promo on at some point during the quarter (joiners), and 22 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020) - diff-diff's most general estimator built for treatment that can switch on AND off in the same panel. *[Source: switcher counts and panel shape from Section 2.]*\n>\n> **Validity evidence.** Two checks supported the result. (a) The joiners-vs-leavers split agreed: joiners produced a +12.1 lift, leavers a +11.9 lift, well within sampling uncertainty of each other and of the headline. (b) The multi-horizon placebos at l = -2 and l = -1 both sat on zero with bootstrap CIs comfortably covering it - parallel pre-trends look credible. *[Sources: joiners/leavers from Section 3, multi-horizon placebos from Section 4.]*\n>\n> **What \"+12 sessions per market per week\" means in business terms.** Across 60 markets and the weeks each one had the promo on, that's the per-market-week lift attributable to the campaign. Translate to your own revenue-per-session to compare against campaign spend, then use the per-market lift estimate to project what scaling the promo to additional markets would deliver.\n>\n> **Practical significance caveat.** The 11% lift is statistically significant (bootstrap p < 0.01 at both post-treatment horizons), and the on-impact effect persists at the second horizon - the pulse worked while it was on. 
Whether 11% justifies the campaign cost is a business judgment, not a statistical one. *[Sources: dynamic horizons from Section 4.]*" }, { "cell_type": "markdown", @@ -387,4 +360,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/tests/test_methodology_trop.py b/tests/test_methodology_trop.py index f6a662eb..e18099ad 100644 --- a/tests/test_methodology_trop.py +++ b/tests/test_methodology_trop.py @@ -2423,3 +2423,435 @@ def test_safe_inference_nan_propagation_contract(self): "conf_int": conf_int, } ) + + # ------------------------------------------------------------------ + # Non-absorbing (general assignment) support — Eq. 1 / Eq. 12 / + # Algorithm 2, Section 6.1. The paper's estimator handles general + # assignment patterns ("units moving into and out of treatment"), + # not only absorbing/staggered adoption (§2.1). The library exposes + # this via the opt-in TROP(non_absorbing=True); the default still + # rejects non-monotonic D (covered in test_trop.py::TestDMatrixValidation + # and test_event_style_d_rejected_with_value_error above). + # ------------------------------------------------------------------ + + @staticmethod + def _make_non_absorbing_panel(seed=0, tau=3.0, n_units=16, n_periods=8, all_toggle=False): + """TWFE-clean panel with on/off (non-absorbing) treatment, no dynamic effects. + + Y_it(0) = alpha_i + beta_t + noise; Y_it(1) = Y_it(0) + tau. Some units + switch treatment on and then off again, so D is non-monotonic. 
+ """ + rng = np.random.default_rng(seed) + alpha = rng.normal(0.0, 1.0, n_units) + beta = rng.normal(0.0, 1.0, n_periods) + rows = [] + for i in range(n_units): + d = np.zeros(n_periods, dtype=int) + if all_toggle: + on = 3 + (i % 2) + d[on : on + 2] = 1 # every unit treated on an interior block + elif i % 4 == 0 and i > 0: + d[4:6] = 1 # on then off (non-absorbing) + elif i % 3 == 0: + d[5:] = 1 # absorbing block (mix of patterns) + for t in range(n_periods): + y0 = alpha[i] + beta[t] + rng.normal(0.0, 0.05) + rows.append( + { + "unit": i, + "period": t, + "outcome": y0 + (tau if d[t] == 1 else 0.0), + "treated": int(d[t]), + } + ) + return pd.DataFrame(rows) + + @pytest.mark.slow + def test_non_absorbing_general_assignment_supported(self): + """TROP(non_absorbing=True) accepts on/off treatment and recovers the + ATT on a no-dynamic-effects DGP (Eq. 1 averages over all D=1 cells; + Eq. 12 / Algorithm 2 masks treated cells per (i, t)). A caveat + ``UserWarning`` is emitted because Theorem 5.1's guarantee is proven + only under block assignment. + """ + tau = 3.0 + df = self._make_non_absorbing_panel(seed=0, tau=tau) + n_treated_cells = int(df["treated"].to_numpy().sum()) + # Sanity: the panel really is non-absorbing (some unit goes 1 -> 0). + treated_wide = df.pivot(index="period", columns="unit", values="treated").to_numpy() + assert bool( + (np.diff(treated_wide, axis=0) < 0).any() + ), "test panel must contain a 1->0 transition" + + est = TROP( + method="local", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=2, + seed=1, + ) + with pytest.warns(UserWarning, match="(?i)non_absorbing.*Theorem 5.1"): + res = est.fit(df, "outcome", "treated", "unit", "period") + + # Estimand: ATT averages the per-cell effects over all D=1 cells (Eq. 1). + assert np.isfinite(res.att) + assert abs(res.att - tau) < 0.5, f"ATT {res.att} should recover tau={tau}" + # One finite per-cell effect per treated cell (Eq. 
12 / Algorithm 2). + assert len(res.treatment_effects) == n_treated_cells + assert all(np.isfinite(v) for v in res.treatment_effects.values()) + + def test_non_absorbing_no_caveat_in_default_mode(self): + """The non-absorbing caveat warning fires ONLY for non_absorbing=True; + a default (absorbing) fit must not emit it. + """ + # Absorbing staggered panel (monotonic per unit). + rows = [] + for i in range(12): + g = 4 if i < 3 else (6 if i < 6 else None) + for t in range(8): + d = 1 if (g is not None and t >= g) else 0 + rows.append( + { + "unit": i, + "period": t, + "outcome": float(i) * 0.1 + float(t) * 0.2 + (2.0 if d else 0.0), + "treated": d, + } + ) + df = pd.DataFrame(rows) + est = TROP( + method="local", + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=2, + seed=1, + ) + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always") + est.fit(df, "outcome", "treated", "unit", "period") + assert not any( + "non_absorbing" in str(w.message) for w in caught + ), "default (absorbing) mode must not emit the non_absorbing caveat" + + @pytest.mark.slow + def test_non_absorbing_unbalanced_panel_supported(self): + """Non-absorbing support tolerates unbalanced panels (random missing + control cells) and still returns a finite ATT. + """ + df = self._make_non_absorbing_panel(seed=7, tau=3.0) + # Drop 10% of untreated rows at random (missing control observations). 
+ rng = np.random.default_rng(11) + control_rows = df.index[df["treated"] == 0].to_numpy() + drop = rng.choice(control_rows, size=int(0.1 * len(control_rows)), replace=False) + df_unbalanced = df.drop(index=drop).reset_index(drop=True) + + est = TROP( + method="local", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=2, + seed=1, + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = est.fit(df_unbalanced, "outcome", "treated", "unit", "period") + assert np.isfinite(res.att) + assert np.isfinite(res.se) + + @pytest.mark.slow + @pytest.mark.parametrize("lambda_unit", [0.0, 1.0]) + def test_non_absorbing_always_treated_unit_not_raw_outcome(self, lambda_unit): + """A treated cell whose UNIT has no observed control cell leaves ``alpha_i`` + unidentified, so its tau would silently leak the unit fixed effect (a + raw-outcome-like value). Such cells must be marked non-estimable (NaN). + This holds for BOTH ``lambda_unit=0`` (uniform unit weights still give the + always-treated unit no own control row) and ``lambda_unit>0`` (inf + distance -> zero donor weights). Estimable cells still recover the effect + and the bootstrap SE stays finite. + + Locks the documented behavior (REGISTRY ## TROP "non-absorbing + non-estimable-cell trimming" Note): the ATT is the mean over estimable + treated cells (library-wide non-estimable->NaN convention). 
+ """ + rng = np.random.default_rng(0) + n_units, n_periods, tau = 8, 8, 5.0 + alpha = rng.normal(0.0, 1.0, n_units) + alpha[0] = 10.0 # large unit-0 FE so any leak is unmistakable + beta = rng.normal(0.0, 1.0, n_periods) + rows = [] + for i in range(n_units): + for t in range(n_periods): + if i == 0: + d = 1 # always-treated: no untreated history + elif i % 3 == 0: + d = 1 if 4 <= t <= 5 else 0 # on/off + else: + d = 1 if t >= 6 else 0 # untreated history present + y0 = alpha[i] + beta[t] + rng.normal(0.0, 0.05) + rows.append( + { + "unit": i, + "period": t, + "outcome": y0 + (tau if d else 0.0), + "treated": int(d), + } + ) + df = pd.DataFrame(rows) + est = TROP( + method="local", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[lambda_unit], + lambda_nn_grid=[0.1], + n_bootstrap=3, + seed=1, + ) + with pytest.warns(UserWarning, match="(?i)not estimable"): + res = est.fit(df, "outcome", "treated", "unit", "period") + + # Every cell of the always-treated unit is non-estimable (NaN), never a + # fixed-effect-contaminated raw outcome (alpha_0 = 10 would leak otherwise). + raw_y = { + t: float(df[(df.unit == 0) & (df.period == t)]["outcome"].iloc[0]) + for t in range(n_periods) + } + u0 = {k: v for k, v in res.treatment_effects.items() if k[0] == 0} + assert len(u0) == n_periods + for (_, t), v in u0.items(): + assert np.isnan(v), f"cell(0,{t}) should be NaN, got {v}" + assert not np.isclose(v, raw_y[t]), "tau must not equal raw outcome" + + # Estimable cells (other units) remain finite and aggregate near the truth. + assert np.isfinite(res.att) + assert np.isfinite(res.se) + assert abs(res.att - tau) < 0.6 + + @pytest.mark.slow + def test_non_absorbing_fully_treated_period_not_estimable(self): + """A period in which EVERY unit is treated has no control cell, so + ``beta_t`` is unidentified and that period's tau would leak the time fixed + effect. 
Those cells must be NaN (non-estimable), not finite raw-outcome + values; treated cells in other periods still recover the effect. + """ + rng = np.random.default_rng(1) + n_units, n_periods, tau, hot = 8, 8, 3.0, 4 + alpha = rng.normal(0.0, 1.0, n_units) + beta = rng.normal(0.0, 1.0, n_periods) + beta[hot] = 20.0 # large period-`hot` FE so any leak is unmistakable + rows = [] + for i in range(n_units): + for t in range(n_periods): + if t == hot: + d = 1 # every unit treated at `hot` -> no control at that period + elif i % 2 == 0: + d = 1 if t >= 6 else 0 + else: + d = 1 if 1 <= t <= 2 else 0 + y0 = alpha[i] + beta[t] + rng.normal(0.0, 0.05) + rows.append( + { + "unit": i, + "period": t, + "outcome": y0 + (tau if d else 0.0), + "treated": int(d), + } + ) + df = pd.DataFrame(rows) + # Sanity: period `hot` is fully treated. + assert bool((df[df.period == hot]["treated"].to_numpy() == 1).all()) + est = TROP( + method="local", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=3, + seed=1, + ) + with pytest.warns(UserWarning, match="(?i)not estimable"): + res = est.fit(df, "outcome", "treated", "unit", "period") + + hot_cells = {k: v for k, v in res.treatment_effects.items() if k[1] == hot} + assert len(hot_cells) == n_units + for k, v in hot_cells.items(): + assert np.isnan(v), f"fully-treated-period cell {k} should be NaN, got {v}" + # Treated cells in other (estimable) periods still recover the effect. + assert np.isfinite(res.att) + assert np.isfinite(res.se) + assert abs(res.att - tau) < 0.6 + + @pytest.mark.slow + def test_non_absorbing_fully_toggling_no_never_treated_unit(self): + """non_absorbing admits a fully toggling panel with NO never-treated unit + (every unit is treated at some point but retains observed untreated + cells). 
Identification falls back to untreated cells, and the bootstrap + runs via the Python path (the Rust stratified resampler can return a + degenerate ~0 SE on an empty control stratum). Asserts admission + finite + ATT/SE + recovery. + """ + tau = 4.0 + df = self._make_non_absorbing_panel(seed=2, tau=tau, all_toggle=True) + # Sanity: no never-treated unit (every unit treated at some period). + assert bool((df.groupby("unit")["treated"].max().to_numpy() == 1).all()) + est = TROP( + method="local", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=3, + seed=1, + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + res = est.fit(df, "outcome", "treated", "unit", "period") + assert np.isfinite(res.att) + assert np.isfinite(res.se) + assert res.se > 0 # not a degenerate empty-stratum ~0 SE + assert abs(res.att - tau) < 0.6 + + @pytest.mark.slow + def test_unbalanced_absorbing_unidentified_unit_not_estimable(self): + """The estimability guard applies to DEFAULT (absorbing) local fits too, + not only non_absorbing. On an unbalanced absorbing panel where a treated + unit's pre-treatment rows are entirely missing, that unit has no observed + control cell, so ``alpha_i`` is unidentified; its cells must be NaN (the + prior behavior silently reported a fixed-effect-contaminated tau), while + the rest of the panel is estimated normally. ``non_absorbing=False``. + """ + rng = np.random.default_rng(3) + n_periods, tau = 6, 4.0 + beta = rng.normal(0.0, 1.0, n_periods) + rows = [] + # Never-treated controls (units 0-2), observed all periods. + for i in range(3): + a = rng.normal(0.0, 1.0) + for t in range(n_periods): + rows.append( + { + "unit": i, + "period": t, + "outcome": a + beta[t] + rng.normal(0, 0.05), + "treated": 0, + } + ) + # Well-observed treated unit (unit 3), adopts at t=4, full pre-history. 
+ a3 = rng.normal(0.0, 1.0) + for t in range(n_periods): + d = 1 if t >= 4 else 0 + rows.append( + { + "unit": 3, + "period": t, + "outcome": a3 + beta[t] + rng.normal(0, 0.05) + (tau if d else 0), + "treated": d, + } + ) + # Pathological treated unit (unit 4): adopts at t=3 but ONLY observed at + # its treated periods 3,4,5 -- pre-treatment rows 0,1,2 are MISSING, so it + # has no observed control cell and alpha_4 is unidentified. + a4 = rng.normal(0.0, 1.0) + a4 += 12.0 # large FE so any leak into tau would be unmistakable + for t in (3, 4, 5): + rows.append( + { + "unit": 4, + "period": t, + "outcome": a4 + beta[t] + rng.normal(0, 0.05) + tau, + "treated": 1, + } + ) + df = pd.DataFrame(rows) + + est = TROP( + method="local", # non_absorbing defaults to False (absorbing) + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=3, + seed=1, + ) + with pytest.warns(UserWarning, match="(?i)not estimable"): + res = est.fit(df, "outcome", "treated", "unit", "period") + + # Unit 4's treated cells are NaN (alpha_4 unidentified), never a leaked FE. + u4 = {k: v for k, v in res.treatment_effects.items() if k[0] == 4} + assert len(u4) == 3 + for k, v in u4.items(): + assert np.isnan(v), f"unidentified-unit cell {k} should be NaN, got {v}" + # Unit 3 (well-observed) is estimated and recovers the effect. + u3 = [v for k, v in res.treatment_effects.items() if k[0] == 3 and np.isfinite(v)] + assert len(u3) > 0 + assert np.isfinite(res.att) + assert abs(res.att - tau) < 0.6 + + @pytest.mark.slow + def test_non_absorbing_disconnected_support_not_estimable(self): + """Strict two-way-FE identification: ``alpha_i + beta_t`` is pinned only + within a connected component of the observed-control bipartite graph. A + treated cell whose target unit and target period fall in DIFFERENT + components is non-estimable even though both have *some* control support + (the marginal row/column check would wrongly pass it). 
Such cells must be + NaN, not a finite cross-component FE-contaminated value. + + Construction (periods 0-5): component A = units {0,1,2} whose untreated + periods rotate within {0,1,2,3} (so A's control graph connects periods + 0-3 and units 0-2 into one component, with estimable treated cells inside + it); component B = units {3,4} untreated only at periods {4,5}. A and B + share no unit or period, so they are disconnected. Target cell (unit 0, + period 4): unit 0 in A, period 4 in B -> alpha_0 + beta_4 unidentified -> + NaN. A large beta_4 makes any cross-component leak unmistakable; component + A still yields a finite ATT. + """ + rng = np.random.default_rng(0) + n_periods, tau = 6, 3.0 + alpha = np.array([0.0, 1.0, 2.0, 5.0, 6.0]) + beta = np.zeros(n_periods) + beta[4] = 20.0 + # untreated-period sets: component A rotates within {0..3} and is treated + # at {4,5}; component B is untreated only at {4,5}. + untreated = { + 0: {0, 1}, + 1: {1, 2}, + 2: {2, 3}, + 3: {4, 5}, + 4: {4, 5}, + } + rows = [] + for i in range(5): + for t in range(n_periods): + d = 0 if t in untreated[i] else 1 + rows.append( + { + "unit": i, + "period": t, + "outcome": alpha[i] + beta[t] + rng.normal(0, 0.05) + (tau if d else 0), + "treated": int(d), + } + ) + df = pd.DataFrame(rows) + est = TROP( + method="local", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=3, + seed=1, + ) + with pytest.warns(UserWarning, match="(?i)not estimable"): + res = est.fit(df, "outcome", "treated", "unit", "period") + + # The cross-component target cell (unit 0 treated at period 4) is + # non-estimable (NaN), not a beta_4-contaminated value. + cell = res.treatment_effects.get((0, 4)) + assert cell is not None and np.isnan(cell), f"(0,4) should be NaN, got {cell}" + # Within-component A treated cells (e.g. unit 0 at period 2) stay + # estimable, so the fit still produces a finite ATT. 
+ assert np.isfinite(res.treatment_effects.get((0, 2))) + assert np.isfinite(res.att) diff --git a/tests/test_trop.py b/tests/test_trop.py index 88642481..a7bc3cfc 100644 --- a/tests/test_trop.py +++ b/tests/test_trop.py @@ -8,6 +8,7 @@ import pandas as pd import pytest +from diff_diff import HAS_RUST_BACKEND from diff_diff.prep import generate_factor_data from diff_diff.trop import TROP, TROPResults, trop from diff_diff.trop_local import _run_trop_bootstrap_loop @@ -909,6 +910,268 @@ def test_d_matrix_validation_error_message_helpful(self): assert "absorbing state" in error_msg assert "monotonic" in error_msg.lower() or "non-decreasing" in error_msg.lower() assert "D[t, i] = 1 for all t >= first treatment" in error_msg + # Also steers genuine on/off (non-absorbing) users to the opt-in. + assert "non_absorbing" in error_msg + + @staticmethod + def _non_absorbing_df(seed=0, tau=3.0, n_units=14, n_periods=8): + """Small TWFE-clean panel with on/off (non-monotonic) treatment.""" + rng = np.random.default_rng(seed) + alpha = rng.normal(0.0, 1.0, n_units) + beta = rng.normal(0.0, 1.0, n_periods) + rows = [] + for i in range(n_units): + d = np.zeros(n_periods, dtype=int) + if i % 4 == 0 and i > 0: + d[4:6] = 1 # on then off (non-absorbing) + elif i % 3 == 0: + d[5:] = 1 # absorbing block + for t in range(n_periods): + y0 = alpha[i] + beta[t] + rng.normal(0.0, 0.05) + rows.append( + { + "unit": i, + "period": t, + "outcome": y0 + (tau if d[t] == 1 else 0.0), + "treated": int(d[t]), + } + ) + return pd.DataFrame(rows) + + @pytest.mark.slow + def test_non_absorbing_opt_in_accepted(self): + """TROP(non_absorbing=True) accepts a non-monotonic D and returns a + finite ATT instead of raising (the default still rejects -- see + test_d_matrix_absorbing_state_validation_invalid). 
+ """ + df = self._non_absorbing_df(seed=0, tau=3.0) + est = TROP( + method="local", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=2, + seed=42, + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") # caveat warning asserted elsewhere + results = est.fit(df, "outcome", "treated", "unit", "period") + assert isinstance(results, TROPResults) + assert np.isfinite(results.att) + + def test_non_absorbing_global_method_raises(self): + """non_absorbing=True is local-only; the global method must raise.""" + df = self._non_absorbing_df(seed=1) + est = TROP( + method="global", + non_absorbing=True, + lambda_time_grid=[0.0], + lambda_unit_grid=[0.0], + lambda_nn_grid=[0.1], + n_bootstrap=2, + ) + with pytest.raises(ValueError, match="(?i)non_absorbing.*local|local.*non_absorbing"): + est.fit(df, "outcome", "treated", "unit", "period") + + def test_non_absorbing_param_round_trip_and_validation(self): + """non_absorbing round-trips through get_params/set_params and rejects + non-bool values in both __init__ and set_params. + """ + est = TROP(non_absorbing=True) + assert est.get_params()["non_absorbing"] is True + est.set_params(non_absorbing=False) + assert est.non_absorbing is False + with pytest.raises(ValueError, match="non_absorbing must be a bool"): + TROP(non_absorbing="yes") # type: ignore[arg-type] + with pytest.raises(ValueError, match="non_absorbing must be a bool"): + TROP().set_params(non_absorbing=1) # type: ignore[arg-type] + + @pytest.mark.slow + @pytest.mark.skipif(not HAS_RUST_BACKEND, reason="Rust backend not available") + def test_non_absorbing_rust_python_parity(self): + """The Rust local path is absorbing-agnostic: on a non-absorbing panel + it produces the same ATT as the forced-Python path (single-point grids + remove lambda-selection ambiguity, so only solver roundoff remains). 
+        """
+        # The package re-exports the ``trop`` function, shadowing the submodule
+        # attribute, so reach the modules via sys.modules (matches the idiom used
+        # by the other Rust-toggle tests in this file).
+        trop_mod = sys.modules["diff_diff.trop"]
+        trop_local_mod = sys.modules["diff_diff.trop_local"]
+
+        df = self._non_absorbing_df(seed=3, tau=3.0)
+        kwargs = dict(
+            method="local",
+            non_absorbing=True,
+            lambda_time_grid=[0.0],
+            lambda_unit_grid=[0.0],
+            lambda_nn_grid=[0.1],
+            n_bootstrap=2,
+            seed=7,
+        )
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            att_rust = TROP(**kwargs).fit(df, "outcome", "treated", "unit", "period").att
+            with (
+                patch.object(trop_mod, "HAS_RUST_BACKEND", False),
+                patch.object(trop_local_mod, "HAS_RUST_BACKEND", False),
+            ):
+                att_py = TROP(**kwargs).fit(df, "outcome", "treated", "unit", "period").att
+        assert np.isfinite(att_rust) and np.isfinite(att_py)
+        np.testing.assert_allclose(att_rust, att_py, atol=1e-6, rtol=1e-6)
+
+    def test_non_absorbing_rejects_no_observed_untreated_cells(self):
+        """non_absorbing identification needs OBSERVED untreated cells. An
+        unbalanced panel whose only D=0 cells are structural gaps (every observed
+        row is treated) must raise before LOOCV/default fallback, not fit on
+        raw-outcome residuals. Guards against the missing-cell-fill loophole.
+        """
+        # Every observed row treated=1; ~half the (unit, period) cells dropped so
+        # all 4 periods still appear in the pivot and the missing cells fill to
+        # D=0 (with NaN outcomes).
+        rows = []
+        for i in range(6):
+            for t in range(4):
+                if (i + t) % 2 == 0:  # keep ~half -> unbalanced
+                    rows.append(
+                        {"unit": i, "period": t, "outcome": float(i) * 0.1 + t, "treated": 1}
+                    )
+        df = pd.DataFrame(rows)
+        est = TROP(
+            method="local",
+            non_absorbing=True,
+            lambda_time_grid=[0.0],
+            lambda_unit_grid=[0.0],
+            lambda_nn_grid=[0.1],
+            n_bootstrap=2,
+            seed=1,
+        )
+        with pytest.raises(ValueError, match="(?i)no observed untreated"):
+            est.fit(df, "outcome", "treated", "unit", "period")
+
+    def test_non_absorbing_rejects_single_control_period(self):
+        """non_absorbing requires >=2 periods with an observed untreated cell.
+        A panel with exactly one such period must raise (factor-model
+        identifiability floor), counting only OBSERVED untreated cells.
+        """
+        # Balanced panel, every cell treated except one observed untreated cell
+        # at (unit 0, period 0) -> only one period has an untreated observation.
+        rows = []
+        for i in range(6):
+            for t in range(5):
+                treated = 0 if (i == 0 and t == 0) else 1
+                rows.append(
+                    {"unit": i, "period": t, "outcome": float(i) * 0.1 + t, "treated": treated}
+                )
+        df = pd.DataFrame(rows)
+        est = TROP(
+            method="local",
+            non_absorbing=True,
+            lambda_time_grid=[0.0],
+            lambda_unit_grid=[0.0],
+            lambda_nn_grid=[0.1],
+            n_bootstrap=2,
+            seed=1,
+        )
+        with pytest.raises(ValueError, match="(?i)2 periods .* observed untreated"):
+            est.fit(df, "outcome", "treated", "unit", "period")
+
+    @pytest.mark.slow
+    def test_non_absorbing_recorded_on_results(self):
+        """The assignment scope is persisted on TROPResults / to_dict() so a
+        saved result retains the non-absorbing + inference-caveat context after
+        the fit-time warning is gone.
+        """
+        grid = dict(
+            lambda_time_grid=[0.0],
+            lambda_unit_grid=[0.0],
+            lambda_nn_grid=[0.1],
+            n_bootstrap=2,
+            seed=1,
+        )
+        df = self._non_absorbing_df(seed=0, tau=3.0)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res = TROP(method="local", non_absorbing=True, **grid).fit(
+                df, "outcome", "treated", "unit", "period"
+            )
+        assert res.non_absorbing is True
+        assert res.to_dict()["non_absorbing"] is True
+
+        # Default (absorbing) fit records False.
+        abs_rows = []
+        for i in range(12):
+            g = 4 if i < 6 else None
+            for t in range(8):
+                d = 1 if (g is not None and t >= g) else 0
+                abs_rows.append(
+                    {
+                        "unit": i,
+                        "period": t,
+                        "outcome": float(i) * 0.1 + 0.2 * t + (2.0 if d else 0.0),
+                        "treated": d,
+                    }
+                )
+        res_abs = TROP(method="local", **grid).fit(
+            pd.DataFrame(abs_rows), "outcome", "treated", "unit", "period"
+        )
+        assert res_abs.non_absorbing is False
+        assert res_abs.to_dict()["non_absorbing"] is False
+
+    @pytest.mark.slow
+    @pytest.mark.skipif(not HAS_RUST_BACKEND, reason="Rust backend not available")
+    def test_unbalanced_panel_bootstrap_uses_python_guard(self):
+        """On an UNBALANCED panel (default absorbing here), the point fit may be
+        fully estimable, yet a bootstrap resample can lose a treated cell's only
+        control support. The Rust bootstrap lacks the estimability guard, so the
+        fit must route the bootstrap to the guarded Python path whenever the panel
+        has missing cells -- locking the force_python condition. Balanced panels
+        keep the Rust happy path (covered elsewhere).
+        """
+        rng = np.random.default_rng(5)
+        rows = []
+        for i in range(12):
+            g = 4 if i < 4 else (6 if i < 8 else None)  # 4 never-treated controls
+            for t in range(8):
+                d = 1 if (g is not None and t >= g) else 0
+                rows.append(
+                    {
+                        "unit": i,
+                        "period": t,
+                        "outcome": float(i) * 0.1
+                        + 0.2 * t
+                        + rng.normal(0, 0.05)
+                        + (2.0 if d else 0.0),
+                        "treated": d,
+                    }
+                )
+        df = pd.DataFrame(rows)
+        # Drop a few control rows -> unbalanced, but leave ample support so the
+        # point fit trims nothing (isolates the missing-cell trigger).
+        ctrl = df.index[df["treated"] == 0].to_numpy()
+        drop = rng.choice(ctrl, size=max(1, int(0.06 * len(ctrl))), replace=False)
+        df = df.drop(index=drop).reset_index(drop=True)
+
+        est = TROP(
+            method="local",
+            lambda_time_grid=[0.0],
+            lambda_unit_grid=[0.0],
+            lambda_nn_grid=[0.1],
+            n_bootstrap=3,
+            seed=1,
+        )
+        trop_local_mod = sys.modules["diff_diff.trop_local"]
+        with patch.object(trop_local_mod, "_rust_bootstrap_trop_variance") as mock_rust:
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore")
+                res = est.fit(df, "outcome", "treated", "unit", "period")
+        # The Rust bootstrap must NOT be used for an unbalanced panel.
+        mock_rust.assert_not_called()
+        # The point fit itself trimmed nothing (so the trigger was the missing
+        # cells, not point-fit non-estimability).
+        assert all(np.isfinite(v) for v in res.treatment_effects.values())
+        assert np.isfinite(res.att) and np.isfinite(res.se)
 
 
 @pytest.mark.slow
@@ -3361,6 +3624,71 @@ def _make_survey_panel_and_design():
         )
         return df, survey_design, resolved_survey
 
+    def test_non_absorbing_rao_wu_zero_estimable_weight_is_nan_not_crash(self):
+        """Survey Rao-Wu bootstrap after non-estimable trimming: a draw whose
+        nonzero rescaled weight lands only on a skipped (non-estimable) unit
+        leaves the estimable treated cells with zero total weight. np.average
+        would raise ZeroDivisionError; the guard must return NaN for that draw so
+        the bootstrap stays NaN-safe (no crash) per the contract.
+        """
+        from unittest.mock import patch
+
+        from diff_diff import SurveyDesign
+
+        # unit 0: always treated (non-estimable). units 1,2: treated at periods
+        # 4,5. units 3,4,5: never-treated controls (so periods 4,5 are NOT fully
+        # treated and units 1,2 have estimable cells).
+        rows = []
+        for i in range(6):
+            for t in range(6):
+                if i == 0:
+                    d = 1
+                elif i in (1, 2):
+                    d = 1 if t >= 4 else 0
+                else:
+                    d = 0
+                rows.append(
+                    {
+                        "unit": i,
+                        "period": t,
+                        "outcome": float(i) + t + (2.0 if d else 0.0),
+                        "treated": d,
+                        "weight": 1.0,
+                        "psu": i,
+                    }
+                )
+        df = pd.DataFrame(rows)
+        survey_design = SurveyDesign(weights="weight", psu="psu")
+
+        # Per-unit Rao-Wu draw: nonzero weight only on unit 0 (always-treated,
+        # skipped); estimable units 1,2 get zero -> zero estimable-cell weight.
+        zero_estimable = np.zeros(6, dtype=np.float64)
+        zero_estimable[0] = 1.0
+
+        est = TROP(
+            method="local",
+            non_absorbing=True,
+            lambda_time_grid=[0.0],
+            lambda_unit_grid=[0.0],
+            lambda_nn_grid=[0.1],
+            n_bootstrap=3,
+            seed=1,
+        )
+        with patch(
+            "diff_diff.bootstrap_utils.generate_rao_wu_weights",
+            return_value=zero_estimable,
+        ):
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore")
+                # Must not raise ZeroDivisionError.
+                res = est.fit(
+                    df, "outcome", "treated", "unit", "period", survey_design=survey_design
+                )
+        # Point fit (original unit weights) is estimable; bootstrap draws all
+        # degenerate -> SE is NaN, not a crash.
+        assert np.isfinite(res.att)
+        assert np.isnan(res.se)
+
     def test_local_rao_wu_bootstrap_warns_above_5pct_failure(self):
         """Local Rao-Wu survey bootstrap: forced failures → proportional warn."""
         from unittest.mock import patch