
Fix PUF clone prior weights #1140

Merged: MaxGhenis merged 1 commit into main from fix-puf-clone-weight-priors on May 26, 2026


@MaxGhenis MaxGhenis commented May 26, 2026

Summary

  • give zero-weight PUF clone households meaningful uniform prior mass instead of near-zero epsilon priors
  • add legacy SOI Table 1.4A long-term capital gains amount/count targets by AGI bin
  • map positive LTCG into legacy SOI target construction and use the LTCG aggregate uprating path for capital gains SOI rows
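For readers unfamiliar with the first bullet, the idea can be sketched roughly as follows. This is an illustrative sketch only, not the code in enhanced_cps.py: the function name `seed_clone_priors`, the `clone_mass_share` parameter, and the rule for sizing the clone mass are all assumptions made for the example.

```python
import numpy as np


def seed_clone_priors(weights, is_clone, clone_mass_share=0.5):
    """Hypothetical helper: give zero-weight clone rows a uniform prior.

    Assumption (not from the PR): clones jointly receive
    `clone_mass_share` of the combined prior mass, split evenly.
    """
    weights = np.asarray(weights, dtype=float)
    is_clone = np.asarray(is_clone, dtype=bool)
    seeded = weights.copy()

    # Only clone rows that currently carry no weight are re-seeded.
    zero_clones = is_clone & (weights <= 0)
    n_zero = int(zero_clones.sum())
    if n_zero == 0:
        return seeded

    # Mass held by every other row stays untouched.
    other_total = weights[~zero_clones].sum()

    # Choose the clone total so that clones hold `clone_mass_share`
    # of (other_total + clone_total), then spread it uniformly.
    clone_total = other_total * clone_mass_share / (1.0 - clone_mass_share)
    seeded[zero_clones] = clone_total / n_zero
    return seeded


if __name__ == "__main__":
    w = seed_clone_priors([100.0, 300.0, 0.0, 0.0], [False, False, True, True])
    print(w)
```

The contrast with an epsilon prior is that every clone row starts with a weight of the same order as the original rows, so the calibration can move mass onto clones without first escaping a near-zero starting point.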

Tests

  • env -u UV_FROZEN uv run ruff check policyengine_us_data/datasets/cps/enhanced_cps.py policyengine_us_data/utils/loss.py policyengine_us_data/utils/soi.py tests/unit/test_enhanced_cps_clone_diagnostics.py tests/unit/datasets/test_enhanced_cps_seeding.py tests/unit/calibration/test_loss_targets.py tests/unit/test_soi_utils.py
  • env -u UV_FROZEN uv run pytest tests/unit/test_enhanced_cps_clone_diagnostics.py tests/unit/datasets/test_enhanced_cps_seeding.py tests/unit/calibration/test_loss_targets.py tests/unit/test_soi_utils.py tests/unit/test_refresh_soi_table_targets.py

Pending

  • draft until the local ECPS rebuild/calibration diagnostic is rerun with the block-CD distribution storage artifact available

Refs #1139

@MaxGhenis MaxGhenis force-pushed the fix-puf-clone-weight-priors branch from 1af6b9e to 40cfdfd May 26, 2026 01:14
@MaxGhenis MaxGhenis changed the base branch from add-cg-basis-soca-imputation to main May 26, 2026 01:15
@MaxGhenis MaxGhenis marked this pull request as ready for review May 26, 2026 10:28
@MaxGhenis MaxGhenis merged commit 382323e into main May 26, 2026

MaxGhenis commented May 26, 2026

Post-merge local calibration impact from the 2024 rebuild diagnostic (ltcg_agi_targets_main, 1,000 epochs):

  • Across 3,539 common finite targets, median relative absolute error improved from 9.64% to 5.64%; the share within 10% improved from 51.1% to 68.6%.
  • Mean objective fell from 891.70 to 2.46; median objective fell from 0.00843 to 0.00318.
  • 1,210 / 3,539 common finite targets worsened (34.2%), even though the aggregate fit improved. The largest worseners were concentrated in high-AGI business/partnership targets and a few demographic/geographic targets, e.g.:
      • partnership and S-corp income in AGI $79M+ (299% -> 1,810%)
      • negative_household_market_income_total (609% -> 968%)
      • partnership and S-corp income in AGI $16M-$79M (57% -> 363%)
      • business net profits in AGI $16M-$79M (9% -> 274%)
      • NY AGI count $1M+ (94% -> 203%)
      • SOI filer count AGI $10M+ (17% -> 91%)
  • The PUF clone half now receives material calibrated mass instead of being effectively unused. In the local run the final split was about 74.6M CPS / 91.3M PUF-clone household weight, or 55.0% clone share.
  • Capital-gains fit improved enough to make this worth merging, but the top tail is still the main residual issue: summed SOI LTCG AGI-bin amount is about $1.83T vs $1.26T target (1.45x), and the $10M+ LTCG amount is $1.459T vs $450.5B target (3.24x). The $10M+ LTCG count is much closer: 30.4k vs 29.0k (1.05x).
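For anyone wanting to reproduce summary statistics of this kind, here is a minimal sketch. It assumes relative absolute error is defined as |estimate - target| / |target| over targets that are finite and nonzero; the function name `fit_summary` is hypothetical and is not part of the diagnostic script. The ratios at the bottom simply recompute the figures quoted in the bullets above.

```python
import numpy as np


def fit_summary(estimates, targets, tolerance=0.10):
    """Hypothetical helper: summarize calibration fit across targets.

    Assumed definition: relative absolute error is
    |estimate - target| / |target|, over finite, nonzero targets.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # Drop rows where the error is undefined.
    ok = np.isfinite(estimates) & np.isfinite(targets) & (targets != 0)
    rel_abs_err = np.abs(estimates[ok] - targets[ok]) / np.abs(targets[ok])

    return {
        "n_targets": int(ok.sum()),
        "median_rel_abs_err": float(np.median(rel_abs_err)),
        "share_within_tolerance": float(np.mean(rel_abs_err <= tolerance)),
    }


if __name__ == "__main__":
    # Toy inputs, not real calibration output.
    print(fit_summary([105.0, 80.0, 300.0, np.nan], [100.0, 100.0, 200.0, 50.0]))

    # Ratios recomputed from the figures quoted in the comment.
    print(f"worsened share: {1210 / 3539:.1%}")                 # 34.2%
    print(f"clone share: {91.3 / (74.6 + 91.3):.1%}")           # 55.0%
    print(f"LTCG amount, all bins: {1.83 / 1.26:.2f}x")         # 1.45x
    print(f"LTCG amount, $10M+: {1459 / 450.5:.2f}x")           # 3.24x
    print(f"LTCG count, $10M+: {30.4 / 29.0:.2f}x")             # 1.05x
```

Reporting the median alongside the share within tolerance, rather than the mean alone, keeps a handful of extreme targets from dominating the summary, which is consistent with the mean objective moving far more than the median in the run above.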

Interpretation: this fixes the prior-weight bug and materially improves the broad calibration surface, but it does not solve the high-end gains amount problem. The next follow-up should focus on the Forbes/PUF top-tail construction and whether those records should be explicit ECPS rows rather than only entering through PUF-derived imputation.
