Add ESA and JSA calibration targets #325

Merged
MaxGhenis merged 1 commit into main from codex/uk-data-esa-targets on Apr 12, 2026

@MaxGhenis
Contributor

Summary

  • add DWP claimant-count targets for ESA total, ESA contributory, ESA income-related, and current JSA claims
  • fix the OBR Jobseeker's Allowance target to map to total jsa rather than only jsa_income
  • add focused tests for the new DWP targets and the OBR JSA mapping

Why

The current calibration target set barely constrains ESA/JSA. ESA only has a coarse OBR spending anchor, and JSA spending was incorrectly matched to jsa_income only. That made the ESA/JSA asset-rule integration unstable under recalibration.
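A minimal sketch of why the mapping matters, assuming total JSA is the sum of an income-based and a contribution-based component. The household rows, the weights, and the `jsa_contrib` column name are hypothetical illustrations, not values or identifiers taken from this PR.

```python
# Hypothetical weighted households (GBP per year); not real survey data.
households = [
    {"weight": 1000.0, "jsa_income": 0.0, "jsa_contrib": 4000.0},
    {"weight": 500.0, "jsa_income": 3000.0, "jsa_contrib": 0.0},
    {"weight": 2000.0, "jsa_income": 0.0, "jsa_contrib": 0.0},
]


def weighted_total(rows, variables):
    """Weighted sum of the listed variables across all rows."""
    return sum(r["weight"] * sum(r[v] for v in variables) for r in rows)


# Old mapping: an all-JSA spending target compared against income-based JSA only.
income_only = weighted_total(households, ["jsa_income"])

# Corrected mapping: the same target compared against total JSA.
# Assumption: total JSA = income-based + contribution-based components.
total_jsa = weighted_total(households, ["jsa_income", "jsa_contrib"])

print(f"income-based only: {income_only:,.0f}")
print(f"total JSA:         {total_jsa:,.0f}")
```

Under the old mapping, the calibrator would see only the income-based total and would have to upweight income-based JSA recipients to reach a target that actually covers both components.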

Verification

  • uv run pytest -q policyengine_uk_data/tests/test_esa_jsa_targets.py
  • uv run pytest -q policyengine_uk_data/tests/test_target_registry.py
  • uvx ruff check policyengine_uk_data/targets/sources/dwp.py policyengine_uk_data/targets/sources/obr.py policyengine_uk_data/tests/test_esa_jsa_targets.py

@MaxGhenis
Contributor Author

Update after re-running the calibration comparison more carefully.

The earlier optimistic ~18% national mean absolute relative error result was not a fair apples-to-apples comparison. I re-ran this as a matched admin-only benchmark using the same expanded target set, the same constituency calibration harness, and 5 matched random seeds for:

  • current main
  • all_assets = UC + HB + IS + PC + ESA/JSA asset-rule bundle

Post-constituency-recalibration summary across 5 seeds:

| Model | National mean abs rel error | National median abs rel error | National within 10% | National within 20% | Constituency-local median abs rel error | Constituency-local within 10% |
| --- | --- | --- | --- | --- | --- | --- |
| current_main | 69.0% median (68.4% to 69.2%) | 18.4% | 36.4% | 52.2% | 5.65% | 89.9% |
| all_assets | 61.6% median (61.3% to 61.9%) | 18.7% | 36.6% | 50.9% | 5.71% | 89.4% |

So the stable result is not “clear across-the-board improvement”. The full asset-rule bundle does consistently improve national mean error by about 7.1pp, but it is roughly flat to slightly worse on the other national metrics and on constituency-local fit.
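For reference, a minimal sketch of how the four national metrics could be computed, and of how mean error can fall while median error and the within-20% share get worse. It assumes unweighted targets and inclusive thresholds, which may differ from the actual harness; `fit_summary` and the toy numbers are hypothetical.

```python
from statistics import mean, median


def fit_summary(estimates, targets):
    """Summarise absolute relative errors of estimates against targets."""
    rel_errors = [abs(e - t) / abs(t) for e, t in zip(estimates, targets)]
    n = len(rel_errors)
    return {
        "mean_abs_rel_error": mean(rel_errors),
        "median_abs_rel_error": median(rel_errors),
        "within_10pct": sum(r <= 0.10 for r in rel_errors) / n,
        "within_20pct": sum(r <= 0.20 for r in rel_errors) / n,
    }


# Toy numbers only: one badly missed target dominates the mean.
targets = [100.0, 100.0, 100.0, 100.0, 100.0]
before = [105.0, 92.0, 118.0, 125.0, 500.0]
after = [106.0, 91.0, 122.0, 126.0, 300.0]

summary_before = fit_summary(before, targets)
summary_after = fit_summary(after, targets)

for key in summary_before:
    print(f"{key}: {summary_before[key]:.3f} -> {summary_after[key]:.3f}")
```

In the toy data, shrinking the single outlier cuts the mean sharply even though the typical target drifts slightly further from its value, which is the same qualitative pattern as in the table.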

Two concrete takeaways:

  1. The comparison needs to be treated as stochastic. Single calibration draws are not trustworthy here.
  2. The more important next step is probably a target audit / target-family breakdown, not more one-off recalibration runs. We already found one real mapping issue on JSA, and the current objective seems able to trade off target families in ways that are not obvious from one aggregate loss number.
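A minimal sketch of treating the comparison as stochastic: pair runs by seed and summarise the per-seed deltas rather than reading a single draw. The per-seed values below are hypothetical placeholders, not the results reported above.

```python
from statistics import median

# Hypothetical national mean abs rel error per seed; not the real runs.
baseline = {0: 0.70, 1: 0.66, 2: 0.68, 3: 0.72, 4: 0.69}
candidate = {0: 0.62, 1: 0.61, 2: 0.60, 3: 0.63, 4: 0.64}


def paired_deltas(a, b):
    """Candidate minus baseline for every seed present in both runs."""
    shared = sorted(a.keys() & b.keys())
    return [b[s] - a[s] for s in shared]


deltas = paired_deltas(baseline, candidate)
summary = {
    "median_delta": median(deltas),
    "min_delta": min(deltas),
    "max_delta": max(deltas),
    "improves_on_every_seed": all(d < 0 for d in deltas),
}
# Difference between the two medians, shown for contrast with median_delta.
diff_of_medians = median(candidate.values()) - median(baseline.values())

print(summary)
print(f"difference of medians: {diff_of_medians:+.3f}")
```

Pairing by seed removes seed-to-seed noise from the comparison. Note that the median of paired deltas need not equal the difference between the two models' medians, so the two figures should be reported separately.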

@MaxGhenis
Contributor Author

Follow-up on the calibration/target side after rerunning the fair admin-only comparison.

  1. Corrected calibration read

With the new ESA/JSA target additions in this PR, the fair comparison is not "everything gets dramatically better". Over 5 matched fresh constituency-recalibration seeds:

  • current_main median national mean absolute relative error: 69.0%
  • all_assets median national mean absolute relative error: 61.6%
  • stable delta on that metric: about -7.1pp

But the rest of the metrics are mixed:

  • national median absolute relative error: slightly worse
  • national within 10%: slightly better
  • national within 20%: worse
  • constituency-local fit: slightly worse

So the cleaner conclusion is that the expanded asset-rule package improves national mean admin-target error, but not the whole loss surface.
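One way to see which parts of the loss surface move is to break errors down by target family instead of reporting a single aggregate. This is a rough sketch; the family labels and numbers are hypothetical, and the real target registry may group targets differently.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (target family, model estimate, admin target).
rows = [
    ("dwp_caseload", 95.0, 100.0),
    ("dwp_caseload", 130.0, 100.0),
    ("obr_spending", 110.0, 100.0),
    ("obr_spending", 240.0, 200.0),
    ("obr_spending", 50.0, 100.0),
]


def error_by_family(rows):
    """Group absolute relative errors by target family and summarise each group."""
    grouped = defaultdict(list)
    for family, estimate, target in rows:
        grouped[family].append(abs(estimate - target) / abs(target))
    return {
        family: {
            "n": len(errors),
            "median_abs_rel_error": median(errors),
            "max_abs_rel_error": max(errors),
        }
        for family, errors in grouped.items()
    }


breakdown = error_by_family(rows)
for family, stats in sorted(breakdown.items()):
    print(family, stats)
```

Running the same breakdown for both models, per seed, would show whether the national-mean gain comes from a few families while others get slightly worse.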

  2. Target freshness audit

I also checked the current source dates against official releases. There are newer admin targets available for several target families:

  • OBR welfare/tax-benefit spend targets in tax_benefit.csv still point at obr_march_2024_efo, but OBR has newer March 2026 detailed forecast tables published on 3 March 2026.
  • dwp.py benefit cap targets still use ...to-february-2025, but DWP has a newer official release Benefit cap: number of households capped to November 2025.
  • dwp.py two-child-limit targets still reference the April 2024 publication, but DWP has a newer official release Universal Credit claimants statistics on the two child limit policy, April 2025.
  • local_uc.py still uses country-level UC-by-children proportions from November 2023, while the UC statistics collection now runs through 12 February 2026 on Stat-Xplore, so those splits can be refreshed.

The new ESA/JSA count targets added here are different: they are already anchored to the current DWP benefit statistics release for February 2026, so I do not think they are the stale part.
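The audit above can be sketched as a simple staleness check. The month-level values are taken from the list above; they mix publication dates and reference periods, so the gaps are only rough, and the labels and the `months_behind` helper are illustrative rather than part of the codebase.

```python
# (label, period currently referenced, newest period named in the audit),
# each as (year, month). Month-level only; days are deliberately omitted.
audit = [
    ("OBR spend targets (tax_benefit.csv)", (2024, 3), (2026, 3)),
    ("DWP benefit cap (dwp.py)", (2025, 2), (2025, 11)),
    ("DWP two-child limit (dwp.py)", (2024, 4), (2025, 4)),
    ("UC-by-children splits (local_uc.py)", (2023, 11), (2026, 2)),
    ("ESA/JSA count targets", (2026, 2), (2026, 2)),
]


def months_behind(current, latest):
    """Whole months between the referenced period and the newest one."""
    return (latest[0] - current[0]) * 12 + (latest[1] - current[1])


# Keep only sources that lag, ordered from most to least stale.
stale = sorted(
    (
        (months_behind(current, latest), label)
        for label, current, latest in audit
        if months_behind(current, latest) > 0
    ),
    reverse=True,
)

for gap, label in stale:
    print(f"{label}: about {gap} months behind")
```

On these inputs the UC-by-children splits and the OBR spend targets lag the most, while the ESA/JSA count targets do not appear in the stale list.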

MaxGhenis merged commit cdeb599 into main on Apr 12, 2026
3 checks passed
MaxGhenis deleted the codex/uk-data-esa-targets branch on April 12, 2026 02:52