Impute below-threshold student loan holders #332
vahid-ahmadi
left a comment
Entity-level mismatch bug in compute functions
Both compute_student_loan_plan and compute_student_loan_plan_liable in other.py mix person-level and household-level arrays:
plan = ctx.pe_person("student_loan_plan") # person-level (n_persons)
repayments = ctx.pe_person("student_loan_repayments") # person-level (n_persons)
on_plan = (plan == plan_value) & (ctx.country == "ENGLAND") & (repayments > 0)

ctx.country is sim.calculate("country").values, which returns a household-level array (confirmed: the country entity is household). This will fail at runtime with real data because the array lengths do not match.
The old code handled this correctly by explicitly mapping region to person level:
region = ctx.sim.calculate("region", map_to="person").values

The test doesn't catch this because DummyCtx uses same-length arrays for everything and household_from_person is the identity.
Fix: replace ctx.country == "ENGLAND" with something like ctx.sim.calculate("country", map_to="person").values == "ENGLAND" in both functions. Note that the imputation code in student_loans.py already does this correctly (sim.calculate("country", map_to="person").values).
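To illustrate why the mismatch only shows up with real data, here is a minimal sketch using plain NumPy rather than the actual simulation objects. The toy arrays and the `person_household_index` name are hypothetical, and it assumes `map_to="person"` performs an equivalent household-to-person broadcast.

```python
import numpy as np

# Hypothetical toy data: 2 households, 5 persons.
country_household = np.array(["ENGLAND", "WALES"])   # household-level (n_households)
person_household_index = np.array([0, 0, 0, 1, 1])   # household of each person
plan = np.array(["PLAN_2", "PLAN_1", "PLAN_2", "PLAN_2", "NONE"])  # person-level
repayments = np.array([120.0, 0.0, 300.0, 50.0, 0.0])              # person-level


def on_plan_mask(plan_value, country):
    """Person-level boolean mask; `country` must have one entry per person."""
    return (plan == plan_value) & (country == "ENGLAND") & (repayments > 0)


# Mixing a household-level array with person-level arrays:
# shapes (5,) and (2,) cannot be broadcast together.
try:
    on_plan_mask("PLAN_2", country_household)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Broadcasting the household value down to each person makes the lengths agree.
country_person = country_household[person_household_index]
mask = on_plan_mask("PLAN_2", country_person)
```

A test fixture with different numbers of persons and households, as above, would have surfaced the bug; a fixture where both lengths are equal cannot.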
Minor: Plan 2 below-threshold eligible pool is broader than Plan 2 cohort
plan_2_eligible uses plan_2_age_band (ages 21–55) but doesn't require the Plan 2 cohort filter (uni start >= 2012). In 2025, anyone aged ~31+ started uni pre-2012 and would be Plan 1 cohort, yet they're in the Plan 2 eligible pool. This likely doesn't affect the total count (the probabilistic assignment still targets the right number), but it distributes Plan 2 holders across a wider age range than reality. Worth considering whether to tighten the age band or add the cohort filter.
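One way to express the tighter pool, as a sketch only: combine the age band with an estimated university start year. It assumes people start university at 18, which is roughly consistent with the ~31+ figure for 2025 above. `ASSUMED_UNI_START_AGE`, `estimated_uni_start_year` and the function signature are illustrative names, not the ones used in student_loans.py.

```python
import numpy as np

PLAN_2_MIN_AGE, PLAN_2_MAX_AGE = 21, 55  # age band quoted in the review
PLAN_2_FIRST_START_YEAR = 2012           # Plan 2 cohort: uni start >= 2012
ASSUMED_UNI_START_AGE = 18               # assumption, not taken from the codebase


def plan_2_eligible(age, year):
    """Person-level mask: inside the age band AND in the estimated Plan 2 cohort."""
    age = np.asarray(age)
    in_age_band = (age >= PLAN_2_MIN_AGE) & (age <= PLAN_2_MAX_AGE)
    estimated_uni_start_year = year - age + ASSUMED_UNI_START_AGE
    in_cohort = estimated_uni_start_year >= PLAN_2_FIRST_START_YEAR
    return in_age_band & in_cohort


eligible_2025 = plan_2_eligible([20, 21, 30, 31, 32, 55], year=2025)
```

Under these assumptions the 2025 pool stops at age 31, so the age band alone would admit people in their thirties to fifties who belong to the Plan 1 cohort.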
Addressed the review in 21b2243. The fix switches the SLC compute path back to person-level mapping instead of the household-level array, and it tightens the Plan 2 below-threshold imputation pool to the estimated Plan 2 cohort as well as the age band.

Local validation after the change:

E902 No such file or directory (os error 2)
Found 1 error.

and

============================= test session starts ==============================
policyengine_uk_data/tests/test_student_loan_targets.py ...... [ 54%]
============================== 11 passed in 9.23s ==============================
Follow-up: a post-merge review found that the SLC parser is still dropping literal zero target values, so the Plan 5 2025 above-threshold zero is not actually enforced in calibration. I’m fixing that in a small follow-up PR now.
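A common cause of this kind of bug is a truthiness check that treats 0 the same as a missing cell. Whether that is what happens in the SLC parser is not confirmed here; the sketch below is illustrative only, and `_to_float`, `parse_cell_buggy` and `parse_cell_fixed` are hypothetical names.

```python
def _to_float(raw):
    """Return the cell as a float, or None when the cell is empty."""
    text = raw.strip().replace(",", "")
    return float(text) if text != "" else None


def parse_cell_buggy(raw):
    value = _to_float(raw)
    # Bug: 0.0 is falsy, so a literal zero is treated as "no value".
    return value if value else None


def parse_cell_fixed(raw):
    # Only a genuinely empty cell yields None; a literal zero survives.
    return _to_float(raw)


# Hypothetical row of year -> raw cell text.
row = {2025: "0", 2026: "10,000", 2027: ""}
buggy_targets = {y: v for y, c in row.items() if (v := parse_cell_buggy(c)) is not None}
fixed_targets = {y: v for y, c in row.items() if (v := parse_cell_fixed(c)) is not None}
```

With the buggy version the 2025 entry disappears, so calibration never sees a zero target for that year; with the explicit `None` check it is kept.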
Summary
Higher education total rows instead of the first matching above-threshold row
Details
This keeps the active-repayer stock anchored to observed PAYE deductions, then fills the missing England tertiary-education borrower stock needed for the base dataset and future plan uprating.
The checked-in SLC snapshot implies these missing borrower stocks before weighting against the live dataset:
4.955m in 2025, rising to 5.57m in 2028
10k in 2025, rising to 2.165m in 2030
Testing
uvx ruff check policyengine_uk_data/targets/sources/slc.py policyengine_uk_data/datasets/imputations/student_loans.py policyengine_uk_data/targets/compute/other.py policyengine_uk_data/targets/compute/__init__.py policyengine_uk_data/targets/build_loss_matrix.py policyengine_uk_data/tests/test_student_loan_targets.py policyengine_uk_data/tests/test_student_loan_plan.py
uv run --python 3.13 pytest -q policyengine_uk_data/tests/test_student_loan_targets.py policyengine_uk_data/tests/test_student_loan_plan.py
Closes #281.