Summary
Analysis of the Minnesota (MN) state dataset shows two problems: population and household totals fall well short of actual Minnesota statistics (about 29% and 46% low, respectively), and multi-tax-unit households are concentrated in the top income deciles.
Issues Found
1. Population and Household Undercount
| Metric | MN Dataset | Real/Target | Difference |
|---|---|---|---|
| Population | ~4.1M | 5.74M | -29% |
| Households | 1,254,857 | ~2,344,432 | -46% |
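
The Difference column can be reproduced as the signed gap relative to the target. A minimal sketch, assuming that is how the percentages were derived; the helper name `relative_difference` is illustrative and not part of any PolicyEngine API:

```python
def relative_difference(dataset_value: float, target_value: float) -> float:
    """Signed gap between dataset and target, as a fraction of the target."""
    return (dataset_value - target_value) / target_value


# Figures copied from the table above (the population figure is approximate).
population_gap = relative_difference(4.1e6, 5.74e6)
household_gap = relative_difference(1_254_857, 2_344_432)

print(f"Population: {population_gap:.0%}")
print(f"Households: {household_gap:.0%}")
```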
Sources:
2. Multi-Tax-Unit Households Concentrated in Top Income Deciles
When analyzing a Child Tax Credit (CTC) reform that should only affect low-income households (it phases out by roughly $40k), we found unexpected impacts in the 8th, 9th, and 10th income deciles.
Investigation revealed that affected high-income households have dramatically more tax units than unaffected ones:
| Metric | Affected Top-Decile HH | Unaffected Top-Decile HH |
|---|---|---|
| Avg tax units per household | 6.41 | 1.55 |
| Household count | 50,907 | 325,752 |
Distribution of tax units among affected top-decile households:
- 53% have 8 tax units
- 40% have 5 tax units
- Only 6% have 2 tax units
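
For anyone reproducing this comparison, the sketch below shows one way to compute the weighted average number of tax units per household and the weighted distribution of tax-unit counts for affected versus unaffected households. The record layout `(household_weight, n_tax_units, affected)` and the sample values are made up for illustration; they are not drawn from the MN dataset or from the notebook.

```python
from collections import defaultdict

# Hypothetical household records: (household_weight, n_tax_units, affected).
# Values are illustrative only, not taken from the MN dataset.
households = [
    (100.0, 8, True),
    (80.0, 5, True),
    (20.0, 2, True),
    (300.0, 1, False),
    (200.0, 2, False),
]


def weighted_mean_tax_units(records, affected):
    """Weighted average tax units per household within one group."""
    rows = [(w, n) for w, n, a in records if a == affected]
    total_weight = sum(w for w, _ in rows)
    return sum(w * n for w, n in rows) / total_weight


def tax_unit_shares(records, affected):
    """Weighted share of households at each tax-unit count within one group."""
    weight_by_count = defaultdict(float)
    for w, n, a in records:
        if a == affected:
            weight_by_count[n] += w
    total = sum(weight_by_count.values())
    return {n: w / total for n, w in sorted(weight_by_count.items())}


affected_mean = weighted_mean_tax_units(households, affected=True)
unaffected_mean = weighted_mean_tax_units(households, affected=False)
affected_shares = tax_unit_shares(households, affected=True)

print(affected_mean, unaffected_mean, affected_shares)
```

Using household weights rather than raw record counts matters here, since a few heavily weighted records with many tax units could drive the whole pattern.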
This is likely a bug: multi-tax-unit households should be spread across the income spectrum, not concentrated in the top deciles.
Impact
This produces misleading results for policies that target low-income populations (such as the CTC and EITC): impacts appear among high-income households that the policy should not reach.
Investigation Details
Full analysis documented in: PolicyEngine/analysis-notebooks#108
Suggested Fix
Review the household/tax-unit mapping and weighting in the state dataset calibration to ensure:
- Population and household counts match targets
- Multi-tax-unit households are distributed realistically across income deciles
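
Both checks could be automated as part of calibration validation. The sketch below is one possible shape for such a check, not the existing calibration code: the function names, the record layout `(income_decile, household_weight, n_tax_units)`, the 5% tolerance, and the sample records are all assumptions made for illustration.

```python
def within_tolerance(dataset_total, target_total, tolerance=0.05):
    """True if the dataset total is within `tolerance` (a fraction) of the target."""
    return abs(dataset_total - target_total) <= tolerance * target_total


def multi_unit_share_by_decile(records):
    """Weighted share of households with 2+ tax units in each income decile.

    `records` is an iterable of (income_decile, household_weight, n_tax_units).
    """
    total = {}
    multi = {}
    for decile, weight, n_units in records:
        total[decile] = total.get(decile, 0.0) + weight
        if n_units >= 2:
            multi[decile] = multi.get(decile, 0.0) + weight
    return {d: multi.get(d, 0.0) / total[d] for d in sorted(total)}


# Household totals from this issue: dataset value vs. target.
households_ok = within_tolerance(1_254_857, 2_344_432)

# Made-up records for illustration only.
sample = [
    (1, 100.0, 1), (1, 50.0, 2),
    (5, 120.0, 1), (5, 30.0, 3),
    (10, 40.0, 1), (10, 160.0, 8),
]
shares = multi_unit_share_by_decile(sample)

print(households_ok)
print(shares)
```

A sharp jump in the multi-tax-unit share in the top deciles, as in the made-up sample, is the pattern described in issue 2; what counts as a realistic share per decile would need to come from an external benchmark.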