Problem
policyengine-api currently records or exposes package versions in a few places, but it does not persist and return the resolved immutable execution bundle for a simulation.
That creates three reproducibility problems:
- household calculations run against whatever country package is installed in-process, with no explicit model/data pin at execution time
- economy flows have partial model_version plumbing but effectively no real data-version pinning
- cache keys are based on request payloads rather than the resolved execution bundle
Relevant code paths:
- household execution instantiates the installed country package Simulation(...) directly and only reports the installed package version in metadata: policyengine_api/country.py
- economy setup hardcodes dataset version resolution to None, builds dataset aliases like ...@None, and strips that suffix back off during setup: policyengine_api/data/model_setup.py, policyengine_api/services/economy_service.py
- the Modal adapter drops data_version before job submission: policyengine_api/libs/simulation_api_modal.py
- cache keys are hashes of request bodies, not of the resolved bundle: policyengine_api/utils/cache_utils.py
- the bump workflow only updates country package versions, so deployment automation also treats package version as the whole contract: gcp/bump_country_package.py
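To make the cache-key problem concrete, here is a minimal sketch of the failure mode. It is not the code in policyengine_api/utils/cache_utils.py; the function name payload_only_key and the request fields are hypothetical and only illustrate what happens when a key is derived from the request body alone.

```python
import hashlib
import json


def payload_only_key(payload: dict) -> str:
    """Hash only the request body (hypothetical stand-in for the current pattern)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical request; nothing in it names a model or data release.
request = {"country_id": "us", "policy_id": 2, "time_period": "2025"}

# The same request submitted before and after a data release produces the
# same key, because the release identity never enters the hash.
key_before_release = payload_only_key(request)
key_after_release = payload_only_key(request)

collides = key_before_release == key_after_release
print(collides)
```

Under these assumptions a result computed against an older data release could be served for a request that would now resolve to a newer one.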
Desired contract
For every simulation or cached result, the API should persist and return a resolved immutable bundle, not just a country ID or API version.
At minimum that bundle should include:
- orchestrator version if applicable (policyengine.py or equivalent)
- country model package name/version
- country data package name/version
- resolved dataset artifact locator or manifest revision
- checksum or manifest ID for verification
This should apply to both household-style in-process calculations and economy/report-style asynchronous calculations.
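One possible shape for that bundle, as a sketch only: the class name ExecutionBundle, its field names, and every value below are assumptions made for illustration, not an agreed schema. The point is that the bundle is immutable once resolved and has a stable fingerprint that other components can reuse.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ExecutionBundle:
    """Hypothetical resolved bundle; all field names are illustrative."""

    model_package: str
    model_version: str
    data_package: str
    data_version: str
    dataset_locator: str  # resolved artifact locator or manifest revision
    checksum: str  # checksum or manifest ID used for verification
    orchestrator_version: Optional[str] = None  # only if an orchestrator is used

    def fingerprint(self) -> str:
        """Stable hash over every field, suitable as a bundle identity."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values; none of these are real package versions or locators.
bundle = ExecutionBundle(
    model_package="example-country-model",
    model_version="1.0.0",
    data_package="example-country-data",
    data_version="2.0.0",
    dataset_locator="example://datasets/example.h5@2.0.0",
    checksum="sha256:placeholder",
)
rebuilt = ExecutionBundle(**asdict(bundle))
next_release = replace(bundle, data_version="2.1.0")
```

Because the dataclass is frozen, a persisted bundle cannot be mutated after resolution; a new data release yields a new object with a different fingerprint.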
What should change
- Resolve and persist the execution bundle at simulation creation time.
- Return that bundle in simulation metadata and user-facing API responses.
- Stop dropping data_version or equivalent bundle identity before job submission.
- Make cache keys and dedupe keys include the resolved bundle identity.
- Update deployment/version-bump tooling so it does not treat country package version alone as the full runtime contract.
- Keep backward compatibility for existing clients where possible, but add new structured provenance fields rather than overloading the current api_version field.
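Two of the changes above can be sketched together: a cache key that hashes the request plus the resolved bundle, and a response that keeps api_version untouched while adding a separate provenance object. The function names, the "provenance" field name, and the bundle contents are hypothetical; the actual response schema is still to be decided.

```python
import hashlib
import json


def bundle_aware_key(payload: dict, bundle: dict) -> str:
    """Hash the request body together with the resolved bundle identity."""
    material = {"request": payload, "bundle": bundle}
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_response(result: dict, api_version: str, bundle: dict) -> dict:
    """Keep api_version as-is and expose the bundle in its own field."""
    return {
        "result": result,
        "api_version": api_version,  # unchanged for existing clients
        "provenance": dict(bundle),  # hypothetical new structured field
    }


# Placeholder request and bundles that differ only in data_version.
request = {"country_id": "us", "policy_id": 2}
bundle_old = {"model_version": "1.0.0", "data_version": "2.0.0"}
bundle_new = {"model_version": "1.0.0", "data_version": "2.1.0"}

key_old = bundle_aware_key(request, bundle_old)
key_new = bundle_aware_key(request, bundle_new)
response = build_response({"value": 0}, "0.0.0", bundle_new)
```

Here the same request maps to different keys across data releases, and a client that only reads api_version sees no change in that field.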
Acceptance criteria
- Household calculations and economy calculations both persist the resolved model/data bundle used at execution time.
- API responses expose structured provenance fields rather than only api_version or country package version.
- The economy pipeline no longer hardcodes dataset version resolution to None.
- The Modal submission path preserves the resolved bundle, including data release identity.
- Cache and dedupe keys include the resolved bundle identity so floating defaults cannot collide across releases.
- Deployment/version bump workflows can update or verify the full runtime bundle, not just the country model package version.
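For the verification part of the last criterion, a bump or deploy step could check a dataset artifact against the checksum pinned in the bundle. The following is a rough sketch using only the standard library; verify_artifact and sha256_of_file are hypothetical helpers, SHA-256 is an assumed checksum choice, and the temporary file stands in for a real dataset artifact.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large artifacts are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True when the artifact matches the checksum pinned in the bundle."""
    return sha256_of_file(path) == expected_sha256.lower()


# Stand-in artifact; a real workflow would point at the resolved dataset file.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "dataset.bin"
    artifact.write_bytes(b"example dataset bytes")
    pinned = hashlib.sha256(b"example dataset bytes").hexdigest()
    matches = verify_artifact(artifact, pinned)
    mismatch = verify_artifact(artifact, "0" * 64)
```

A workflow built along these lines could fail the bump when the check returns False instead of silently deploying a model package against unverified data.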
Upstream dependencies
This should consume the data-release contracts from:
And it should stay aligned with the orchestration work in:
- policyengine.py: the immutable release boundary for country model and data versions (policyengine.py#270)