Problem
The US local-area and national H5 publishing path is still too procedural and too side-effectful. The old path spread business logic across publish_local_area.py, modal_app/local_area.py, and modal_app/worker_script.py, with repeated source-dataset setup, implicit contracts, and weakly structured worker/coordinator communication.
That made it hard to:
- reason about what one area build actually does
- reuse pieces of the publishing stack
- unit test the individual steps without spinning up large runtime surfaces
- enforce or even observe validation/failure behavior cleanly
- propagate exact package geography through H5 publishing
What We Want To Fix
Refactor the H5 publishing path into a set of clear, scoped, composable classes and contracts, in the same direction as the microplex-style architecture we discussed.
The target shape is:
- typed request/result contracts
- pure selection/reindexing/cloning steps
- a worker-scoped source snapshot
- explicit US-only augmentation services
- a builder/writer pair for one-area H5 materialization
- worker/coordinator services that communicate with structured results rather than ad hoc dict mutation
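As a rough illustration of the first and last bullets, typed contracts and structured worker results might look like the sketch below. Every name in it (AreaBuildRequest, AreaBuildResult, BuildStatus, run_area_build) and every field is hypothetical and chosen only for illustration; none of them is taken from the actual local_h5 package.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class BuildStatus(Enum):
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass(frozen=True)
class AreaBuildRequest:
    """What a coordinator asks a worker to build (hypothetical fields)."""

    area_type: str  # e.g. "state", "district", "city", "national"
    area_id: str
    output_path: str


@dataclass(frozen=True)
class AreaBuildResult:
    """What a worker returns; frozen, so it cannot be mutated in place."""

    request: AreaBuildRequest
    status: BuildStatus
    validation_issues: Tuple[str, ...] = ()
    error: Optional[str] = None


def run_area_build(
    request: AreaBuildRequest,
    build_fn: Callable[[AreaBuildRequest], Iterable[str]],
) -> AreaBuildResult:
    """Run one area build and always return a structured result.

    build_fn is a stand-in for the real builder; here it is assumed to
    return an iterable of validation messages.
    """
    try:
        issues = tuple(build_fn(request))
    except Exception as exc:  # report the failure instead of hiding it
        return AreaBuildResult(
            request=request,
            status=BuildStatus.FAILED,
            error=f"{type(exc).__name__}: {exc}",
        )
    return AreaBuildResult(
        request=request,
        status=BuildStatus.COMPLETED,
        validation_issues=issues,
    )
```

With this shape the coordinator can inspect result.status and result.validation_issues directly instead of reading keys out of a shared dict, which also makes failure behavior straightforward to assert in unit tests.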
Scope
This work should cover the production Modal H5 path for:
- regional/state/district/city publishing
- national H5 publishing
It should also preserve the existing public build_h5(...) facade while moving the real work under a reusable local_h5 library surface.
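One way to picture the facade arrangement is sketched below. The real build_h5(...) signature is not shown here, so the sketch uses a differently named stand-in, build_h5_facade, with made-up parameters. H5BuildRequest and AreaH5Builder are likewise hypothetical names, not the actual library surface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class H5BuildRequest:
    """Hypothetical typed request handed to the library layer."""

    area_id: str
    output_path: str


class AreaH5Builder:
    """Hypothetical library-side service that owns the real work."""

    def __init__(self, materialize: Callable[[H5BuildRequest], str]):
        # materialize is injected so the builder can be tested without I/O.
        self._materialize = materialize

    def build(self, request: H5BuildRequest) -> str:
        return self._materialize(request)


def build_h5_facade(area_id: str, output_path: str, builder: AreaH5Builder) -> str:
    """Stand-in for the public entry point.

    It only translates legacy-style arguments into a typed request and
    delegates; no business logic lives here.
    """
    request = H5BuildRequest(area_id=area_id, output_path=output_path)
    return builder.build(request)
```

Keeping the facade this thin means existing callers are unaffected while the unit tests target the builder directly.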
Requirements
- Prefer exact geography loaded from the calibration package over seed-based regeneration.
- Reduce repeated source-dataset setup within workers.
- Make validation results explicit and testable.
- Keep fingerprint/resume logic at the adapter boundary.
- Make the one-area build stack unit-testable, with only a thin seam/integration layer above it.
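The requirement to reduce repeated source-dataset setup could be met with a worker-scoped snapshot along the lines of the sketch below. WorkerSourceSnapshot and its loader argument are hypothetical; the sketch assumes only that loading the source dataset is expensive and that the result is safe to share across area builds within one worker.

```python
from typing import Any, Callable


class WorkerSourceSnapshot:
    """Loads the source dataset at most once per worker, then reuses it."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader  # hypothetical: whatever opens the source dataset
        self._data: Any = None
        self._loaded = False

    def get(self) -> Any:
        # The explicit flag keeps a legitimately falsy dataset from
        # triggering a reload.
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
        return self._data


def build_areas(area_ids, snapshot: WorkerSourceSnapshot):
    """Toy loop showing several area builds sharing one snapshot."""
    return {area_id: len(snapshot.get()) for area_id in area_ids}
```

Because the loader is injected, a unit test can pass a counting stub and assert that it ran exactly once across many area builds, without spinning up a Modal worker.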
Expected Deliverables
- policyengine_us_data/calibration/local_h5/ package with the core services and contracts
- thin worker/coordinator adapters around that package
- targeted unit coverage for the new components
- a minimal real build_h5(...) integration seam test
- updated docs explaining the new boundary with the legacy surface
Known Follow-Ups
- Validation policy fields beyond enabled are still only partially implemented today and should be enforced explicitly in a later slice.
- Standalone publish_local_area.py loop optimizations can remain separate from the production Modal path.
- Fingerprinting can eventually move to a package-geography-centric schema version once we choose to change resume semantics intentionally.