
Refactor H5 publishing into local_h5 services #723

Draft
anth-volk wants to merge 21 commits into main from fix/target-architecture-h5


@anth-volk
Collaborator

Summary

Refs #722.

This PR refactors the production publishing path for US local-area and national H5 files into a testable local_h5 library surface, with thin adapters around it.

What Changed

  • added policyengine_us_data/calibration/local_h5/ with:
    • typed contracts
    • package geography loading
    • fingerprint service
    • work partitioning
    • selection / weight layout
    • source snapshot loading
    • entity reindexing
    • variable cloning
    • US-specific augmentation
    • builder / writer
    • worker session / worker service
    • area catalog
  • turned build_h5(...) into a compatibility facade over the new builder/writer path
  • refactored Modal worker/coordinator paths to use structured requests/results
  • propagated exact geography from calibration_package.pkl instead of regenerating it from the seed in the production Modal path
  • made validation results explicit and surfaced validation diagnostics in pipeline outputs
  • tightened fingerprint/resume handling and made the clone count canonical by deriving it from the weights
  • documented the landed architecture in docs/internals/local_h5_refactor_status.md
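To make the facade and structured request/result ideas above more concrete, here is a minimal sketch. Every name in it (AreaBuildRequest, AreaBuildResult, build_area, clone_count_from_weights) is hypothetical, and the build_h5 signature shown is not the real one. It also assumes the calibrated weights form a flat vector of n_clones * n_records entries, which is one reading of "clone count from weights".

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass(frozen=True)
class AreaBuildRequest:
    """Hypothetical structured request passed from a coordinator to a worker."""
    area_id: str
    output_path: Path
    weights: tuple[float, ...]  # assumed flat layout: n_clones * n_records entries
    n_records: int


@dataclass(frozen=True)
class AreaBuildResult:
    """Hypothetical structured result returned by a worker."""
    area_id: str
    output_path: Path
    n_clones: int
    nonzero_weights: int


def clone_count_from_weights(weights: tuple[float, ...], n_records: int) -> int:
    """Derive the clone count from the weights instead of a separately passed value."""
    if n_records <= 0 or len(weights) % n_records != 0:
        raise ValueError(f"{len(weights)} weights do not tile {n_records} records")
    return len(weights) // n_records


def build_area(request: AreaBuildRequest) -> AreaBuildResult:
    """Stand-in for the builder/writer path; a real writer would emit the H5 file here."""
    n_clones = clone_count_from_weights(request.weights, request.n_records)
    return AreaBuildResult(
        area_id=request.area_id,
        output_path=request.output_path,
        n_clones=n_clones,
        nonzero_weights=sum(1 for w in request.weights if w != 0),
    )


def build_h5(area_id: str, output_path: str, weights: Iterable[float], n_records: int) -> Path:
    """Hypothetical compatibility facade: keeps a simple call shape, delegates to the new path."""
    request = AreaBuildRequest(
        area_id=area_id,
        output_path=Path(output_path),
        weights=tuple(float(w) for w in weights),
        n_records=n_records,
    )
    return build_area(request).output_path
```

Deriving the clone count from the weight vector, rather than from a separate setting, means a mismatched configuration fails loudly instead of producing a silently misaligned file.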

Test Coverage

Added targeted coverage for the new components and adapter seams, including:

  • unit tests for contracts, partitioning, package geography, fingerprinting, selection, source snapshot, reindexing, variable cloning, US augmentations, builder, writer, worker service, area catalog, resilience, and coordinator contracts
  • coverage for writing pipeline validation diagnostics in tests/unit/test_pipeline_validation_diagnostics.py
  • a minimal integration test of the real build_h5(...) seam in tests/integration/test_build_h5_minimal.py
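As an illustration of the style of unit test listed above, here is a sketch of a deterministic work partitioner with a pytest-style test. The function name partition_work, the (area_id, cost) input shape, and the greedy longest-processing-time strategy are all assumptions; the PR does not describe how work is actually partitioned.

```python
from __future__ import annotations

from typing import Sequence


def partition_work(items: Sequence[tuple[str, int]], n_workers: int) -> list[list[str]]:
    """Greedy longest-processing-time assignment of (area_id, cost) pairs to workers.

    Ties are broken by area_id and by worker index, so identical inputs always
    produce the identical partition (helpful when resuming interrupted runs).
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    buckets: list[list[str]] = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    # Place the most expensive areas first; sort equal costs by id for determinism.
    for area_id, cost in sorted(items, key=lambda item: (-item[1], item[0])):
        # Send each area to the currently least-loaded worker (lowest index on ties).
        target = min(range(n_workers), key=lambda i: (loads[i], i))
        buckets[target].append(area_id)
        loads[target] += cost
    return buckets


def test_partition_work_covers_every_area_once():
    areas = [("CA", 50), ("TX", 40), ("NY", 30), ("WY", 1)]
    buckets = partition_work(areas, n_workers=2)
    flat = sorted(area for bucket in buckets for area in bucket)
    assert flat == ["CA", "NY", "TX", "WY"]
    assert partition_work(areas, n_workers=2) == buckets  # deterministic
```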

Recent focused runs:

./.venv-test/bin/pytest --noconftest \
  tests/unit/test_pipeline_validation_diagnostics.py \
  tests/integration/test_build_h5_minimal.py -q

Boundary Notes

This PR intentionally changes a few adapter-layer behaviors:

  • Modal H5 publishing now requires calibration_package.pkl so production publishing uses exact package geography
  • fingerprint.json is now a structured record, though it still preserves the digest field
  • validation diagnostics now include validation_errors.json in addition to the regional CSV / national text output
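A structured fingerprint record that keeps a top-level digest field could look like the sketch below. The schema_version and inputs keys, the helper names, and the choice of SHA-256 over canonical JSON are assumptions for illustration, not the actual fingerprint.json schema.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Any

SCHEMA_VERSION = 1  # hypothetical schema marker


def compute_digest(inputs: dict[str, Any]) -> str:
    """SHA-256 over canonical JSON, so key order does not affect the digest."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_fingerprint_record(inputs: dict[str, Any]) -> dict[str, Any]:
    """Structured record; `digest` stays a top-level field for existing readers."""
    return {
        "schema_version": SCHEMA_VERSION,
        "digest": compute_digest(inputs),
        "inputs": inputs,
    }


def digest_matches(stored: Any, inputs: dict[str, Any]) -> bool:
    """Resume check that compares only the digest, so an (assumed) older record
    carrying just a `digest` key is still accepted."""
    if not isinstance(stored, dict):
        return False
    return stored.get("digest") == compute_digest(inputs)


def write_fingerprint(path: Path, inputs: dict[str, Any]) -> None:
    record = build_fingerprint_record(inputs)
    path.write_text(json.dumps(record, indent=2, sort_keys=True))


def can_resume(path: Path, inputs: dict[str, Any]) -> bool:
    if not path.exists():
        return False
    return digest_matches(json.loads(path.read_text()), inputs)
```

Keeping the comparison on the digest alone lets the record grow new fields without changing resume semantics.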

The old standalone publish_local_area.py loops remain mostly intact as a non-production boundary. The one-area build path beneath them is refactored; the outer standalone iteration/checkpoint/upload flow is largely left alone.
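The boundary between the refactored one-area path and the untouched outer flow can be pictured with this sketch. run_standalone_loop and its injected callables are hypothetical stand-ins, not the actual functions in publish_local_area.py; injecting checkpointing and uploading leaves only the build_one_area call tied to the new code.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def run_standalone_loop(
    area_ids: Iterable[str],
    completed: set[str],
    build_one_area: Callable[[str], Path],
    upload: Callable[[Path], None],
    save_checkpoint: Callable[[set[str]], None],
) -> list[str]:
    """Outer loop around the one-area build: skip, build, upload, checkpoint.

    Note: `completed` is mutated in place as areas finish.
    """
    built: list[str] = []
    for area_id in area_ids:
        if area_id in completed:
            continue  # already built and uploaded in an earlier run
        output_path = build_one_area(area_id)  # the refactored one-area path
        upload(output_path)
        completed.add(area_id)
        save_checkpoint(completed)  # persist progress after every area
        built.append(area_id)
    return built
```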

Follow-Ups

  • ValidationPolicy is only partially enforced today; only its enabled setting is fully wired
  • standalone loop snapshot reuse was intentionally left out of scope
  • fingerprinting can later move to a package-geography-centric schema once we intentionally change resume semantics
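The partially wired ValidationPolicy and the explicit validation results can be sketched as follows. ValidationPolicy and validation_errors.json come from the PR; everything else (the fail_on_error field, ValidationResult, validate_area, the 5% relative tolerance, and the JSON layout) is an assumption for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ValidationPolicy:
    enabled: bool = True
    # Hypothetical extra field; since only `enabled` is fully wired,
    # this sketch deliberately ignores it.
    fail_on_error: bool = False


@dataclass
class ValidationResult:
    ran: bool  # distinguishes "skipped" from "ran and passed"
    errors: list[dict[str, Any]] = field(default_factory=list)


def validate_area(
    area_id: str,
    estimates: dict[str, float],
    targets: dict[str, float],
    policy: ValidationPolicy,
    tolerance: float = 0.05,
) -> ValidationResult:
    if not policy.enabled:
        return ValidationResult(ran=False)
    errors: list[dict[str, Any]] = []
    for name, target in targets.items():
        estimate = estimates.get(name)
        if estimate is None:
            errors.append({"area": area_id, "target": name, "reason": "missing estimate"})
            continue
        # Relative error, falling back to absolute error when the target is zero.
        error = abs(estimate - target) / abs(target) if target else abs(estimate)
        if error > tolerance:
            errors.append(
                {"area": area_id, "target": name, "reason": "out of tolerance", "error": error}
            )
    return ValidationResult(ran=True, errors=errors)


def validation_errors_json(result: ValidationResult) -> str:
    """Serialize for a validation_errors.json-style diagnostics file."""
    return json.dumps({"ran": result.ran, "errors": result.errors}, indent=2, sort_keys=True)
```

Recording `ran=False` for a disabled policy keeps "validation skipped" distinguishable from "validation passed with no errors" in the pipeline outputs.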

@vercel

vercel bot commented Apr 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| pipeline-diagrams | Error | Error | Apr 11, 2026 0:42am |


anth-volk force-pushed the fix/target-architecture-h5 branch from 5ebed95 to 6ab9cef on April 11, 2026 00:42