
Model.tracker — snapshot-managed run state (stacked on #195) #196

Merged: lmoresi merged 3 commits into development from feature/model-tracker on May 20, 2026

lmoresi (Member) commented May 19, 2026

Summary

Adds Model.tracker — a model-dwelling, snapshot-managed record of where a run is (time, step, dt) plus any quantity the user parks on it. Everything on the tracker is automatically captured by Model.snapshot() and reverted by Model.restore(); a loose Python variable is not. Solvers do not depend on it; using it is optional.
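
The contrast between a loose variable and a tracked one can be illustrated with a toy stand-in. The sketch below does not import underworld3: `ToyModel` and its dict-based tracker are hypothetical and only mimic the capture/revert behaviour described above, under the assumption that a snapshot copies the tracker by value.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a model with a snapshot-managed tracker."""

    def __init__(self):
        # Pre-seeded conventions named in the summary: time, step, dt.
        self.tracker = {"time": 0.0, "step": 0, "dt": None}

    def snapshot(self):
        # Capture the tracker by value so later edits cannot leak in.
        return copy.deepcopy(self.tracker)

    def restore(self, snap):
        # Revert the tracker to the captured values.
        self.tracker = copy.deepcopy(snap)


model = ToyModel()
model_time = 0.0  # loose Python variable, invisible to snapshot/restore

snap = model.snapshot()

# Advance both records by one step of size 0.1.
model_time += 0.1
model.tracker["time"] += 0.1
model.tracker["step"] += 1

model.restore(snap)

print(model_time)             # the loose variable keeps its advanced value
print(model.tracker["time"])  # the tracked value is back at the snapshot
```

After the restore the two records disagree, which is the failure mode the tracker is meant to remove.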

Also adds the user-facing documentation and the back-stepping demo scripts for the whole snapshot/restore feature.

⚠️ Stacked on #195

This PR is based on feature/in-memory-checkpoint (#195), not development, so the diff shows only the tracker work (9 files, +976). It cannot merge until #195 merges. After #195 merges, retarget this PR's base to development — the diff will stay clean.

What's here

  • src/underworld3/checkpoint/tracker.py: ModelTracker (attribute-style: model.tracker.foo = ... becomes managed state automatically; time/step/dt pre-seeded as ordinary conventions) + TrackerState(SnapshottableState). Zero new snapshot plumbing: it auto-registers via the existing _state_bearers path from #195 (In-memory snapshot toolkit, git stash for timesteps).
  • src/underworld3/model.py: _tracker PrivateAttr, instantiated + registered in __init__, exposed via the tracker property.
  • tests/test_0009_model_tracker.py — 9 tier-A tests: defaults, builtins revert, user-quantity reverts, numpy-by-value deep-copy, git-stash semantics (post-snapshot quantity dropped on restore), the loose-var-vs-tracker contrast, bit-identical state roundtrip, realistic stepping-loop continuation.
  • docs/advanced/snapshot-restore.md (+ index/toctree) — user guide for the whole feature; docs-build verified.
  • tests/run_snapshot_backstepping_demo.py and tests/run_snapshot_backstepping_spatial.py: runnable visualisations that accompany the tests.
  • tests/test_0007_… one-line robustness fix: find DDt state by type, not state_bearers[0] (a tracker is now always registered; pre-existing fragility exposed, not introduced).

Design notes for the reviewer

  • A custom __setattr__ was found (by the test suite) to silently shadow the state property setter, which would make restore() a no-op. Fixed by having __setattr__ respect class-level data descriptors; state is therefore a documented reserved name on the tracker.
  • git-stash semantics: restore replaces the managed map wholesale, so a quantity added after a snapshot is dropped on restore of that snapshot.
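
Both design notes can be reproduced in a few lines. The sketch below is a hypothetical `ToyTracker`, not the ModelTracker source: it assumes public names are routed into a managed dict by `__setattr__`, and shows the guard that lets the `state` property setter run, plus the wholesale replacement that drops post-snapshot entries.

```python
import copy


class ToyTracker:
    """Hypothetical tracker: public names are managed entries."""

    def __init__(self):
        # Underscore names are real attributes, stored directly.
        object.__setattr__(self, "_managed", {"time": 0.0, "step": 0, "dt": None})

    def __setattr__(self, name, value):
        # Guard: let class-level data descriptors (such as the `state`
        # property) handle their own assignment. Without this branch,
        # `tracker.state = snap` would just create a managed entry
        # called "state" and the restore would silently do nothing.
        descriptor = getattr(type(self), name, None)
        if hasattr(descriptor, "__set__") or name.startswith("_"):
            object.__setattr__(self, name, value)
        else:
            self._managed[name] = value

    def __getattr__(self, name):
        # Reached only when normal lookup fails; serve managed entries.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._managed[name]
        except KeyError:
            raise AttributeError(name) from None

    @property
    def state(self):
        # Deep copy so a captured state is isolated from later edits.
        return copy.deepcopy(self._managed)

    @state.setter
    def state(self, new_state):
        # Wholesale replacement: entries absent from new_state disappear.
        self._managed = copy.deepcopy(new_state)


tracker = ToyTracker()
tracker.step = 3
snap = tracker.state

tracker.step = 4
tracker.extra = "added after the snapshot"

tracker.state = snap  # reaches the property setter thanks to the guard

print(tracker.step)
print(hasattr(tracker, "extra"))
```

In this sketch `step` returns to its snapshot value and `extra` no longer exists, matching the git-stash behaviour described in the second note.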

Test plan

  • pixi run -e amr-dev pytest tests/test_0009_model_tracker.py (9)
  • Regression incl. snapshot suite: pytest tests/test_0007_snapshot_inmemory.py tests/test_0008_snapshot_realsolver.py (27)
  • Parallel unaffected: cd tests/parallel && mpirun -np 4 python ./ptest_0007_snapshot_inmemory.py
  • pixi run -e amr-dev docs-build: advanced/snapshot-restore.html renders, toctree resolves

Underworld development team with AI support from Claude Code

lmoresi added 3 commits May 19, 2026 16:31
Model.tracker is the authoritative *record* of where a run is —
time, step, dt, plus any quantity the user parks on it — and it is
automatically captured by Model.snapshot() and reverted by
Model.restore(). A loose Python variable (model_time = 0.0 in a
script) is not reverted; the same value on the tracker is. That
contrast is the whole point.

Design (per Louis's intent, 2026-05-19):
- Authoritative as a *record*, NOT a dependency. Solvers and DDt are
  untouched; using the tracker is optional. It sits alongside DDt's
  own _dt_history (captured independently), it does not subsume it.
- User-extensible by plain attribute assignment:
  model.tracker.foo = ... registers foo as managed state — no
  dataclass authoring, no special status in solvers.
- time/step/dt are ordinary pre-seeded managed entries (0.0/0/None),
  not privileged fields — consistent with "user-added quantities are
  first-class".
- git-stash semantics: restore replaces the managed map wholesale, so
  a quantity created after the snapshot is dropped on restore.

Implementation:
- src/underworld3/checkpoint/tracker.py: ModelTracker (uw_object
  subclass for instance_number) + TrackerState(SnapshottableState)
  carrying an open `managed` dict. Attribute routing: underscore
  names are real attributes, public names are managed entries.
  __setattr__ respects class-level data descriptors so the `state`
  property setter is honoured (without this guard restore would
  silently no-op — caught by the test suite; `state` is therefore a
  reserved name). .state getter deep-copies for isolation.
- Model: PrivateAttr _tracker, instantiated and auto-registered as a
  state-bearer in __init__; exposed via the `tracker` property.
  Zero new snapshot plumbing — the existing _state_bearers path
  picks it up.

Tests: tests/test_0009_model_tracker.py (9, tier_a level_1) —
defaults, builtins revert, user-quantity reverts, numpy-by-value
deep-copy, post-snapshot quantity dropped on restore, the
loose-var-vs-tracker contrast, bit-identical state roundtrip, and a
realistic stepping-loop continuation.

Drive-by: test_symbolic_ddt_snapshot_is_deep_copy assumed
state_bearers[0] was the DDt; with a tracker now always registered
(WeakSet, unordered) it now finds the DDt state by type. Pre-existing
fragility, exposed not introduced.
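
A minimal sketch of the lookup-by-type idea follows. `DDtState`, `TrackerState` and `find_by_type` here are hypothetical stand-ins written for illustration; they are not the classes or helper used in the test suite.

```python
import weakref


class DDtState:
    """Hypothetical stand-in for a DDt state-bearer."""


class TrackerState:
    """Hypothetical stand-in for the tracker state-bearer."""


def find_by_type(bearers, wanted_type):
    """Return the single registered object of wanted_type."""
    matches = [b for b in bearers if isinstance(b, wanted_type)]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one {wanted_type.__name__}, found {len(matches)}"
        )
    return matches[0]


ddt = DDtState()
tracker = TrackerState()

# A WeakSet has no defined order, so list(bearers)[0] may be either
# object. Selecting by type does not depend on iteration order.
bearers = weakref.WeakSet([tracker, ddt])

found = find_by_type(bearers, DDtState)
print(found is ddt)
```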

60 tests pass (24 snapshot + 3 real-solver + 9 tracker + 24
regression); parallel ptest still PASS at np 4 with the tracker
auto-registered and snapshot/restored alongside everything else.

Stacked on feature/in-memory-checkpoint (depends on its
Snapshottable/_state_bearers); PRs to development after #195 lands.

Underworld development team with AI support from Claude Code
(https://claude.com/claude-code)
User-facing advanced guide covering Model.snapshot() / Model.restore()
and Model.tracker, in the Sphinx/MyST form that builds into the
readthedocs site. Distinct from the developer state-as-dataclass
guide (that one is for people *extending* the mechanism; this is for
people *using* it).

Contents: the "stash for timesteps" mental model and when to use it
(backtrack, adaptive Δt, predictor-corrector, RK staging); the API;
what is captured automatically; the loose-variable-vs-tracker trap
and how Model.tracker solves it (with the reserved-name and
git-stash-semantics caveats); a worked adaptive-Δt CFL backtracking
loop; and an explicit guarantees/limitations section (bit-exact
discard incl. parallel and through real solvers; in-memory only;
fixed rank count; mesh-adapt refused; within-tolerance vs a
never-snapshotted solver run).
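
The adaptive-Δt backtracking pattern named above can be sketched without underworld3. Everything in this block is a hypothetical toy (a scalar decay problem, a dict-based tracker, a made-up CFL-like ratio and limit); it only illustrates the snapshot, try, restore, retry shape of the loop, not the guide's worked example.

```python
import copy


class ToyModel:
    """Hypothetical stand-in: one scalar field plus a dict-based tracker."""

    def __init__(self):
        self.u = 1.0
        self.tracker = {"time": 0.0, "step": 0, "dt": None}

    def snapshot(self):
        # Capture field and tracker together, by value.
        return copy.deepcopy((self.u, self.tracker))

    def restore(self, snap):
        self.u, self.tracker = copy.deepcopy(snap)

    def advance(self, dt, rate=10.0):
        # Explicit Euler for du/dt = -rate * u; returns a CFL-like ratio.
        self.u += dt * (-rate * self.u)
        self.tracker["time"] += dt
        self.tracker["step"] += 1
        self.tracker["dt"] = dt
        return rate * dt


model = ToyModel()
dt = 0.4          # deliberately too large for the first attempt
cfl_limit = 0.6   # illustrative threshold
rejected = 0

while model.tracker["step"] < 4:
    snap = model.snapshot()
    ratio = model.advance(dt)
    if ratio > cfl_limit:
        # Discard the bad step: field, time and step count all revert.
        model.restore(snap)
        dt *= 0.5
        rejected += 1

print(model.tracker, rejected)
```

Because time and step live on the tracker, a rejected step leaves no trace in the run record, so the loop condition needs no manual bookkeeping.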

Wired into docs/advanced/index.md prose listing and the hidden
toctree. `pixi run -e amr-dev docs-build` succeeds; the page renders
to docs/_build/html/advanced/snapshot-restore.html with no
page-specific warnings and the toctree link resolves.

On feature/model-tracker because the guide documents both the
snapshot toolkit (#195) and the tracker, so it can be complete and
build against working code; lands with the tracker PR after #195.

Underworld development team with AI support from Claude Code
(https://claude.com/claude-code)
Two standalone runnable demos (tests/run_*.py convention), companions
to tests/test_0007's back-stepping test and the new advanced user
guide:

- run_snapshot_backstepping_demo.py: CFL-ratio time series. Two
  overlapping segments in the snap-back zone — the abandoned big
  step (dashed, CFL spike) and the kept substep trajectory — making
  "time is multi-valued where you stashed" visible at a glance.
- run_snapshot_backstepping_spatial.py: 2x2 spatial panels (initial /
  after bad step / after restore / after substep recovery). Top-left
  and bottom-left are visually identical — the snap-back proof.

Each writes a PNG to the cwd; the PNGs themselves are regenerable
output and are intentionally not committed.

Underworld development team with AI support from Claude Code
(https://claude.com/claude-code)
lmoresi changed the base branch from feature/in-memory-checkpoint to development May 20, 2026 11:36
lmoresi merged commit 3584be9 into development May 20, 2026
1 check passed
lmoresi added a commit that referenced this pull request May 20, 2026
First slice of the on-disk snapshot format (v1.1). Establishes the
file structure and the inspectability bar; no PETSc bulk yet (that
is phase 2). Stacked on the in-memory snapshot toolkit (#195) and
the model tracker (#196) so it can serialise both later.

What lands:
- src/underworld3/checkpoint/disk_snapshot.py
  - DISK_SNAPSHOT_SCHEMA_VERSION = 1
  - write_snapshot_skeleton(model, path): writes /metadata attrs +
    empty stub groups /mesh /variables /swarms /python_state (the
    structure phases 2+ will fill in).
  - read_snapshot_metadata(path): reads /metadata back as a plain
    dict, decodes JSON-encoded list fields for convenience, validates
    schema version.
  - inspect_snapshot(path): human-readable summary suitable for
    print(...) at a notebook prompt.
- src/underworld3/checkpoint/__init__.py: exports.
- tests/test_0010_snapshot_disk_format.py (7, tier_a level_1):
  - top-level group structure matches the spec
  - h5py-readable /metadata attrs cover identity, schema, tracker
    conventions, geometry, MPI rank count, and inventories of meshes /
    swarms / state-bearer classes / variables — the proxy for "an
    external user running h5ls/h5dump sees useful info"
  - read/write roundtrip
  - rejection of non-snapshot files and wrong-schema files with
    clear errors (not obscure h5py noise)
  - inspect_snapshot includes the key facts
  - skeleton groups carry `filled_by` attrs so phases 2/3 readers and
    external inspectors can tell whether content is populated yet.

Design notes encoded:
- UW3-controlled rich-metadata wrapper around PETSc bulk; pure PETSc
  HDF5 dumps fail the inspectability bar so are rejected as the
  format.
- List-typed metadata stored as JSON strings in scalar attrs so
  h5py / h5ls handle them cleanly; read API exposes them as plain
  Python lists alongside the *_json originals.
- Swarm storage left as a phase-3 decision: the metadata wrapper is
  designed to support `@external_file` on /swarms/swarm_X/ when
  individual swarms grow too bulky for a single file. No commitment
  to inline vs split until phase 3 has real swarm sizes in hand.
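
The JSON-in-scalar-attrs convention and the schema check can be sketched with the standard library alone. In this block a plain dict stands in for HDF5 attributes, and the key names (`schema_version`, `mpi_size`, `mesh_names`) and both helper functions are assumptions made for illustration, not the names used in disk_snapshot.py.

```python
import json

SCHEMA_VERSION = 1  # mirrors DISK_SNAPSHOT_SCHEMA_VERSION = 1 above


def encode_metadata(meta):
    """Flatten metadata to scalars; lists become JSON strings."""
    attrs = {}
    for key, value in meta.items():
        if isinstance(value, (list, tuple)):
            attrs[f"{key}_json"] = json.dumps(list(value))
        else:
            attrs[key] = value
    return attrs


def decode_metadata(attrs):
    """Validate the schema, then expose lists next to the *_json originals."""
    version = attrs.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(
            f"unsupported snapshot schema version {version!r}; "
            f"expected {SCHEMA_VERSION}"
        )
    meta = dict(attrs)
    for key, value in attrs.items():
        if key.endswith("_json"):
            meta[key[: -len("_json")]] = json.loads(value)
    return meta


attrs = encode_metadata(
    {"schema_version": SCHEMA_VERSION, "mpi_size": 4, "mesh_names": ["mesh_0"]}
)
meta = decode_metadata(attrs)

print(attrs["mesh_names_json"])  # a scalar string, readable by generic tools
print(meta["mesh_names"])        # the decoded Python list
```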

Stacked on feature/model-tracker; PRs to development after #195 and
#196 land.

Underworld development team with AI support from Claude Code
(https://claude.com/claude-code)
lmoresi added a commit that referenced this pull request May 20, 2026
…t roundtrip

Builds on phase 1's metadata wrapper to actually carry mesh + mesh-
variable state to disk and read it back. Delegates the heavy lifting
to #146's `Mesh.write_checkpoint` / `MeshVariable.read_checkpoint`
PETSc-DMPlex primitives — phase 2's job is layout, dispatch, and
tying the wrapper to the bulk data via a simple convention.

Layout (final v1.1 shape):

    /path/to/run.snap.h5          wrapper (h5py-inspectable)
    /path/to/run.snap.bulk/       companion directory (one per snap)
        {mesh_safe}.mesh.00000.h5
        {mesh_safe}.{var_clean}.00000.h5

Wrapper carries /meshes/{mesh_safe}/ with @name, @mesh_file, and
/meshes/{mesh_safe}/variables/{var_safe}/ with @name, @components,
@degree, @continuous, @external_file. The bulk-dir path is derived
from the wrapper path by convention (`.h5` → `.bulk`), so no
external_file attr is needed for the standard placement. Move them
together; a clear FileNotFoundError fires if bulk is missing on read.
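
The path convention can be sketched with pathlib. `bulk_dir_for` and `require_bulk_dir` are hypothetical helper names, not functions from the commit; the sketch assumes only the `.h5` to `.bulk` rule and the missing-directory error described above.

```python
from pathlib import Path
import tempfile


def bulk_dir_for(wrapper_path):
    """Derive the companion directory by swapping the final .h5 for .bulk."""
    wrapper_path = Path(wrapper_path)
    if wrapper_path.suffix != ".h5":
        raise ValueError(f"expected a .h5 wrapper path, got {wrapper_path.name!r}")
    return wrapper_path.with_suffix(".bulk")


def require_bulk_dir(wrapper_path):
    """Return the bulk directory, or fail clearly if it was left behind."""
    bulk = bulk_dir_for(wrapper_path)
    if not bulk.is_dir():
        raise FileNotFoundError(
            f"bulk directory {bulk} not found; it must be moved together "
            f"with {Path(wrapper_path).name}"
        )
    return bulk


with tempfile.TemporaryDirectory() as tmp:
    wrapper = Path(tmp) / "run.snap.h5"
    wrapper.touch()

    # The wrapper exists but its companion directory does not yet.
    try:
        require_bulk_dir(wrapper)
    except FileNotFoundError as err:
        missing_message = str(err)

    # Once the directory sits next to the wrapper, the lookup succeeds.
    bulk_dir_for(wrapper).mkdir()
    found_name = require_bulk_dir(wrapper).name

print(found_name)
```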

Phase 1 layout refactor folded in:
- /mesh (singular) → /meshes (plural) — supports multi-mesh natively.
- /variables removed from the top level — now nests under each mesh
  as /meshes/{name}/variables/{var}, matching the in-memory
  snapshot's mesh→vars structure.

New API:
- `write_snapshot(model, path)` — writes wrapper + bulk; covers
  every registered mesh and every allocated meshvar on each mesh.
  Lazy-allocated vars (_gvec is None) are skipped — same rule as the
  in-memory path.
- `read_snapshot(model, path)` — loads var DOFs back into already-
  registered meshes by name. Mesh / variable mismatch raises a
  clear ValueError (mesh-rebuild on read is v1.2 scope).
- `write_snapshot_skeleton` / `read_snapshot_metadata` /
  `inspect_snapshot` stay as phase-1 metadata-only entry points.

Branch hygiene: merged origin/development (which now has #146) into
this branch so the new code can actually call read_checkpoint. The
merge was clean: #146 and the snapshot toolkit both touch
`discretisation_mesh.py`, but in different methods, as the earlier
analysis predicted. PR target will be development once #195/#196
land; the diff stays clean because the merged dev commits are
already there.

Tests (12 total, 5 new in phase 2, tier_a level_1):
- write produces wrapper + bulk-dir with the expected file pattern
- wrapper populated with the per-mesh + per-var metadata that makes
  inspectability self-sufficient
- bit-exact write→scribble→read roundtrip on a 2D mesh with one
  scalar + one vector variable (np.array_equal, zero tolerance)
- missing bulk-dir → clear FileNotFoundError
- mismatched mesh on read → clear ValueError (not an obscure h5py
  trace)

Regression: 64 tests pass (24 snapshot + 9 tracker + 12 disk-format
+ 19 core/regression).

Phase 3 next: swarms (with the @external_file freedom kept open for
bulky swarms) + /python_state for DDt + ModelTracker via dataclass-
to-HDF5-attrs serialisation.

Underworld development team with AI support from Claude Code
(https://claude.com/claude-code)