Model.tracker — snapshot-managed run state (stacked on #195) by lmoresi · Pull Request #196 · underworldcode/underworld3

lmoresi · 2026-05-19T12:21:55Z

Summary

Adds Model.tracker — a model-dwelling, snapshot-managed record of where a run is (time, step, dt) plus any quantity the user parks on it. Everything on the tracker is automatically captured by Model.snapshot() and reverted by Model.restore(); a loose Python variable is not. Solvers do not depend on it; using it is optional.
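
The contrast between a loose variable and a tracker entry can be sketched with a toy stand-in. This is not underworld3 code: `ToyModel` and `ToyTracker` are hypothetical names, and the deep-copied dict is only an assumption about how a managed map could behave under snapshot and restore.

```python
import copy


class ToyTracker:
    """Hypothetical stand-in: public attributes live in a managed dict."""

    def __init__(self):
        # Underscore names are real attributes, so bypass our own __setattr__.
        object.__setattr__(self, "_managed", {"time": 0.0, "step": 0, "dt": None})

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for managed names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._managed[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._managed[name] = value


class ToyModel:
    """Hypothetical stand-in for a model with snapshot() and restore()."""

    def __init__(self):
        self.tracker = ToyTracker()

    def snapshot(self):
        return copy.deepcopy(self.tracker._managed)

    def restore(self, token):
        # Replace the managed map wholesale (git-stash semantics).
        object.__setattr__(self.tracker, "_managed", copy.deepcopy(token))


model = ToyModel()
loose_time = 0.0                 # plain script variable, unknown to the model
token = model.snapshot()

loose_time += 0.5
model.tracker.time += 0.5
model.tracker.step += 1
model.tracker.energy = 3.0       # quantity added after the snapshot

model.restore(token)
print(loose_time, model.tracker.time, model.tracker.step)  # 0.5 0.0 0
print(hasattr(model.tracker, "energy"))                    # False
```

The loose variable keeps its advanced value while the tracker entries revert, and the quantity created after the snapshot is gone after the restore.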

Also adds the user-facing documentation and the back-stepping demo scripts for the whole snapshot/restore feature.

⚠️ Stacked on #195

This PR is based on feature/in-memory-checkpoint (#195), not development, so the diff shows only the tracker work (9 files, +976). It cannot merge until #195 merges. After #195 merges, retarget this PR's base to development — the diff will stay clean.

What's here

src/underworld3/checkpoint/tracker.py: ModelTracker (attribute-style: model.tracker.foo = ... becomes managed state automatically; time/step/dt pre-seeded as ordinary conventions) + TrackerState(SnapshottableState). Zero new snapshot plumbing: it auto-registers via the existing _state_bearers path from #195 ("In-memory snapshot toolkit (git stash for timesteps)").
src/underworld3/model.py: _tracker PrivateAttr, instantiated and registered in __init__, exposed via the tracker property.
tests/test_0009_model_tracker.py: 9 tier-A tests covering defaults, builtins revert, user-quantity reverts, numpy-by-value deep-copy, git-stash semantics (post-snapshot quantity dropped on restore), the loose-var-vs-tracker contrast, bit-identical state roundtrip, and realistic stepping-loop continuation.
docs/advanced/snapshot-restore.md (+ index/toctree): user guide for the whole feature; docs-build verified.
tests/run_snapshot_backstepping_demo.py and tests/run_snapshot_backstepping_spatial.py: runnable visualisation companions to the tests.
tests/test_0007_… one-line robustness fix: find DDt state by type, not state_bearers[0] (a tracker is now always registered; pre-existing fragility exposed, not introduced).
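
The robustness fix in the last item amounts to selecting a state-bearer by type instead of by position. A minimal sketch, assuming the bearers sit in an unordered `weakref.WeakSet`; `ToyDDt`, `ToyTracker` and `find_bearer` are hypothetical names, not the test's actual code:

```python
import weakref


class ToyDDt:
    """Hypothetical stand-in for a DDt state-bearer."""


class ToyTracker:
    """Hypothetical stand-in for the always-registered tracker."""


def find_bearer(bearers, cls):
    """Return the single bearer of type cls, regardless of iteration order."""
    matches = [b for b in bearers if isinstance(b, cls)]
    if len(matches) != 1:
        raise LookupError(f"expected one {cls.__name__}, found {len(matches)}")
    return matches[0]


tracker, ddt = ToyTracker(), ToyDDt()       # strong refs keep the entries alive
bearers = weakref.WeakSet([tracker, ddt])   # iteration order is not guaranteed
print(find_bearer(bearers, ToyDDt) is ddt)  # True
```

Indexing the first element of such a collection only works while exactly one bearer is registered; a type-based lookup keeps working once the tracker joins it.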

Design notes for the reviewer

A custom __setattr__ was found (by the test suite) to silently shadow the state property setter, which would make restore() a no-op. Fixed by having __setattr__ respect class-level data descriptors; state is therefore a documented reserved name on the tracker.
git-stash semantics: restore replaces the managed map wholesale, so a quantity added after a snapshot is dropped on restore of that snapshot.
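
The shadowing problem in the first note can be reproduced in a few lines. The sketch below is an illustration under assumptions, not the ModelTracker source: `NaiveTracker` and `GuardedTracker` are hypothetical, and `inspect.getattr_static` plus `inspect.isdatadescriptor` is just one way to detect a class-level data descriptor.

```python
import copy
import inspect


class NaiveTracker:
    def __init__(self):
        object.__setattr__(self, "_managed", {})

    @property
    def state(self):
        return copy.deepcopy(self._managed)   # deep copy for isolation

    @state.setter
    def state(self, new_state):
        object.__setattr__(self, "_managed", copy.deepcopy(new_state))

    def __setattr__(self, name, value):
        # Bug: every name, including "state", becomes a managed entry,
        # so the property setter above is never reached.
        self._managed[name] = value


class GuardedTracker(NaiveTracker):
    def __setattr__(self, name, value):
        # Look the name up on the class without triggering descriptors.
        attr = inspect.getattr_static(type(self), name, None)
        if inspect.isdatadescriptor(attr) or name.startswith("_"):
            # Properties and underscore names use normal attribute setting.
            object.__setattr__(self, name, value)
        else:
            self._managed[name] = value


naive = NaiveTracker()
naive.energy = 1.0
naive.state = {}                  # meant as a restore, stored as an entry
print(sorted(naive._managed))     # ['energy', 'state']

guarded = GuardedTracker()
guarded.energy = 1.0
guarded.state = {}                # property setter runs, map is replaced
print(sorted(guarded._managed))   # []
```

In the naive version the assignment to `state` changes nothing that a restore cares about, which matches the silent no-op described above. The guarded version also shows the wholesale replacement of the second note: assigning an empty state drops `energy`.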

Test plan

pixi run -e amr-dev pytest tests/test_0009_model_tracker.py (9)
Regression incl. snapshot suite: pytest tests/test_0007_snapshot_inmemory.py tests/test_0008_snapshot_realsolver.py (27)
Parallel unaffected: cd tests/parallel && mpirun -np 4 python ./ptest_0007_snapshot_inmemory.py
pixi run -e amr-dev docs-build — advanced/snapshot-restore.html renders, toctree resolves

Underworld development team with AI support from Claude Code

Model.tracker is the authoritative *record* of where a run is (time, step, dt, plus any quantity the user parks on it), and it is automatically captured by Model.snapshot() and reverted by Model.restore(). A loose Python variable (model_time = 0.0 in a script) is not reverted; the same value on the tracker is. That contrast is the whole point.

Design (per Louis's intent, 2026-05-19):
- Authoritative as a *record*, NOT a dependency. Solvers and DDt are untouched; using the tracker is optional. It sits alongside DDt's own _dt_history (captured independently); it does not subsume it.
- User-extensible by plain attribute assignment: model.tracker.foo = ... registers foo as managed state, with no dataclass authoring and no special status in solvers.
- time/step/dt are ordinary pre-seeded managed entries (0.0/0/None), not privileged fields, consistent with "user-added quantities are first-class".
- git-stash semantics: restore replaces the managed map wholesale, so a quantity created after the snapshot is dropped on restore.

Implementation:
- src/underworld3/checkpoint/tracker.py: ModelTracker (uw_object subclass for instance_number) + TrackerState(SnapshottableState) carrying an open `managed` dict. Attribute routing: underscore names are real attributes, public names are managed entries. __setattr__ respects class-level data descriptors so the `state` property setter is honoured (without this guard restore would silently no-op, as caught by the test suite; `state` is therefore a reserved name). The .state getter deep-copies for isolation.
- Model: PrivateAttr _tracker, instantiated and auto-registered as a state-bearer in __init__; exposed via the `tracker` property. Zero new snapshot plumbing: the existing _state_bearers path picks it up.

Tests: tests/test_0009_model_tracker.py (9, tier_a level_1): defaults, builtins revert, user-quantity reverts, numpy-by-value deep-copy, post-snapshot quantity dropped on restore, the loose-var-vs-tracker contrast, bit-identical state roundtrip, and a realistic stepping-loop continuation.

Drive-by: test_symbolic_ddt_snapshot_is_deep_copy assumed state_bearers[0] was the DDt; with a tracker now always registered (WeakSet, unordered) it now finds the DDt state by type. Pre-existing fragility, exposed not introduced.

60 tests pass (24 snapshot + 3 real-solver + 9 tracker + 24 regression); parallel ptest still PASS at np 4 with the tracker auto-registered and snapshot/restored alongside everything else.

Stacked on feature/in-memory-checkpoint (depends on its Snapshottable/_state_bearers); PRs to development after #195 lands.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)

User-facing advanced guide covering Model.snapshot() / Model.restore() and Model.tracker, in the Sphinx/MyST form that builds into the readthedocs site. Distinct from the developer state-as-dataclass guide (that one is for people *extending* the mechanism; this is for people *using* it).

Contents: the "stash for timesteps" mental model and when to use it (backtrack, adaptive Δt, predictor-corrector, RK staging); the API; what is captured automatically; the loose-variable-vs-tracker trap and how Model.tracker solves it (with the reserved-name and git-stash-semantics caveats); a worked adaptive-Δt CFL backtracking loop; and an explicit guarantees/limitations section (bit-exact discard incl. parallel and through real solvers; in-memory only; fixed rank count; mesh-adapt refused; within-tolerance vs a never-snapshotted solver run).

Wired into docs/advanced/index.md prose listing and the hidden toctree. `pixi run -e amr-dev docs-build` succeeds; the page renders to docs/_build/html/advanced/snapshot-restore.html with no page-specific warnings and the toctree link resolves.

On feature/model-tracker because the guide documents both the snapshot toolkit (#195) and the tracker, so it can be complete and build against working code; lands with the tracker PR after #195.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
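
The adaptive-Δt backtracking pattern the guide describes can be illustrated without a solver. The following is a toy sketch under stated assumptions, not the guide's worked example: the state is a plain dict, `copy.deepcopy` stands in for snapshot and restore, and `advance`, `cfl_ratio` and all numeric values are hypothetical.

```python
import copy


def advance(state, dt):
    """Toy explicit step: move x at constant speed v and update bookkeeping."""
    state["x"] += state["v"] * dt
    state["time"] += dt
    state["step"] += 1


def cfl_ratio(state, dt, dx):
    return abs(state["v"]) * dt / dx


state = {"x": 0.0, "v": 2.0, "time": 0.0, "step": 0}
dx, cfl_limit, t_end = 0.1, 0.6, 0.1
dt = 0.1            # deliberately too large: CFL ratio 2.0
rejected = 0

while state["time"] < t_end - 1e-12:
    token = copy.deepcopy(state)                  # stash before the attempt
    trial_dt = min(dt, t_end - state["time"])
    advance(state, trial_dt)
    if cfl_ratio(state, trial_dt, dx) > cfl_limit:
        state = token                             # snap back: time and step revert too
        dt = trial_dt / 2                         # retry with a smaller step
        rejected += 1

print(rejected, state["step"], round(state["time"], 12))
```

Because time and step live inside the stashed state, a rejected attempt leaves no trace in the bookkeeping; only accepted substeps are counted.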

Two standalone runnable demos (tests/run_*.py convention), companions to tests/test_0007's back-stepping test and the new advanced user guide:
- run_snapshot_backstepping_demo.py: CFL-ratio time series. Two overlapping segments in the snap-back zone, the abandoned big step (dashed, CFL spike) and the kept substep trajectory, making "time is multi-valued where you stashed" visible at a glance.
- run_snapshot_backstepping_spatial.py: 2x2 spatial panels (initial / after bad step / after restore / after substep recovery). Top-left and bottom-left are visually identical: the snap-back proof.

Each writes a PNG to the cwd; the PNGs themselves are regenerable output and are intentionally not committed.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)

First slice of the on-disk snapshot format (v1.1). Establishes the file structure and the inspectability bar; no PETSc bulk yet (that is phase 2). Stacked on the in-memory snapshot toolkit (#195) and the model tracker (#196) so it can serialise both later.

What lands:
- src/underworld3/checkpoint/disk_snapshot.py
  - DISK_SNAPSHOT_SCHEMA_VERSION = 1
  - write_snapshot_skeleton(model, path): writes /metadata attrs + empty stub groups /mesh /variables /swarms /python_state (the structure phases 2+ will fill in).
  - read_snapshot_metadata(path): reads /metadata back as a plain dict, decodes JSON-encoded list fields for convenience, validates schema version.
  - inspect_snapshot(path): human-readable summary suitable for print(...) at a notebook prompt.
- src/underworld3/checkpoint/__init__.py: exports.
- tests/test_0010_snapshot_disk_format.py (7, tier_a level_1):
  - top-level group structure matches the spec
  - h5py-readable /metadata attrs cover identity, schema, tracker conventions, geometry, MPI rank count, and inventories of meshes / swarms / state-bearer classes / variables; this is the proxy for "an external user running h5ls/h5dump sees useful info"
  - read/write roundtrip
  - rejection of non-snapshot files and wrong-schema files with clear errors (not obscure h5py noise)
  - inspect_snapshot includes the key facts
  - skeleton groups carry `filled_by` attrs so phases 2/3 readers and external inspectors can tell whether content is populated yet.

Design notes encoded:
- UW3-controlled rich-metadata wrapper around PETSc bulk; pure PETSc HDF5 dumps fail the inspectability bar so are rejected as the format.
- List-typed metadata stored as JSON strings in scalar attrs so h5py / h5ls handle them cleanly; read API exposes them as plain Python lists alongside the *_json originals.
- Swarm storage left as a phase-3 decision: the metadata wrapper is designed to support `@external_file` on /swarms/swarm_X/ when individual swarms grow too bulky for a single file. No commitment to inline vs split until phase 3 has real swarm sizes in hand.

Stacked on feature/model-tracker; PRs to development after #195 and #196 land.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
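
The JSON-in-scalar-attrs convention and the schema check can be sketched with a plain dict standing in for the HDF5 attribute store. This is an assumption-laden illustration, not the disk_snapshot.py implementation: `encode_metadata`, `decode_metadata` and the key names `schema_version`, `mpi_size`, `meshes` and `swarms` are hypothetical; only the `_json` suffix and the schema version value of 1 come from the commit message.

```python
import json

SCHEMA_VERSION = 1  # mirrors DISK_SNAPSHOT_SCHEMA_VERSION = 1


def encode_metadata(meta):
    """Flatten metadata to scalar attrs; list values become JSON strings."""
    attrs = {"schema_version": SCHEMA_VERSION}
    for key, value in meta.items():
        if isinstance(value, (list, tuple)):
            attrs[f"{key}_json"] = json.dumps(list(value))
        else:
            attrs[key] = value
    return attrs


def decode_metadata(attrs):
    """Validate the schema, then expose lists alongside the *_json originals."""
    version = attrs.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(
            f"unsupported snapshot schema version {version!r}; "
            f"expected {SCHEMA_VERSION}"
        )
    decoded = dict(attrs)  # keep the *_json originals
    for key, value in attrs.items():
        if key.endswith("_json"):
            decoded[key[: -len("_json")]] = json.loads(value)
    return decoded


attrs = encode_metadata({"mpi_size": 4, "meshes": ["mesh_0"], "swarms": []})
meta = decode_metadata(attrs)
print(meta["meshes"], meta["meshes_json"])  # ['mesh_0'] ["mesh_0"]
```

Storing each list as a single string keeps every attribute scalar, which is what lets generic HDF5 tools display it without special handling.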


…t roundtrip

Builds on phase 1's metadata wrapper to actually carry mesh + mesh-variable state to disk and read it back. Delegates the heavy lifting to #146's `Mesh.write_checkpoint` / `MeshVariable.read_checkpoint` PETSc-DMPlex primitives; phase 2's job is layout, dispatch, and tying the wrapper to the bulk data via a simple convention.

Layout (final v1.1 shape):

    /path/to/run.snap.h5        wrapper (h5py-inspectable)
    /path/to/run.snap.bulk/     companion directory (one per snap)
        {mesh_safe}.mesh.00000.h5
        {mesh_safe}.{var_clean}.00000.h5

Wrapper carries /meshes/{mesh_safe}/ with @name, @mesh_file, and /meshes/{mesh_safe}/variables/{var_safe}/ with @name, @components, @degree, @continuous, @external_file.

The bulk-dir path is derived from the wrapper path by convention (`.h5` → `.bulk`), so no external_file attr is needed for the standard placement. Move them together; a clear FileNotFoundError fires if bulk is missing on read.

Phase 1 layout refactor folded in:
- /mesh (singular) → /meshes (plural), which supports multi-mesh natively.
- /variables removed from the top level; it now nests under each mesh as /meshes/{name}/variables/{var}, matching the in-memory snapshot's mesh→vars structure.

New API:
- `write_snapshot(model, path)`: writes wrapper + bulk; covers every registered mesh and every allocated meshvar on each mesh. Lazy-allocated vars (_gvec is None) are skipped, the same rule as the in-memory path.
- `read_snapshot(model, path)`: loads var DOFs back into already-registered meshes by name. Mesh / variable mismatch raises a clear ValueError (mesh-rebuild on read is v1.2 scope).
- `write_snapshot_skeleton` / `read_snapshot_metadata` / `inspect_snapshot` stay as phase-1 metadata-only entry points.

Branch hygiene: merged origin/development (which now has #146) into this branch so the new code can actually call read_checkpoint. The merge was clean: #146 and the snapshot toolkit only overlap at different methods in `discretisation_mesh.py`, as the earlier analysis predicted. PR target will be development once #195/#196 land; the diff stays clean because the merged dev commits are already there.

Tests (12 total, 5 new in phase 2, tier_a level_1):
- write produces wrapper + bulk-dir with the expected file pattern
- wrapper populated with the per-mesh + per-var metadata that makes inspectability self-sufficient
- bit-exact write→scribble→read roundtrip on a 2D mesh with one scalar + one vector variable (np.array_equal, zero tolerance)
- missing bulk-dir → clear FileNotFoundError
- mismatched mesh on read → clear ValueError (not an obscure h5py trace)

Regression: 64 tests pass (24 snapshot + 9 tracker + 12 disk-format + 19 core/regression).

Phase 3 next: swarms (with the @external_file freedom kept open for bulky swarms) + /python_state for DDt + ModelTracker via dataclass-to-HDF5-attrs serialisation.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
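
The wrapper-to-bulk path convention (`.h5` → `.bulk`) and the missing-bulk error can be sketched with pathlib alone. `bulk_dir_for` and `require_bulk_dir` are hypothetical helper names, not functions confirmed to exist in underworld3, and the error wording is an assumption.

```python
from pathlib import Path


def bulk_dir_for(wrapper_path):
    """Derive the companion bulk directory from the wrapper file path."""
    wrapper = Path(wrapper_path)
    if wrapper.suffix != ".h5":
        raise ValueError(f"expected a .h5 wrapper path, got {wrapper}")
    # Only the final suffix changes: run.snap.h5 -> run.snap.bulk
    return wrapper.with_suffix(".bulk")


def require_bulk_dir(wrapper_path):
    """Return the bulk directory, failing clearly if it is absent."""
    bulk = bulk_dir_for(wrapper_path)
    if not bulk.is_dir():
        raise FileNotFoundError(
            f"bulk directory {bulk} not found; it must be moved together "
            f"with its wrapper {wrapper_path}"
        )
    return bulk


print(bulk_dir_for("/path/to/run.snap.h5").name)  # run.snap.bulk
```

Deriving the directory from the wrapper name means no path needs to be stored inside the file, at the cost of requiring the pair to be moved together.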

lmoresi added 3 commits May 19, 2026 16:31

lmoresi mentioned this pull request May 20, 2026

On-disk snapshot toolkit v1.1 (stacked on #195, #196) #198

Merged

3 tasks

lmoresi changed the base branch from feature/in-memory-checkpoint to development May 20, 2026 11:36

lmoresi merged commit 3584be9 into development May 20, 2026
1 check passed

lmoresi mentioned this pull request May 20, 2026

docs: snapshot toolkit — CHANGES entry + current API + toctree #199

Open

1 task

lmoresi merged 3 commits into development from feature/model-tracker