feat(car): volume-sync provider/consumer for cross-node state replication#399
Open
toderian wants to merge 50 commits into
Conversation
Defines all module-level constants (system volume name/mount/size, sync file names, history dir conventions, ChainStore hkey, manifest schema version, failure stage strings) and the SyncManager class skeleton with NotImplementedError stubs for every public method called out in the plan: resolve_container_path, _write_json_atomic, history readers/writers, claim_request, make_archive, publish_snapshot, fetch_latest, validate_manifest, extract_archive, apply_snapshot, _retire_previous_cid. Module parses cleanly; no behaviour yet.

Path helpers (system_volume_host_root, volume_sync_dir, history_root) derive locations from owner.get_data_folder() / _get_instance_data_subfolder(), so the manager has no hardcoded plugin assumptions beyond the documented owner interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concrete helpers in SyncManager (no plugin lifecycle yet):
resolve_container_path(): six-rule chokepoint — the path must be absolute,
covered by a declared bind, fixed-size-backed (host_root under
fixed_volumes/mounts/), not the system volume, free of '..' segments, and
its resolved host path must stay within host_root.
_write_json_atomic(): tmp + os.replace, creates parent dir, fsyncs the
payload, removes the tmp on failure so we never leak orphans.
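A minimal sketch of the atomic-write pattern described above (helper name from the commit; the exact signature and error handling are assumptions — a later commit also chmods the tmp file app-readable before the rename):

```python
import json, os, tempfile

def write_json_atomic(path: str, payload: dict) -> None:
    """Write JSON via a tmp file in the target dir, fsync, then atomically os.replace it in."""
    parent = os.path.dirname(path)
    os.makedirs(parent, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())   # payload is on disk before the rename
        os.replace(tmp, path)       # atomic on POSIX: readers see the old or new file, never a partial one
    except BaseException:
        try:
            os.unlink(tmp)          # never leak an orphaned tmp file
        except FileNotFoundError:
            pass
        raise
```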
History helpers: append_sent / append_received / latest_sent /
latest_received / update_history_deletion. Filenames are
'<10-digit-version>__<12-char-cid>.json' so lexical sort = chronological.
The 'deletion' sub-record defaults to {None,None,None} on append and is
mutated in place via atomic write when a snapshot is superseded.
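For illustration, a history filename built under that convention (the version value is the provider-side timestamp; the CID is a hypothetical example):

```python
version = 1714742400
cid = "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"
filename = f"{version:010d}__{cid[:12]}.json"
# -> '1714742400__QmYwAPJzv5CZ.json': the zero-padded version prefix keeps lexical sort chronological
```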
22 unit tests cover happy paths, rejection paths for every rule, atomic
write semantics, history ordering, and missing-file deletion-update.
All pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claim_request(): atomic os.replace of request.json -> .processing, JSON parse, top-level shape validation, per-path resolve_container_path check. On any failure it writes request.json.invalid (with the original body, or raw_body for malformed JSON) and response.json (error shape), and deletes the .processing file. Returns (archive_paths, metadata) on success or None.

_fail_request(): shared helper that writes the .invalid + response.json pair so the artifacts stay consistent across all failure stages.

make_archive(): tarfile-based gzip with member names = container-absolute paths. Re-runs resolve_container_path on each entry as defence in depth and raises FileNotFoundError if the host path is missing. Output goes to owner.get_output_folder() with a pid+timestamp-suffixed filename.

extract_archive(): two-pass — validate every member first (so unmapped members abort the entire extract before any write), then atomic per-file write via tmp + os.replace. Skips symlink/hardlink members for safety. Member names from tarfile are stripped of leading '/' by POSIX default, so we re-prepend before resolving.

19 new unit tests (41 total) cover the claim happy path and all the validation failure shapes (malformed JSON, wrong type, missing/empty archive_paths, metadata not an object, traversal, unmounted, non-fixed-size, system volume, links, archive round-trip, non-existent host path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
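A sketch of the two-pass extraction shape described in the commit above (names are assumptions; the real method resolves members through resolve_container_path and applies the permission floors added in later commits):

```python
import os, tarfile, tempfile

def extract_archive(archive_path, resolve):
    """Validate every member first, then write each regular file atomically."""
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        targets = {}
        # Pass 1: validate everything so one bad member aborts the extract before any write.
        for m in members:
            if m.issym() or m.islnk():
                continue                                # skip link members for safety
            container_path = "/" + m.name.lstrip("/")   # tarfile strips the leading '/', re-prepend it
            host_path = resolve(container_path)         # assumed to return None when unmappable
            if host_path is None:
                raise ValueError(f"unmappable archive member: {container_path}")
            targets[m.name] = host_path
        # Pass 2: atomic per-file write via tmp + os.replace.
        for m in members:
            if m.name not in targets or not m.isfile():
                continue
            host_path = targets[m.name]
            os.makedirs(os.path.dirname(host_path), exist_ok=True)
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(host_path))
            with os.fdopen(fd, "wb") as out, tar.extractfile(m) as src:
                out.write(src.read())
            os.replace(tmp, host_path)
```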
…/retire)
publish_snapshot(): full provider flow as four staged operations
(archive_build, r1fs_upload, chainstore_publish, then history+response).
Each stage has its own _fail_request shape with the matching stage string.
On success: writes response.json{ok}, clears any prior request.json.invalid,
deletes .processing, and best-effort retires the prior CID. The archive
tmp file is always removed (success or fail) via try/finally.
fetch_latest(): chainstore_hsync (non-fatal on error) + chainstore_hget.
validate_manifest(): runs the manifest's archive_paths through
resolve_container_path against the consumer's volumes; returns the list of
unmappable entries so the caller can decide between "apply" and "skip".
apply_snapshot(): pre-flight via validate_manifest, then r1fs.get_file →
extract_archive → append history → write last_apply.json → retire prior
CID with cleanup_local_files=True (consumer-only — drops the local R1FS
download too). Failure modes are all non-fatal (no last_apply written so
the consumer-side app sees nothing landed; history not advanced).
_retire_previous_cid(): finds the most recent prior un-retired entry whose
cid differs from the latest, calls r1fs.delete_file, updates the prior
entry's deletion sub-record. Never raises — deletion failure must not
roll back the new publish/apply.
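A rough sketch of the staged provider flow described above (stage labels from the commit; owner call names and helper methods are assumptions):

```python
import os

def publish_snapshot(self, archive_paths, metadata):
    """Each stage fails with its own stage label; the archive tmp is removed on success or failure."""
    archive = None
    try:
        try:
            archive = self.make_archive(archive_paths)
        except Exception as exc:
            return self._fail_request("archive_build", exc)
        try:
            cid = self.owner.r1fs.add_file(archive)        # assumed R1FS upload entry point
        except Exception as exc:
            return self._fail_request("r1fs_upload", exc)
        record = {"cid": cid, "version": self._new_version(), "metadata": metadata}
        try:
            self._chainstore_publish(record)               # assumed ChainStore hset under the sync hkey
        except Exception as exc:
            return self._fail_request("chainstore_publish", exc)
        self.append_sent(record)                           # history entry + response.json{ok}
        self._write_response_ok(record)
        self._retire_previous_cid()                        # best effort; never raises
    finally:
        if archive is not None and os.path.exists(archive):
            os.remove(archive)
```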
20 new tests (61 total): orchestrator happy paths, every failure stage
with the right artifact shape, ChainStore ack=False non-fatal,
two-snapshot retirement (sender + receiver sides), failed-retirement audit
trail, layout misalignment skip, end-to-end provider→consumer round-trip
through shared FakeR1FS + FakeChainStore. All pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lume
The mixin is the integration layer between SyncManager and the plugin
lifecycle. Public surface (called by ContainerAppRunnerPlugin in the next
step):
* _configure_system_volume(): provisions the always-on /r1en_system
fixed-size loopback (idempotent across restarts via fixed_volume.provision),
tracks it in self._fixed_volumes for cleanup parity, mkdir -p's the
volume-sync subdir, and binds it into self.volumes.
* _inject_sync_env_vars(): R1_SYSTEM_VOLUME / R1_VOLUME_SYNC_DIR /
R1_SYNC_REQUEST_FILE are always set; R1_SYNC_TYPE / R1_SYNC_KEY only when
SYNC.ENABLED, so apps that branch on role can do so.
* _sync_provider_tick / _sync_consumer_tick: throttled by POLL_INTERVAL.
Drive stop_container -> work -> start_container INLINE (not via a
StopReason -> _restart_container, which would unmount the loopback).
Validation failures don't disturb the container; execution failures
still restart it.
* _sync_initial_consumer_block(): blocks consumer's first start_container
until ChainStore has a record (bounded by INITIAL_SYNC_TIMEOUT;
0=forever).
* _recover_stale_processing(): renames orphan request.json.processing back
to request.json so a crash mid-publish doesn't leave a request stuck.
21 unit tests against a fake plugin that records stop/start ordering: env
injection across enabled/disabled, role helpers, throttle, full provider
flow with success and r1fs failure, validation failure does not stop
container, full consumer flow, misalignment skip, already-applied no-op,
stale .processing recovery (and don't-clobber rule when both files exist).
All pass alongside the 61 sync_manager tests (82 total).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integration changes in container_app_runner.py + mixins/__init__.py:
* MRO: insert _SyncMixin before _ContainerUtilsMixin so its
_configure_system_volume / _inject_sync_env_vars / tick methods are
available to the plugin.
* _CONFIG: new SYNC block (ENABLED/KEY/TYPE/POLL_INTERVAL/
INITIAL_SYNC_TIMEOUT). System volume itself is hardcoded — no config.
* __reset_vars: initialize _last_sync_check + _sync_manager.
* on_init:
- _configure_system_volume() after the existing _configure_*_volumes
chain so /r1en_system is always provisioned;
- _recover_stale_processing() so a crash mid-publish doesn't strand
a request in .processing;
- _validate_sync_config() (logs a clear error and disables SYNC if
KEY/TYPE are bad — the system volume keeps working);
- _inject_sync_env_vars() right after _setup_env_and_ports() in the
non-semaphored branch.
* _restart_container: same _configure_system_volume + recovery +
_inject_sync_env_vars sequence so a full restart (e.g. for image
update) recreates the volume + env vars cleanly.
* _handle_initial_launch:
- _inject_sync_env_vars() after _setup_env_and_ports in the
semaphored branch;
- _sync_initial_consumer_block() before the very first
start_container so consumer pods boot on populated state.
* _perform_additional_checks: drives the provider/consumer tick INLINE
(return None — must NOT use a StopReason because that routes through
_restart_container which unmounts the loopback before our work).
All 82 sync-only unit tests still pass. The 10 pre-existing failures in
the rest of the test_*.py suite are unrelated (test env doesn't have
docker-py installed; same failure on master).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E2E surfaced this on the first deploy: flask-files-explorer runs as a non-root user, the system volume's loop mount is root-owned (because _resolve_image_owner() returned (None, None) for this image), and the app's POST /create against /r1en_system/volume-sync/request.json got 'Permission denied'.

The system volume is purpose-built as an app-writable control-plane channel between the container and CAR. There's no isolation gain in restricting the volume-sync subdir to root: the volume is per-CAR-instance, and the app already owns the container. Chmod both the mount root and the volume-sync/ subdir to 0o777 after mkdir so any container user can write requests.

Existing 82 unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… read
E2E surfaced two more permission issues:
1. _write_json_atomic uses tempfile.mkstemp which defaults to 0o600.
CAR runs as root, but the app inside the container is typically a
non-root user (e.g. flask-files-explorer's appuser). After
os.replace the response.json / last_apply.json / request.json.invalid
files were unreadable from inside the container. Now chmod 0o666
before replace.
2. extract_archive preserves the tar member's mode, but the new file
is root-owned (CAR's identity). If the source mode was something
restrictive, the app couldn't read it. Now we max() the source
mode against 0o644 for files and 0o755 for directories so the app
can always cat / traverse what CAR landed.
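Both fixes reduce to a mode floor applied before the file becomes visible to the app — a sketch under the assumptions above (the commit describes max()-ing against the floor; an OR over the permission bits gives the same at-least-readable guarantee):

```python
def app_readable_mode(src_mode: int, is_dir: bool) -> int:
    """Floor the tar member's mode so a non-root app user can read/traverse what root-owned CAR wrote."""
    floor = 0o755 if is_dir else 0o644
    return (src_mode & 0o777) | floor

# For CAR-written JSON artifacts (response.json, last_apply.json, *.invalid), chmod the
# tmp file before the atomic rename so the published file is already app-readable:
#   os.chmod(tmp_path, 0o666)
#   os.replace(tmp_path, final_path)
```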
82 unit tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_volume

E2E found this: after a CAR restart, the data volume (declared via FIXED_SIZE_VOLUMES) appeared empty inside the container even though fixed_volume.provision had just chowned and mounted it.

Cause: _configure_system_volume was calling fixed_volume.cleanup_stale_mounts on the shared <plugin_data>/fixed_volumes/ root. That scan iterates EVERY meta/*.json file under the root, including the appdata.json that _FixedSizeVolumesMixin._configure_fixed_size_volumes() (which runs FIRST) just provisioned. cleanup_stale_mounts saw the active mount, treated it as 'stale', and umount + losetup -d'd it. Then provision() of the system volume ran but never re-mounted appdata. The container started with an empty bind from an unmounted host path.

Fix: don't repeat the stale-mount sweep — the previous configure step already did it. Add an explicit comment so a future maintainer doesn't re-introduce the call thinking it's defensive.

82 unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The decision the consumer needs is content-identity: 'is this bundle
the one I just applied?'. CID is a content-addressed hash that answers
that exactly. Version was a CAR-side timestamp that was indirectly
serving as identity, but it imported a class of failure modes (clock
skew, multi-provider non-monotonic ordering) that simply don't apply
to a CID comparison.
Two coupled changes:
* _sync_consumer_tick: skip when record.cid == latest_received().cid
instead of new_version <= last_version. Different cid → apply,
regardless of version metadata.
* _latest_in: sort by mtime, not by filename. Filenames stay
version-prefixed for chronological browsability under normal
operation, but 'what did I last apply?' is fundamentally an
insert-order question. Without this, a back-dated record would
write a history file with a lex-smaller filename than the previous
entry, latest_received would still return the older one, and the
consumer would re-apply the back-dated record on every tick
forever.
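A sketch of the mtime-based selection (helper name from the commit; directory layout details are assumptions):

```python
import os

def _latest_in(history_dir):
    """Return the most recently written history file, regardless of its version-prefixed filename."""
    entries = [
        os.path.join(history_dir, name)
        for name in os.listdir(history_dir)
        if name.endswith(".json")
    ]
    if not entries:
        return None
    # 'What did I last apply?' is an insert-order question: sort by mtime, not by filename.
    return max(entries, key=os.path.getmtime)
```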
version is kept everywhere as informational metadata (record schema +
history entries + response.json + last_apply.json + filename prefixes)
so wire compat is preserved and human-readable logs still say
'applying v1714742400 ...'. Only the comparison logic changed.
Tests updated:
* test_skips_already_applied_version → test_skips_when_record_cid_matches_last_apply
* Added test_applies_when_cid_differs_even_if_version_lower (covers the
clock-skew failure mode the old code couldn't survive).
* test_latest_picks_highest_version → test_latest_picks_most_recently_written
83 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The volume-sync feature was scattered across two locations because the
codebase has a 'mixins go in mixins/' convention while the manager is
not a mixin:
before:
extensions/business/container_apps/
├── sync_manager.py ← module-level constants + SyncManager
└── mixins/
└── sync_mixin.py ← _SyncMixin (lifecycle integration)
After this refactor the entire feature lives in one folder and a reader
can scan it without jumping around:
extensions/business/container_apps/
└── sync/
├── __init__.py ← re-exports the public API
├── constants.py ← file names, hkeys, schema version,
│ stage labels — pure data
├── manager.py ← SyncManager + host-side path helpers
└── mixin.py ← _SyncMixin
Module renames:
- sync_manager.py → sync/manager.py
- mixins/sync_mixin.py → sync/mixin.py
- (new) → sync/constants.py (extracted from manager)
- (new) → sync/__init__.py (re-exports)
Import-site updates:
- container_app_runner.py: pull _SyncMixin from .sync, not .mixins
- mixins/__init__.py: drop _SyncMixin export (note added pointing
to the new location)
- tests/test_sync_manager.py and tests/test_sync_mixin.py: import
from .sync (the package) rather than the deleted modules
Tradeoff: this one feature breaks the 'mixins live in mixins/'
convention. Worth it because everything related to sync — constants,
helpers, the manager class, and the mixin — is now reachable with
'cd sync/'. The mixins/__init__.py docstring points the next reader at
the new location so it's not surprising.
Verification: all 83 unit tests pass (test_sync_manager + test_sync_mixin
exercise the resolver, atomic JSON, history I/O, claim_request, archive
roundtrip, orchestrators, env injection, ticks, recovery — all green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-boot block
The first-boot block solved a niche problem: an app that crashes on an
empty volume (e.g. model server with missing weights) would crash-loop
until the first snapshot landed, instead of cleanly waiting for state.
INITIAL_SYNC_TIMEOUT held the container off until either a snapshot
arrived or the timeout expired.
In practice this knob earned its keep in maybe 20% of use cases and was
mystifying noise in the other 80%:
- Default value (600s) made first-boot consumers look 'stuck' when
really nothing was wrong, just nobody had published yet.
- Showed up in provider configs too, where it's silently ignored —
confusing when reading a deployed pipeline JSON.
- The 20% case has an obvious app-side workaround: put a poll-and-
retry loop in the entrypoint, which Docker's restart policy already
enables for free.
New behavior on first boot of a fresh consumer: start the container
immediately on an empty /app/data, the next regular tick (POLL_INTERVAL
seconds later) finds the snapshot in ChainStore, applies it, and
restarts. One extra container restart cycle — no crash loop unless the
app explicitly requires state.
Removed:
- SYNC.INITIAL_SYNC_TIMEOUT field from _CONFIG.SYNC
- _SyncMixin._sync_initial_timeout() helper
- _SyncMixin._sync_initial_consumer_block() and its call site in
container_app_runner._handle_initial_launch
- 'INITIAL_SYNC_TIMEOUT' from the test_sync_mixin fake config
Apps that strictly require state at startup must implement their own
poll-and-retry in their entrypoint (documented in the mixin docstring).
All 83 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_retire_previous_cid was sorting history files by filename, which equals version order. When the consumer applies a lower-version CID after a higher-version one (clock-skewed provider, or a multi-provider sync set with non-monotonic timestamps), the retire path would treat the higher-version entry as latest and retire the freshly-applied CID instead.

Mirror the contract that _latest_in already follows (manager.py:249-272): "most recently written" is an mtime question, not a version question.

Adds test_retire_uses_mtime_not_version which constructs the exact non-monotonic-version-order scenario and asserts the just-applied entry is preserved while the older-by-mtime higher-version entry gets retired.

Addresses codex review finding 1 on PR #399.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sync ticks (_sync_provider_tick, _sync_consumer_tick) stop+start the container inline to keep the system volume mounted, deliberately bypassing _restart_container. But by bypassing it they also skipped the per-restart state-machine maintenance: container_start_time, _app_ready, _health_probe_start, _tunnel_start_allowed, _commands_started, log-stream re-attach, and BUILD_AND_RUN_COMMANDS rerun. Result: after a sync slice, tunnels stayed marked ready, health checks were skipped, image-defined startup commands didn't rerun, and log capture was stale.

Extract the post-start_container reset+hooks sequence into a new helper _reset_runtime_state_post_start on ContainerAppRunnerPlugin, called from both _restart_container (replacing the inline lines) and _SyncMixin's _sync_safe_start_container. Same contract in both places. Reset is in its own try/except in the sync path so a failed reset does not roll back a successful start — the next periodic tick can re-evaluate readiness.

Adds test_consumer_resets_runtime_state_after_apply which seeds the plugin with "previous container running" markers, runs a consumer tick, and asserts both the lifecycle ordering (stop → start → reset) and the post-tick marker state. Existing lifecycle assertions widened from ["stop", "start"] to ["stop", "start", "reset"] across provider+consumer tests.

Addresses codex review finding 2 on PR #399.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolve_container_path iterated self.volumes and took the first bind whose
prefix matched. Dict order is insertion order, not specificity. With nested
fixed-size mounts like /app and /app/data, the first-match-wins logic
silently resolved /app/data/foo to /app's host root — but Docker overlays
/app/data on top of /app inside the container, so reads/writes go to a
different host directory than what sync resolves to. Snapshots either
archive stale data or extract to the wrong location.
Switch to collect-then-longest: gather every mount whose bind covers the
input path, then pick the one with the longest bind. Matches Docker's
overlay semantics. Rule 3 / Rule 6 enforcement runs on the winner exactly
as before.
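A sketch of the collect-then-longest selection (the surrounding six-rule checks are elided; the container-bind → host-root mapping shape is an assumption):

```python
import os

def _select_mount(container_path, volumes):
    """Pick the bind Docker actually serves for this path: the longest covering prefix."""
    norm = os.path.normpath(container_path)
    candidates = []
    for bind, host_root in volumes.items():
        bind_norm = os.path.normpath(bind)
        if norm == bind_norm or norm.startswith(bind_norm.rstrip("/") + "/"):
            candidates.append((bind_norm, host_root))
    if not candidates:
        return None
    # Nested mounts (/app and /app/data): the deeper bind overlays the broader one inside the
    # container, so the longest matching bind wins regardless of dict insertion order.
    return max(candidates, key=lambda item: len(item[0]))
```

With binds {'/app': ..., '/app/data': ...}, /app/data/foo resolves against /app/data even when /app was inserted first, while /app/other still resolves against /app.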
Three new tests:
- test_longest_prefix_wins_for_nested_mounts: broader bind inserted
first, deeper bind must still win
- test_longest_prefix_wins_regardless_of_insertion_order: symmetric case
with dict items swapped — result must be identical
- test_outer_bind_still_resolves_for_paths_only_it_covers: regression
guard that the broader mount continues to route paths that aren't
covered by the nested one
Addresses codex review finding 3 on PR #399.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The manifest produced by publish_snapshot stamps schema_version and archive_format (manager.py:514-520), but consumers only checked archive_paths mappability. A future provider that bumped MANIFEST_SCHEMA_VERSION or switched archive formats would be silently applied by current consumers — exactly the silent mis-apply the schema version field exists to prevent.

Extend validate_manifest to return a list of human-readable rejection reasons covering:
- missing / non-int / unsupported schema_version
- archive_format != ARCHIVE_FORMAT
- archive_paths the consumer can't resolve

Reasons are joined into a single log line in apply_snapshot, so the operator sees the full picture without grepping multiple messages.

Existing validate_manifest tests updated to supply a valid schema_version + archive_format (matching what publish_snapshot actually emits); five new tests cover each rejection branch and a multi-violation case.

Addresses codex review finding 4 on PR #399.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
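A sketch of the reasons-list shape described in the commit above (constant names from the commit; their values, the manifest field access, and the resolve_container_path failure mode are assumptions):

```python
MANIFEST_SCHEMA_VERSION = 1     # assumed current value
ARCHIVE_FORMAT = "tar.gz"       # assumed current value

def validate_manifest(self, manifest):
    """Return human-readable rejection reasons; an empty list means the manifest is applicable here."""
    reasons = []
    schema = manifest.get("schema_version")
    if not isinstance(schema, int):
        reasons.append(f"schema_version missing or not an int: {schema!r}")
    elif schema != MANIFEST_SCHEMA_VERSION:
        reasons.append(f"unsupported schema_version {schema} (expected {MANIFEST_SCHEMA_VERSION})")
    fmt = manifest.get("archive_format")
    if fmt != ARCHIVE_FORMAT:
        reasons.append(f"unsupported archive_format {fmt!r} (expected {ARCHIVE_FORMAT!r})")
    for path in manifest.get("archive_paths", []):
        if self.resolve_container_path(path) is None:   # assumed to return None for unmappable paths
            reasons.append(f"unmappable archive path on this consumer: {path}")
    return reasons
```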
When fixed_volume._require_tools raises (host missing fallocate / mkfs /
losetup), _configure_system_volume returned early without disabling sync
or suppressing env-var injection. _inject_sync_env_vars then unconditionally
advertised R1_SYSTEM_VOLUME=/r1en_system to the container, so for
SYNC.ENABLED=True the app would write request.json to a path that doesn't
exist on the host while CAR polled a host root that was never provisioned
— a silent black hole.
Track unavailability with a _sync_unavailable flag set inside the early-
return branch; gate both _sync_enabled() and _inject_sync_env_vars() on
it. Provider/consumer ticks short-circuit via _sync_enabled(), and the
container sees no R1_* keys so apps that branch on those env vars can
fall back to non-sync behavior instead of writing into a phantom mount.
Two new tests:
- test_no_env_when_sync_unavailable: with _sync_unavailable=True, none
of the R1_* keys appear in self.env
- test_sync_disabled_when_unavailable: _sync_enabled() returns False
when the flag is set, even with SYNC.ENABLED=True in config
Addresses codex review finding 5 on PR #399.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rule 3 in resolve_container_path previously hard-rejected anything whose
host root didn't sit under <plugin_data>/fixed_volumes/mounts/. Legacy
VOLUMES live under CONTAINER_VOLUMES_PATH
(/edge_node/_local_cache/_data/container_volumes/) — per-instance host
dirs at a known location, functionally equivalent to fixed-size for sync
purposes, just without quota enforcement and image-owner UID resolution.
The single-root substring check kept them out for no structural reason.
Extend Rule 3 to a two-root allow-list: a mount qualifies if its host
root contains the fixed-size marker OR sits under
CONTAINER_VOLUMES_PATH. Anonymous Docker mounts and FILE_VOLUMES (whose
host paths fall outside both roots) continue to be rejected with a
clearer error naming both allowed roots. Rules 1/2/4/5/6 unchanged.
This unblocks existing legacy-VOLUMES pipelines without forcing operators
to migrate them, and as a free side-effect enables legacy ↔ fixed-size
sync: because resolve_container_path keys off container paths, a legacy
provider can publish a snapshot a fixed-size consumer applies cleanly
(and vice versa), giving a soft-migration path.
Tests:
- test_rejects_non_fixed_size renamed to test_rejects_anonymous_mount;
the fixture's /app/legacy bind (under tmpdir/tmpfs_legacy, outside
both allow-listed roots) now stands in for anonymous mounts.
- test_non_fixed_size_rejected renamed to test_anonymous_mount_rejected
(claim_request integration), same fixture semantics.
- test_legacy_volumes_resolves_to_host_root: new unit test on Rule 3
accepting a legacy-root-backed bind.
- test_round_trip_legacy_volumes_only: provider+consumer both legacy.
- test_round_trip_legacy_to_fixed_size + test_round_trip_fixed_size_to_legacy:
cross-type round-trips, proving the soft-migration property.
99 of 100 sync tests pass (the 1 pre-existing failure
test_applies_when_cid_differs_even_if_version_lower is unrelated and
predates this branch).
Implements docs/_todos/2026-05-12T16:00:00_extend_car_volume_sync_to_legacy_volumes.md
on the edge_node side. E2E scenarios 11 + 12 land in project_r1_edge_node
in a separate commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ence
Adds SYNC.HSYNC_POLL_INTERVAL (default 600s, min 300s) so consumers only
go to the network for a fresh chain replica on a slower cadence than the
overall sync tick. The cheap local-replica hget still runs on every
tick. Provider behavior unchanged.
Motivation: chainstore_hsync is a network round-trip to the chain cluster
with a 10s timeout. Previously it fired on every consumer tick (every
SYNC.POLL_INTERVAL, default 10s). When cluster peers are unreachable —
e.g. after a node restart while the chain peer mesh re-establishes —
each consumer tick burned ~10s waiting on a timing-out hsync. With this
change the hsync attempt fires at most every 10 min by default; the hget
fall-through keeps consumers reading their cached records during outages.
DEBUG/DEVELOPMENT ONLY — to be removed. The HSYNC_POLL_INTERVAL config
field is annotated in three places (SYNC config block, mixin helper
docstring, manager.fetch_latest docstring) as a temporary operator knob.
Once ChainStore propagation is reliable on devnet and prod, the cadence
should become a fixed internal default and the SYNC config field should
be deleted.
Implementation:
- new SYNC.HSYNC_POLL_INTERVAL config field (container_app_runner.py)
- _SyncMixin._hsync_poll_interval() with clamp/default, exposed to
SyncManager via the cfg_sync_hsync_poll_interval property (matches
the existing cfg_sync_key/cfg_sync_type pattern)
- SyncManager tracks _last_hsync (initial 0 so the first fetch_latest
still hsyncs as a bootstrap), updates it on every attempt regardless
of success so a timing-out hsync doesn't get retried on every tick
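A sketch of the gating inside fetch_latest under those assumptions (owner call names and signatures are assumptions; only the interval logic is the point here):

```python
import time

def fetch_latest(self):
    """hsync at most once per HSYNC_POLL_INTERVAL; the cheap local-replica hget runs every tick."""
    now = time.time()
    if now - self._last_hsync >= self.owner.cfg_sync_hsync_poll_interval:
        # Advance the timestamp whether or not hsync succeeds, so a timing-out
        # hsync is not retried on every tick.
        self._last_hsync = now
        try:
            self.owner.chainstore_hsync(self.hkey)
        except Exception as exc:
            self.owner.P(f"[sync] chainstore_hsync error: {exc}")  # non-fatal
    return self.owner.chainstore_hget(self.hkey, self.owner.cfg_sync_key)
```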
Tests:
- test_hsync_gated_by_interval_skips_within_window (manager)
- test_hsync_fires_again_after_interval_elapses (manager)
- test_hsync_failure_still_advances_timestamp (manager)
- test_hsync_poll_interval_default / _floor / _invalid_falls_back (mixin)
105 of 106 sync tests pass; the 1 pre-existing failure
(test_applies_when_cid_differs_even_if_version_lower) is unrelated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a single [sync] log line on each hsync attempt (success or failure)
with elapsed wall time. Previously only failures were logged; successful
hsyncs were silent, which made it impossible to confirm via logs that
the first hsync on plugin start actually fired.
With this:
- First post-init tick logs "[sync] chainstore_hsync ok (0.05s)"
(green) or "[sync] chainstore_hsync error after Xs: ..." (yellow).
- One log per HSYNC_POLL_INTERVAL window (default 10 min) so the log
volume on prod stays small.
- Elapsed time exposes when peers are slow vs unreachable (a 9.8s
"ok" is healthy-but-slow; a 10.05s "error: timed out" is the same
issue we hit on devnet after the dvi-1/dvi-2 restart).
Test suite unchanged (105/106 — the 1 pre-existing failure is unrelated).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What changed:
- extracted CAR runtime stop cleanup so sync can stop logs, exec threads, tunnels, semaphores, and the container without unmounting fixed volumes
- updated sync provider/consumer ticks to use the shared runtime stop path
- made sync history latest/retirement ordering stable across deletion updates
Why:
- sync publish/apply needs mounted system volumes to remain available while avoiding stale runtime sidecars across restarted containers

What changed:
- added SyncRequest and SyncRuntimePolicy parsing with defaults and validation
- allowed online provider requests to validate container path shape without requiring a host mount
- carried runtime policy through provider records and manifests
Why:
- sync requests need an explicit runtime policy before provider and consumer runtime behavior can branch safely

What changed:
- added Docker archive based provider capture for online sync requests
- kept online provider requests from stopping the running container
- covered non-mounted provider paths in manager and mixin tests
Why:
- requesters need to sync provider-side ephemeral paths that are not backed by CAR host volumes

What changed:
- added consumer apply mode branching for offline restart, online no-restart, and online restart
- defaulted old records to offline_restart behavior
- covered consumer runtime modes in sync mixin tests
Why:
- requesters need to control whether consumers restart after applying synced data
The success-side response.json was missing the app-supplied metadata,
which made the in-volume-sync state incomplete: UIs that surface
response.json directly (because they can't bind-mount the host-side
sync_history/) could not show the tags the app attached to the snapshot.
Adds a "metadata" field mirroring the request's metadata so:
- the in-container view of the latest publish is self-contained
- the r1-car-console "History (sent)" tab shows metadata even without
R1_PLUGIN_DATA_DIR
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What changed:
- Online provider capture no longer calls the restart path after publish.
- Updated the provider tick regression test to assert no stop/start/reset occurs.
Why:
- provider_capture=online is intended to snapshot the live container without replacing it.

What changed:
- Added realpath containment checks for consumer extraction targets.
- Revalidate extraction destinations immediately before creating directories or files.
- Added regressions for symlink directory/file escapes and failed apply state.
Why:
- Prevent archive extraction from writing outside the consumer volume through pre-existing symlinks.

What changed:
- added a sync-local JsonControlFile helper for pending/processing mechanics
- moved atomic app-readable JSON writes into a reusable helper
- covered claim, recovery, cleanup, and write-failure behavior with unit tests
Why:
- standardize the file protocol before adding more CAR sync control files

What changed:
- routed request claiming through JsonControlFile
- reused the shared atomic JSON writer for sync artifacts
- centralized request processing cleanup through the control-file helper
Why:
- keep SyncManager focused on sync validation and publish/apply behavior while sharing file mechanics

What changed:
- delegated stale request processing recovery to JsonControlFile
- reused the helper for provider pending-file detection
- removed unused sync imports from the mixin
Why:
- keep lifecycle orchestration separate from low-level control-file mechanics

What changed:
- read claimed JSON control files with no-follow handling
- reject symlink or non-regular request processing files without raw-body leakage
- added regressions for symlink request files in the helper and manager tests
Why:
- prevent app-writable control files from being used to read host files as CAR

What changed:
- require local opt-in before provider online capture is accepted or archived
- make consumer apply lifecycle mode come from local SYNC config
- added regressions that provider records cannot force online consumer apply
Why:
- prevent app/provider metadata from forcing unsafe local lifecycle behavior

What changed:
- validate manifest encryption against the published archive encryption value
- constrain apply-time extraction to manifest-declared archive paths
- added regressions for unsupported encryption and out-of-scope tar members
Why:
- prevent consumers from applying archives that do not match the advertised manifest contract

What changed:
- changed consumer hsync defaults to a 60s cadence with a 10s floor
- retry failed hsync attempts after a shorter failure window
- clear _sync_unavailable when system volume provisioning later succeeds
Why:
- avoid long stale-consumer windows and recover sync availability after transient provision failures

What changed:
- reject symlinked control directories and unsafe offline archive sources
- strip special mode bits during consumer extraction
- revalidate sync config on restart and run worker sync ticks before git checks
- tighten manifest archive_paths validation and clean leaked CIDs on publish failure
Why:
- close PR399 security and lifecycle review blockers with regression coverage

What changed:
- document SYNC config knobs and defaults
- describe request, response, last_apply, and manifest contract
- clarify local consumer apply policy and online provider capture opt-in
Why:
- make the PR399 app/operator contract reviewable from the repo docs

What changed:
- keep the sync system volume root-owned and reject unsafe UTF-8 control files
- require confirmed container stops before offline publish/apply and prevalidate consumer records
- require positive ChainStore publish ack, restore files with target volume ownership, and keep consumer CID cleanup local
- add regression coverage for the PR399 review blockers
Why:
- prevent unsafe control-plane ownership, live-volume mutation, unconfirmed publishes, and non-writable restored state

What changed:
- document root-owned sync control-plane behavior
- clarify offline stop/prevalidation, ChainStore ack, CID cleanup, and restored ownership semantics
Why:
- keep PR399 reviewers and CAR app authors aligned with the stricter sync behavior

What changed:
- Treat successful Docker remove as a stopped container even if stop reported an error.
- Mark runtime stop failures as degraded and skip fixed-volume cleanup when remove fails.
- Cover the stop/remove edge cases in lifecycle tests.
Why:
- Sync and shutdown paths need to distinguish a removable container from one that may still be running before mutating volumes.

What changed:
- Reject non-regular request control files before claim/read and open with no-follow/nonblocking flags.
- Safely remove or quarantine unsafe processing entries without recursive deletion.
- Make CAR-owned response/status files app-readable but not app-writable.
- Let the claim path own provider request validation.
Why:
- App-writable control directories must not let special files block CAR or let apps rewrite CAR-owned status outputs.

What changed:
- Treat ChainStore acknowledgement as the provider publish commit point.
- Preserve app-visible success and clear processing files when sent-history append fails.
- Skip prior CID retirement when the new history entry could not be written.
- Preserve symlink-specific control-file validation diagnostics.
Why:
- A post-ack local history failure must not strand request.json.processing or make an already-published snapshot look unprocessed.

What changed:
- Normalize consumer online apply modes to offline_restart while keeping config compatibility.
- Record requested/effective apply mode and warning reason for operator visibility.
- Update tests and README to reflect online consumer apply being disabled for now.
Why:
- Path-based extraction is not safe while the app can race filesystem paths, so consumer apply must stop the container until descriptor-safe extraction exists.

What changed:
- Split consumer apply into pre-stop preparation and stopped-container commit.
- Persist host-private apply state and quarantine failed CIDs with retry backoff.
- Commit touched paths with backups/rollback and leave uncertain applies stopped.
- Use durable applied state for dedupe while keeping app-visible last_apply.json informational.
Why:
- Consumers should not stop repeatedly for bad CIDs, and successful volume mutation needs a durable local state source independent of history/result-file writes.

What changed:
- Track directory metadata and created parent directories as rollback operations during consumer apply commits.
- Restore directory mode/ownership on failed apply and mark rollback as uncertain if metadata restoration fails.
- Add regression coverage for directory metadata rollback and created parent cleanup.
Why:
- Failed consumer applies must not report restart_safe when directory metadata mutations were left behind.
…-sync

# Conflicts:
#	ver.py
Summary
Adds volume-sync to the Container App Runner — a provider/consumer mechanism that lets one CAR instance snapshot its app's persistent state and have other CAR instances apply that snapshot, enabling cross-node state replication for containerized apps. The feature is implemented as a self-contained
sync/ subpackage and a _SyncMixin woven into ContainerAppRunnerPlugin.

Companion PR (UI side, deeploy-dapp): Ratio1/deeploy-dapp#96.
What it does
When an app deployed via CAR opts in, the runner orchestrates state replication around the container lifecycle:
Provider (the source-of-truth node)
- The app inside the container writes /r1en_system/volume-sync/request.json listing the container paths it wants snapshot-published.
- CAR polls (every SYNC.POLL_INTERVAL, default 10s), claims the request atomically, stops the container, archives the requested paths (resolving container-absolute paths to host bind-mount paths via self.volumes), uploads the bundle to R1FS, and publishes {cid, version, timestamp} to ChainStore under SYNC.KEY.
- The outcome is written back to volume-sync/response.json and the published metadata is persisted under <plugin_data>/sync_history/sent/.
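For illustration, a minimal request the app might write (field names inferred from the claim_request validation described in the commits; later commits add further fields such as the runtime policy):

```python
# Hypothetical contents of /r1en_system/volume-sync/request.json, shown as a Python dict:
request = {
    "archive_paths": ["/app/data"],       # container-absolute paths to snapshot
    "metadata": {"label": "nightly"},     # free-form app tags echoed back in response.json
}
```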
Consumer (a node that wants to mirror provider state)
- On each tick, CAR reads the latest record from ChainStore under SYNC.KEY.
- When a new CID appears it fetches the bundle from R1FS, validates the manifest, stops the container, extracts the archive into the mapped volumes, writes volume-sync/last_apply.json describing the applied state, restarts the container.

Both sides record a full audit trail per instance under
<plugin_data>/sync_history/{sent,received}/<version>__<short_cid>.json so operators can inspect any past replication event.

Always-on system volume
Independent of the SYNC role, every CAR instance now mounts a 10M ext4 loopback at
/r1en_system (provisioned via the same FIXED_SIZE_VOLUMES machinery as user-declared fixed-size volumes). The mount carries the control-plane files (volume-sync/request.json, response.json, etc.). Apps that don't use SYNC simply don't write into it; the volume costs ~10MB per instance and never sees data traffic.

Env vars exported to the container:
- R1_SYSTEM_VOLUME — always
- R1_VOLUME_SYNC_DIR — always
- R1_SYNC_REQUEST_FILE — always (so apps can write the request unconditionally; CAR just won't act on it without SYNC.ENABLED)
- R1_SYNC_TYPE, R1_SYNC_KEY — only when SYNC is enabled, for apps that want to branch on role

CAR config block
Added under ContainerAppRunnerPlugin._CONFIG; a sketch of the block follows. Default-off, wire-compatible with existing pipelines.
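A sketch of the SYNC block pieced together from the commit messages (the field names appear in the commits; the default values shown are assumptions except where stated, e.g. POLL_INTERVAL=10s):

```python
_CONFIG = {
    # ... existing Container App Runner config ...
    "SYNC": {
        "ENABLED": False,            # default-off: existing pipelines are unaffected
        "TYPE": None,                # 'provider' or 'consumer'
        "KEY": None,                 # shared ChainStore key pairing a provider with its consumers
        "POLL_INTERVAL": 10,         # seconds between sync ticks
        "HSYNC_POLL_INTERVAL": 60,   # consumer-only network hsync cadence (temporary debug knob)
    },
}
```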
Code layout
The whole feature lives in
extensions/business/container_apps/sync/:
- constants.py — file names, ChainStore hkey, manifest schema version, failure-stage labels. No code, just data — cat sync/constants.py gives the entire data-plane vocabulary in one place.
- manager.py — SyncManager class. Pure I/O orchestration: claim → archive → upload → publish (provider), fetch → validate → extract → record (consumer). Delegates network/storage to owner.r1fs / owner.chainstore_* so it stays testable. ~860 lines.
- mixin.py — _SyncMixin. Plugin-class integration: provisions the system volume in on_init, recovers any orphaned request.json.processing from a prior crash, drives provider/consumer ticks from _perform_periodic_monitoring. Both ticks call stop_container → manager_work → start_container inline (must NOT route through _restart_container, which unmounts the system volume before the sync slice can read from it).
- __init__.py — re-exports the public surface so callers can from extensions.business.container_apps.sync import SyncManager.

Tests:
extensions/business/container_apps/tests/test_sync_manager.py (866 lines) and test_sync_mixin.py (390 lines) cover happy paths, every documented failure stage, and the crash-recovery flow.

Design notes worth flagging for review
1. Identity is the CID, not the version. The consumer compares ChainStore's cid against the last applied CID. version is informational (timestamp-derived, kept for filename ordering and human-readable logs). Comparing CIDs eliminates a whole class of clock-skew failure modes — a provider with a wonky timestamp can never make a consumer permanently ignore a corrected snapshot — and makes multi-provider sync sets coherent without ordering assumptions. (Final commit 75e48a7 made this switch; earlier commits compared on version.)

2. Sync slices bypass _restart_container. _restart_container calls _cleanup_fixed_size_volumes, which would unmount the /r1en_system loopback before the sync code can read or write into it. The mixin ticks inline stop_container() → work → start_container() and never set a StopReason that would route through the restart path. This is documented at the top of mixin.py.

3. The system volume is always-on, even with SYNC disabled. The mount is cheap (10M ext4 loopback) and lets us add future CAR↔app control-plane features without re-architecting volume provisioning. The volume-sync/ subdirectory naming convention reserves space for those future features.

4. INITIAL_SYNC_TIMEOUT was removed. Earlier iterations blocked the consumer's first boot until a snapshot arrived. Final design (0882dd4) starts consumers immediately on an empty volume and lets the next tick apply whatever ChainStore has — apps that strictly require state at startup must implement their own poll-and-retry in their entrypoint. This matches how every other CAR feature degrades.

5. Permissions: /r1en_system is chmod 0o777. The system volume is per-CAR-instance and the app already owns the rest of its container; restricting the control-plane subdir to root would just mean every non-root app silently can't write the request file. See bcf3193 / 4201532 for context.

Test plan
Unit tests cover the full SyncManager and
_SyncMixin surfaces (over 1250 lines of tests across the two files). For integration:
- SYNC.ENABLED=False (or no SYNC block) → confirm /r1en_system mounts and CAR runs to completion with no SYNC log lines
- SYNC.ENABLED=True, TYPE='provider', KEY=<uuid> → write a request.json from inside the container, observe the stop → publish → start log sequence, confirm the record appears in ChainStore under the KEY and in sync_history/sent/
- same KEY with TYPE='consumer' on another node → confirm sync_history/received/ reflects the apply
- kill CAR mid-publish → on restart it recovers the .processing rename and republishes successfully

Commits
a5bc2bd skeleton → 9cc09a4 path resolver + atomic JSON + history I/O → cc2df3e claim_request + archive helpers → 2fe9ffe orchestrators → d9948b3 mixin → ff9e73b plugin wiring → 4201532 + bcf3193 permissions → 01e7aa9 cleanup-mount fix → 75e48a7 CID identity → 4aed83f collapse into sync/ subpackage → 0882dd4 remove INITIAL_SYNC_TIMEOUT → bbe20a4 version bump.