feat(car): volume-sync provider/consumer for cross-node state replication#399
Open
toderian wants to merge 50 commits into
Conversation
Defines all module-level constants (system volume name/mount/size, sync file names, history dir conventions, ChainStore hkey, manifest schema version, failure stage strings) and the SyncManager class skeleton with NotImplementedError stubs for every public method called out in the plan: resolve_container_path, _write_json_atomic, history readers/writers, claim_request, make_archive, publish_snapshot, fetch_latest, validate_manifest, extract_archive, apply_snapshot, _retire_previous_cid. Module parses cleanly; no behaviour yet.

Path helpers (system_volume_host_root, volume_sync_dir, history_root) derive locations from owner.get_data_folder() / _get_instance_data_subfolder(), so the manager has no hardcoded plugin assumptions beyond the documented owner interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concrete helpers in SyncManager (no plugin lifecycle yet):
resolve_container_path(): six-rule chokepoint — the path must be absolute,
covered by a declared bind, fixed-size-backed (host_root under
fixed_volumes/mounts/), not the system volume, free of '..' segments, and
its resolved host path must stay within host_root.
_write_json_atomic(): tmp + os.replace, creates parent dir, fsyncs the
payload, removes the tmp on failure so we never leak orphans.
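A minimal sketch of the atomic-write pattern described above (helper name from the commit; the exact signature and error handling are assumptions — a later commit also chmods the tmp file app-readable before the rename):

```python
import json, os, tempfile

def write_json_atomic(path: str, payload: dict) -> None:
    """Write JSON via a tmp file in the target dir, fsync, then atomically os.replace it in."""
    parent = os.path.dirname(path)
    os.makedirs(parent, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())   # payload is on disk before the rename
        os.replace(tmp, path)       # atomic on POSIX: readers see the old or new file, never a partial one
    except BaseException:
        try:
            os.unlink(tmp)          # never leak an orphaned tmp file
        except FileNotFoundError:
            pass
        raise
```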
History helpers: append_sent / append_received / latest_sent /
latest_received / update_history_deletion. Filenames are
'<10-digit-version>__<12-char-cid>.json' so lexical sort = chronological.
The 'deletion' sub-record defaults to {None,None,None} on append and is
mutated in place via atomic write when a snapshot is superseded.
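For illustration, a history filename built under that convention (the version value is the provider-side timestamp; the CID is a hypothetical example):

```python
version = 1714742400
cid = "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"
filename = f"{version:010d}__{cid[:12]}.json"
# -> '1714742400__QmYwAPJzv5CZ.json': the zero-padded version prefix keeps lexical sort chronological
```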
22 unit tests cover happy paths, rejection paths for every rule, atomic
write semantics, history ordering, and missing-file deletion-update.
All pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claim_request(): atomic os.replace of request.json -> .processing, JSON parse, top-level shape validation, per-path resolve_container_path check. On any failure it writes request.json.invalid (with the original body, or raw_body for malformed JSON) and response.json (error shape), and deletes the .processing file. Returns (archive_paths, metadata) on success or None.

_fail_request(): shared helper that writes the .invalid + response.json pair so the artifacts stay consistent across all failure stages.

make_archive(): tarfile-based gzip with member names = container-absolute paths. Re-runs resolve_container_path on each entry as defence in depth and raises FileNotFoundError if the host path is missing. Output goes to owner.get_output_folder() with a pid+timestamp-suffixed filename.

extract_archive(): two-pass — validate every member first (so unmapped members abort the entire extract before any write), then atomic per-file write via tmp + os.replace. Skips symlink/hardlink members for safety. Member names from tarfile are stripped of leading '/' by POSIX default, so we re-prepend before resolving.

19 new unit tests (41 total) cover the claim happy path and all the validation failure shapes (malformed JSON, wrong type, missing/empty archive_paths, metadata not an object, traversal, unmounted, non-fixed-size, system volume, links, archive round-trip, non-existent host path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
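A sketch of the two-pass extraction shape described in the commit above (names are assumptions; the real method resolves members through resolve_container_path and applies the permission floors added in later commits):

```python
import os, tarfile, tempfile

def extract_archive(archive_path, resolve):
    """Validate every member first, then write each regular file atomically."""
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        targets = {}
        # Pass 1: validate everything so one bad member aborts the extract before any write.
        for m in members:
            if m.issym() or m.islnk():
                continue                                # skip link members for safety
            container_path = "/" + m.name.lstrip("/")   # tarfile strips the leading '/', re-prepend it
            host_path = resolve(container_path)         # assumed to return None when unmappable
            if host_path is None:
                raise ValueError(f"unmappable archive member: {container_path}")
            targets[m.name] = host_path
        # Pass 2: atomic per-file write via tmp + os.replace.
        for m in members:
            if m.name not in targets or not m.isfile():
                continue
            host_path = targets[m.name]
            os.makedirs(os.path.dirname(host_path), exist_ok=True)
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(host_path))
            with os.fdopen(fd, "wb") as out, tar.extractfile(m) as src:
                out.write(src.read())
            os.replace(tmp, host_path)
```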
…/retire)
publish_snapshot(): full provider flow as four staged operations
(archive_build, r1fs_upload, chainstore_publish, then history+response).
Each stage has its own _fail_request shape with the matching stage string.
On success: writes response.json{ok}, clears any prior request.json.invalid,
deletes .processing, and best-effort retires the prior CID. The archive
tmp file is always removed (success or fail) via try/finally.
fetch_latest(): chainstore_hsync (non-fatal on error) + chainstore_hget.
validate_manifest(): runs the manifest's archive_paths through
resolve_container_path against the consumer's volumes; returns the list of
unmappable entries so the caller can decide between "apply" and "skip".
apply_snapshot(): pre-flight via validate_manifest, then r1fs.get_file →
extract_archive → append history → write last_apply.json → retire prior
CID with cleanup_local_files=True (consumer-only — drops the local R1FS
download too). Failure modes are all non-fatal (no last_apply written so
the consumer-side app sees nothing landed; history not advanced).
_retire_previous_cid(): finds the most recent prior un-retired entry whose
cid differs from the latest, calls r1fs.delete_file, updates the prior
entry's deletion sub-record. Never raises — deletion failure must not
roll back the new publish/apply.
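A rough sketch of the staged provider flow described above (stage labels from the commit; owner call names and helper methods are assumptions):

```python
import os

def publish_snapshot(self, archive_paths, metadata):
    """Each stage fails with its own stage label; the archive tmp is removed on success or failure."""
    archive = None
    try:
        try:
            archive = self.make_archive(archive_paths)
        except Exception as exc:
            return self._fail_request("archive_build", exc)
        try:
            cid = self.owner.r1fs.add_file(archive)        # assumed R1FS upload entry point
        except Exception as exc:
            return self._fail_request("r1fs_upload", exc)
        record = {"cid": cid, "version": self._new_version(), "metadata": metadata}
        try:
            self._chainstore_publish(record)               # assumed ChainStore hset under the sync hkey
        except Exception as exc:
            return self._fail_request("chainstore_publish", exc)
        self.append_sent(record)                           # history entry + response.json{ok}
        self._write_response_ok(record)
        self._retire_previous_cid()                        # best effort; never raises
    finally:
        if archive is not None and os.path.exists(archive):
            os.remove(archive)
```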
20 new tests (61 total): orchestrator happy paths, every failure stage
with the right artifact shape, ChainStore ack=False non-fatal,
two-snapshot retirement (sender + receiver sides), failed-retirement audit
trail, layout misalignment skip, end-to-end provider→consumer round-trip
through shared FakeR1FS + FakeChainStore. All pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lume
The mixin is the integration layer between SyncManager and the plugin
lifecycle. Public surface (called by ContainerAppRunnerPlugin in the next
step):
* _configure_system_volume(): provisions the always-on /r1en_system
fixed-size loopback (idempotent across restarts via fixed_volume.provision),
tracks it in self._fixed_volumes for cleanup parity, mkdir -p's the
volume-sync subdir, and binds it into self.volumes.
* _inject_sync_env_vars(): R1_SYSTEM_VOLUME / R1_VOLUME_SYNC_DIR /
R1_SYNC_REQUEST_FILE are always set; R1_SYNC_TYPE / R1_SYNC_KEY only when
SYNC.ENABLED, so apps that branch on role can do so.
* _sync_provider_tick / _sync_consumer_tick: throttled by POLL_INTERVAL.
Drive stop_container -> work -> start_container INLINE (not via a
StopReason -> _restart_container, which would unmount the loopback).
Validation failures don't disturb the container; execution failures
still restart it.
* _sync_initial_consumer_block(): blocks consumer's first start_container
until ChainStore has a record (bounded by INITIAL_SYNC_TIMEOUT;
0=forever).
* _recover_stale_processing(): renames orphan request.json.processing back
to request.json so a crash mid-publish doesn't leave a request stuck.
21 unit tests against a fake plugin that records stop/start ordering: env
injection across enabled/disabled, role helpers, throttle, full provider
flow with success and r1fs failure, validation failure does not stop
container, full consumer flow, misalignment skip, already-applied no-op,
stale .processing recovery (and don't-clobber rule when both files exist).
All pass alongside the 61 sync_manager tests (82 total).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integration changes in container_app_runner.py + mixins/__init__.py:
* MRO: insert _SyncMixin before _ContainerUtilsMixin so its
_configure_system_volume / _inject_sync_env_vars / tick methods are
available to the plugin.
* _CONFIG: new SYNC block (ENABLED/KEY/TYPE/POLL_INTERVAL/
INITIAL_SYNC_TIMEOUT). System volume itself is hardcoded — no config.
* __reset_vars: initialize _last_sync_check + _sync_manager.
* on_init:
- _configure_system_volume() after the existing _configure_*_volumes
chain so /r1en_system is always provisioned;
- _recover_stale_processing() so a crash mid-publish doesn't strand
a request in .processing;
- _validate_sync_config() (logs a clear error and disables SYNC if
KEY/TYPE are bad — the system volume keeps working);
- _inject_sync_env_vars() right after _setup_env_and_ports() in the
non-semaphored branch.
* _restart_container: same _configure_system_volume + recovery +
_inject_sync_env_vars sequence so a full restart (e.g. for image
update) recreates the volume + env vars cleanly.
* _handle_initial_launch:
- _inject_sync_env_vars() after _setup_env_and_ports in the
semaphored branch;
- _sync_initial_consumer_block() before the very first
start_container so consumer pods boot on populated state.
* _perform_additional_checks: drives the provider/consumer tick INLINE
(return None — must NOT use a StopReason because that routes through
_restart_container which unmounts the loopback before our work).
All 82 sync-only unit tests still pass. The 10 pre-existing failures in
the rest of the test_*.py suite are unrelated (test env doesn't have
docker-py installed; same failure on master).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E2E surfaced this on the first deploy: flask-files-explorer runs as a non-root user, the system volume's loop mount is root-owned (because _resolve_image_owner() returned (None, None) for this image), and the app's POST /create against /r1en_system/volume-sync/request.json got 'Permission denied'.

The system volume is purpose-built as an app-writable control-plane channel between the container and CAR. There's no isolation gain in restricting the volume-sync subdir to root: the volume is per-CAR-instance, and the app already owns the container. Chmod both the mount root and the volume-sync/ subdir to 0o777 after mkdir so any container user can write requests.

Existing 82 unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… read
E2E surfaced two more permission issues:
1. _write_json_atomic uses tempfile.mkstemp which defaults to 0o600.
CAR runs as root, but the app inside the container is typically a
non-root user (e.g. flask-files-explorer's appuser). After
os.replace the response.json / last_apply.json / request.json.invalid
files were unreadable from inside the container. Now chmod 0o666
before replace.
2. extract_archive preserves the tar member's mode, but the new file
is root-owned (CAR's identity). If the source mode was something
restrictive, the app couldn't read it. Now we max() the source
mode against 0o644 for files and 0o755 for directories so the app
can always cat / traverse what CAR landed.
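Both fixes reduce to a mode floor applied before the file becomes visible to the app — a sketch under the assumptions above (the commit describes max()-ing against the floor; an OR over the permission bits gives the same at-least-readable guarantee):

```python
def app_readable_mode(src_mode: int, is_dir: bool) -> int:
    """Floor the tar member's mode so a non-root app user can read/traverse what root-owned CAR wrote."""
    floor = 0o755 if is_dir else 0o644
    return (src_mode & 0o777) | floor

# For CAR-written JSON artifacts (response.json, last_apply.json, *.invalid), chmod the
# tmp file before the atomic rename so the published file is already app-readable:
#   os.chmod(tmp_path, 0o666)
#   os.replace(tmp_path, final_path)
```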
82 unit tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_volume

E2E found this: after a CAR restart, the data volume (declared via FIXED_SIZE_VOLUMES) appeared empty inside the container even though fixed_volume.provision had just chowned and mounted it.

Cause: _configure_system_volume was calling fixed_volume.cleanup_stale_mounts on the shared <plugin_data>/fixed_volumes/ root. That scan iterates EVERY meta/*.json file under the root, including the appdata.json that _FixedSizeVolumesMixin._configure_fixed_size_volumes() (which runs FIRST) just provisioned. cleanup_stale_mounts saw the active mount, treated it as 'stale', and umount + losetup -d'd it. Then provision() of the system volume ran but never re-mounted appdata. The container started with an empty bind from an unmounted host path.

Fix: don't repeat the stale-mount sweep — the previous configure step already did it. Add an explicit comment so a future maintainer doesn't re-introduce the call thinking it's defensive.

82 unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The decision the consumer needs is content-identity: 'is this bundle
the one I just applied?'. CID is a content-addressed hash that answers
that exactly. Version was a CAR-side timestamp that was indirectly
serving as identity, but it imported a class of failure modes (clock
skew, multi-provider non-monotonic ordering) that simply don't apply
to a CID comparison.
Two coupled changes:
* _sync_consumer_tick: skip when record.cid == latest_received().cid
instead of new_version <= last_version. Different cid → apply,
regardless of version metadata.
* _latest_in: sort by mtime, not by filename. Filenames stay
version-prefixed for chronological browsability under normal
operation, but 'what did I last apply?' is fundamentally an
insert-order question. Without this, a back-dated record would
write a history file with a lex-smaller filename than the previous
entry, latest_received would still return the older one, and the
consumer would re-apply the back-dated record on every tick
forever.
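A sketch of the mtime-based selection (helper name from the commit; directory layout details are assumptions):

```python
import os

def _latest_in(history_dir):
    """Return the most recently written history file, regardless of its version-prefixed filename."""
    entries = [
        os.path.join(history_dir, name)
        for name in os.listdir(history_dir)
        if name.endswith(".json")
    ]
    if not entries:
        return None
    # 'What did I last apply?' is an insert-order question: sort by mtime, not by filename.
    return max(entries, key=os.path.getmtime)
```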
version is kept everywhere as informational metadata (record schema +
history entries + response.json + last_apply.json + filename prefixes)
so wire compat is preserved and human-readable logs still say
'applying v1714742400 ...'. Only the comparison logic changed.
Tests updated:
* test_skips_already_applied_version → test_skips_when_record_cid_matches_last_apply
* Added test_applies_when_cid_differs_even_if_version_lower (covers the
clock-skew failure mode the old code couldn't survive).
* test_latest_picks_highest_version → test_latest_picks_most_recently_written
83 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The volume-sync feature was scattered across two locations because the
codebase has a 'mixins go in mixins/' convention while the manager is
not a mixin:
before:
extensions/business/container_apps/
├── sync_manager.py ← module-level constants + SyncManager
└── mixins/
└── sync_mixin.py ← _SyncMixin (lifecycle integration)
After this refactor the entire feature lives in one folder and a reader
can scan it without jumping around:
extensions/business/container_apps/
└── sync/
├── __init__.py ← re-exports the public API
├── constants.py ← file names, hkeys, schema version,
│ stage labels — pure data
├── manager.py ← SyncManager + host-side path helpers
└── mixin.py ← _SyncMixin
Module renames:
- sync_manager.py → sync/manager.py
- mixins/sync_mixin.py → sync/mixin.py
- (new) → sync/constants.py (extracted from manager)
- (new) → sync/__init__.py (re-exports)
Import-site updates:
- container_app_runner.py: pull _SyncMixin from .sync, not .mixins
- mixins/__init__.py: drop _SyncMixin export (note added pointing
to the new location)
- tests/test_sync_manager.py and tests/test_sync_mixin.py: import
from .sync (the package) rather than the deleted modules
Tradeoff: this one feature breaks the 'mixins live in mixins/'
convention. Worth it because everything related to sync — constants,
helpers, the manager class, and the mixin — is now reachable with
'cd sync/'. The mixins/__init__.py docstring points the next reader at
the new location so it's not surprising.
Verification: all 83 unit tests pass (test_sync_manager + test_sync_mixin
exercise the resolver, atomic JSON, history I/O, claim_request, archive
roundtrip, orchestrators, env injection, ticks, recovery — all green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-boot block
The first-boot block solved a niche problem: an app that crashes on an
empty volume (e.g. model server with missing weights) would crash-loop
until the first snapshot landed, instead of cleanly waiting for state.
INITIAL_SYNC_TIMEOUT held the container off until either a snapshot
arrived or the timeout expired.
In practice this knob earned its keep in maybe 20% of use cases and was
mystifying noise in the other 80%:
- Default value (600s) made first-boot consumers look 'stuck' when
really nothing was wrong, just nobody had published yet.
- Showed up in provider configs too, where it's silently ignored —
confusing when reading a deployed pipeline JSON.
- The 20% case has an obvious app-side workaround: put a poll-and-
retry loop in the entrypoint, which Docker's restart policy already
enables for free.
New behavior on first boot of a fresh consumer: start the container
immediately on an empty /app/data, the next regular tick (POLL_INTERVAL
seconds later) finds the snapshot in ChainStore, applies it, and
restarts. One extra container restart cycle — no crash loop unless the
app explicitly requires state.
Removed:
- SYNC.INITIAL_SYNC_TIMEOUT field from _CONFIG.SYNC
- _SyncMixin._sync_initial_timeout() helper
- _SyncMixin._sync_initial_consumer_block() and its call site in
container_app_runner._handle_initial_launch
- 'INITIAL_SYNC_TIMEOUT' from the test_sync_mixin fake config
Apps that strictly require state at startup must implement their own
poll-and-retry in their entrypoint (documented in the mixin docstring).
All 83 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_retire_previous_cid was sorting history files by filename, which equals version order. When the consumer applies a lower-version CID after a higher-version one (clock-skewed provider, or a multi-provider sync set with non-monotonic timestamps), the retire path would treat the higher-version entry as latest and retire the freshly-applied CID instead.

Mirror the contract that _latest_in already follows (manager.py:249-272): "most recently written" is an mtime question, not a version question.

Adds test_retire_uses_mtime_not_version which constructs the exact non-monotonic-version-order scenario and asserts the just-applied entry is preserved while the older-by-mtime higher-version entry gets retired.

Addresses codex review finding 1 on PR #399.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sync ticks (_sync_provider_tick, _sync_consumer_tick) stop+start the container inline to keep the system volume mounted, deliberately bypassing _restart_container. But by bypassing it they also skipped the per-restart state-machine maintenance: container_start_time, _app_ready, _health_probe_start, _tunnel_start_allowed, _commands_started, log-stream re-attach, and BUILD_AND_RUN_COMMANDS rerun. Result: after a sync slice, tunnels stayed marked ready, health checks were skipped, image-defined startup commands didn't rerun, and log capture was stale.

Extract the post-start_container reset+hooks sequence into a new helper _reset_runtime_state_post_start on ContainerAppRunnerPlugin, called from both _restart_container (replacing the inline lines) and _SyncMixin's _sync_safe_start_container. Same contract in both places. Reset is in its own try/except in the sync path so a failed reset does not roll back a successful start — the next periodic tick can re-evaluate readiness.

Adds test_consumer_resets_runtime_state_after_apply which seeds the plugin with "previous container running" markers, runs a consumer tick, and asserts both the lifecycle ordering (stop → start → reset) and the post-tick marker state. Existing lifecycle assertions widened from ["stop", "start"] to ["stop", "start", "reset"] across provider+consumer tests.

Addresses codex review finding 2 on PR #399.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolve_container_path iterated self.volumes and took the first bind whose
prefix matched. Dict order is insertion order, not specificity. With nested
fixed-size mounts like /app and /app/data, the first-match-wins logic
silently resolved /app/data/foo to /app's host root — but Docker overlays
/app/data on top of /app inside the container, so reads/writes go to a
different host directory than what sync resolves to. Snapshots either
archive stale data or extract to the wrong location.
Switch to collect-then-longest: gather every mount whose bind covers the
input path, then pick the one with the longest bind. Matches Docker's
overlay semantics. Rule 3 / Rule 6 enforcement runs on the winner exactly
as before.
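A sketch of the collect-then-longest selection (the surrounding six-rule checks are elided; the container-bind → host-root mapping shape is an assumption):

```python
import os

def _select_mount(container_path, volumes):
    """Pick the bind Docker actually serves for this path: the longest covering prefix."""
    norm = os.path.normpath(container_path)
    candidates = []
    for bind, host_root in volumes.items():
        bind_norm = os.path.normpath(bind)
        if norm == bind_norm or norm.startswith(bind_norm.rstrip("/") + "/"):
            candidates.append((bind_norm, host_root))
    if not candidates:
        return None
    # Nested mounts (/app and /app/data): the deeper bind overlays the broader one inside the
    # container, so the longest matching bind wins regardless of dict insertion order.
    return max(candidates, key=lambda item: len(item[0]))
```

With binds {'/app': ..., '/app/data': ...}, /app/data/foo resolves against /app/data even when /app was inserted first, while /app/other still resolves against /app.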
Three new tests:
- test_longest_prefix_wins_for_nested_mounts: broader bind inserted
first, deeper bind must still win
- test_longest_prefix_wins_regardless_of_insertion_order: symmetric case
with dict items swapped — result must be identical
- test_outer_bind_still_resolves_for_paths_only_it_covers: regression
guard that the broader mount continues to route paths that aren't
covered by the nested one
Addresses codex review finding 3 on PR #399.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The manifest produced by publish_snapshot stamps schema_version and archive_format (manager.py:514-520), but consumers only checked archive_paths mappability. A future provider that bumped MANIFEST_SCHEMA_VERSION or switched archive formats would be silently applied by current consumers — exactly the silent mis-apply the schema version field exists to prevent.

Extend validate_manifest to return a list of human-readable rejection reasons covering:
- missing / non-int / unsupported schema_version
- archive_format != ARCHIVE_FORMAT
- archive_paths the consumer can't resolve

Reasons are joined into a single log line in apply_snapshot, so the operator sees the full picture without grepping multiple messages.

Existing validate_manifest tests updated to supply a valid schema_version + archive_format (matching what publish_snapshot actually emits); five new tests cover each rejection branch and a multi-violation case.

Addresses codex review finding 4 on PR #399.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
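A sketch of the reasons-list shape described in the commit above (constant names from the commit; their values, the manifest field access, and the resolve_container_path failure mode are assumptions):

```python
MANIFEST_SCHEMA_VERSION = 1     # assumed current value
ARCHIVE_FORMAT = "tar.gz"       # assumed current value

def validate_manifest(self, manifest):
    """Return human-readable rejection reasons; an empty list means the manifest is applicable here."""
    reasons = []
    schema = manifest.get("schema_version")
    if not isinstance(schema, int):
        reasons.append(f"schema_version missing or not an int: {schema!r}")
    elif schema != MANIFEST_SCHEMA_VERSION:
        reasons.append(f"unsupported schema_version {schema} (expected {MANIFEST_SCHEMA_VERSION})")
    fmt = manifest.get("archive_format")
    if fmt != ARCHIVE_FORMAT:
        reasons.append(f"unsupported archive_format {fmt!r} (expected {ARCHIVE_FORMAT!r})")
    for path in manifest.get("archive_paths", []):
        if self.resolve_container_path(path) is None:   # assumed to return None for unmappable paths
            reasons.append(f"unmappable archive path on this consumer: {path}")
    return reasons
```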
When fixed_volume._require_tools raises (host missing fallocate / mkfs /
losetup), _configure_system_volume returned early without disabling sync
or suppressing env-var injection. _inject_sync_env_vars then unconditionally
advertised R1_SYSTEM_VOLUME=/r1en_system to the container, so for
SYNC.ENABLED=True the app would write request.json to a path that doesn't
exist on the host while CAR polled a host root that was never provisioned
— a silent black hole.
Track unavailability with a _sync_unavailable flag set inside the early-
return branch; gate both _sync_enabled() and _inject_sync_env_vars() on
it. Provider/consumer ticks short-circuit via _sync_enabled(), and the
container sees no R1_* keys so apps that branch on those env vars can
fall back to non-sync behavior instead of writing into a phantom mount.
Two new tests:
- test_no_env_when_sync_unavailable: with _sync_unavailable=True, none
of the R1_* keys appear in self.env
- test_sync_disabled_when_unavailable: _sync_enabled() returns False
when the flag is set, even with SYNC.ENABLED=True in config
Addresses codex review finding 5 on PR #399.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rule 3 in resolve_container_path previously hard-rejected anything whose
host root didn't sit under <plugin_data>/fixed_volumes/mounts/. Legacy
VOLUMES live under CONTAINER_VOLUMES_PATH
(/edge_node/_local_cache/_data/container_volumes/) — per-instance host
dirs at a known location, functionally equivalent to fixed-size for sync
purposes, just without quota enforcement and image-owner UID resolution.
The single-root substring check kept them out for no structural reason.
Extend Rule 3 to a two-root allow-list: a mount qualifies if its host
root contains the fixed-size marker OR sits under
CONTAINER_VOLUMES_PATH. Anonymous Docker mounts and FILE_VOLUMES (whose
host paths fall outside both roots) continue to be rejected with a
clearer error naming both allowed roots. Rules 1/2/4/5/6 unchanged.
This unblocks existing legacy-VOLUMES pipelines without forcing operators
to migrate them, and as a free side-effect enables legacy ↔ fixed-size
sync: because resolve_container_path keys off container paths, a legacy
provider can publish a snapshot a fixed-size consumer applies cleanly
(and vice versa), giving a soft-migration path.
Tests:
- test_rejects_non_fixed_size renamed to test_rejects_anonymous_mount;
the fixture's /app/legacy bind (under tmpdir/tmpfs_legacy, outside
both allow-listed roots) now stands in for anonymous mounts.
- test_non_fixed_size_rejected renamed to test_anonymous_mount_rejected
(claim_request integration), same fixture semantics.
- test_legacy_volumes_resolves_to_host_root: new unit test on Rule 3
accepting a legacy-root-backed bind.
- test_round_trip_legacy_volumes_only: provider+consumer both legacy.
- test_round_trip_legacy_to_fixed_size + test_round_trip_fixed_size_to_legacy:
cross-type round-trips, proving the soft-migration property.
99 of 100 sync tests pass (the 1 pre-existing failure
test_applies_when_cid_differs_even_if_version_lower is unrelated and
predates this branch).
Implements docs/_todos/2026-05-12T16:00:00_extend_car_volume_sync_to_legacy_volumes.md
on the edge_node side. E2E scenarios 11 + 12 land in project_r1_edge_node
in a separate commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ence
Adds SYNC.HSYNC_POLL_INTERVAL (default 600s, min 300s) so consumers only
go to the network for a fresh chain replica on a slower cadence than the
overall sync tick. The cheap local-replica hget still runs on every
tick. Provider behavior unchanged.
Motivation: chainstore_hsync is a network round-trip to the chain cluster
with a 10s timeout. Previously it fired on every consumer tick (every
SYNC.POLL_INTERVAL, default 10s). When cluster peers are unreachable —
e.g. after a node restart while the chain peer mesh re-establishes —
each consumer tick burned ~10s waiting on a timing-out hsync. With this
change the hsync attempt fires at most every 10 min by default; the hget
fall-through keeps consumers reading their cached records during outages.
DEBUG/DEVELOPMENT ONLY — to be removed. The HSYNC_POLL_INTERVAL config
field is annotated in three places (SYNC config block, mixin helper
docstring, manager.fetch_latest docstring) as a temporary operator knob.
Once ChainStore propagation is reliable on devnet and prod, the cadence
should become a fixed internal default and the SYNC config field should
be deleted.
Implementation:
- new SYNC.HSYNC_POLL_INTERVAL config field (container_app_runner.py)
- _SyncMixin._hsync_poll_interval() with clamp/default, exposed to
SyncManager via the cfg_sync_hsync_poll_interval property (matches
the existing cfg_sync_key/cfg_sync_type pattern)
- SyncManager tracks _last_hsync (initial 0 so the first fetch_latest
still hsyncs as a bootstrap), updates it on every attempt regardless
of success so a timing-out hsync doesn't get retried on every tick
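A sketch of the gating inside fetch_latest under those assumptions (owner call names and signatures are assumptions; only the interval logic is the point here):

```python
import time

def fetch_latest(self):
    """hsync at most once per HSYNC_POLL_INTERVAL; the cheap local-replica hget runs every tick."""
    now = time.time()
    if now - self._last_hsync >= self.owner.cfg_sync_hsync_poll_interval:
        # Advance the timestamp whether or not hsync succeeds, so a timing-out
        # hsync is not retried on every tick.
        self._last_hsync = now
        try:
            self.owner.chainstore_hsync(self.hkey)
        except Exception as exc:
            self.owner.P(f"[sync] chainstore_hsync error: {exc}")  # non-fatal
    return self.owner.chainstore_hget(self.hkey, self.owner.cfg_sync_key)
```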
Tests:
- test_hsync_gated_by_interval_skips_within_window (manager)
- test_hsync_fires_again_after_interval_elapses (manager)
- test_hsync_failure_still_advances_timestamp (manager)
- test_hsync_poll_interval_default / _floor / _invalid_falls_back (mixin)
105 of 106 sync tests pass; the 1 pre-existing failure
(test_applies_when_cid_differs_even_if_version_lower) is unrelated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a single [sync] log line on each hsync attempt (success or failure)
with elapsed wall time. Previously only failures were logged; successful
hsyncs were silent, which made it impossible to confirm via logs that
the first hsync on plugin start actually fired.
With this:
- First post-init tick logs "[sync] chainstore_hsync ok (0.05s)"
(green) or "[sync] chainstore_hsync error after Xs: ..." (yellow).
- One log per HSYNC_POLL_INTERVAL window (default 10 min) so the log
volume on prod stays small.
- Elapsed time exposes when peers are slow vs unreachable (a 9.8s
"ok" is healthy-but-slow; a 10.05s "error: timed out" is the same
issue we hit on devnet after the dvi-1/dvi-2 restart).
Test suite unchanged (105/106 — the 1 pre-existing failure is unrelated).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What changed:
- extracted CAR runtime stop cleanup so sync can stop logs, exec threads, tunnels, semaphores, and the container without unmounting fixed volumes
- updated sync provider/consumer ticks to use the shared runtime stop path
- made sync history latest/retirement ordering stable across deletion updates
Why:
- sync publish/apply needs mounted system volumes to remain available while avoiding stale runtime sidecars across restarted containers

What changed:
- added SyncRequest and SyncRuntimePolicy parsing with defaults and validation
- allowed online provider requests to validate container path shape without requiring a host mount
- carried runtime policy through provider records and manifests
Why:
- sync requests need an explicit runtime policy before provider and consumer runtime behavior can branch safely

What changed:
- added Docker archive based provider capture for online sync requests
- kept online provider requests from stopping the running container
- covered non-mounted provider paths in manager and mixin tests
Why:
- requesters need to sync provider-side ephemeral paths that are not backed by CAR host volumes

What changed:
- added consumer apply mode branching for offline restart, online no-restart, and online restart
- defaulted old records to offline_restart behavior
- covered consumer runtime modes in sync mixin tests
Why:
- requesters need to control whether consumers restart after applying synced data
The success-side response.json was missing the app-supplied metadata,
which made the in-volume-sync state incomplete: UIs that surface
response.json directly (because they can't bind-mount the host-side
sync_history/) could not show the tags the app attached to the snapshot.
Adds a "metadata" field mirroring the request's metadata so:
- the in-container view of the latest publish is self-contained
- the r1-car-console "History (sent)" tab shows metadata even without
R1_PLUGIN_DATA_DIR
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What changed:
- Online provider capture no longer calls the restart path after publish.
- Updated the provider tick regression test to assert no stop/start/reset occurs.
Why:
- provider_capture=online is intended to snapshot the live container without replacing it.

What changed:
- Added realpath containment checks for consumer extraction targets.
- Revalidate extraction destinations immediately before creating directories or files.
- Added regressions for symlink directory/file escapes and failed apply state.
Why:
- Prevent archive extraction from writing outside the consumer volume through pre-existing symlinks.

What changed:
- added a sync-local JsonControlFile helper for pending/processing mechanics
- moved atomic app-readable JSON writes into a reusable helper
- covered claim, recovery, cleanup, and write-failure behavior with unit tests
Why:
- standardize the file protocol before adding more CAR sync control files

What changed:
- routed request claiming through JsonControlFile
- reused the shared atomic JSON writer for sync artifacts
- centralized request processing cleanup through the control-file helper
Why:
- keep SyncManager focused on sync validation and publish/apply behavior while sharing file mechanics

What changed:
- delegated stale request processing recovery to JsonControlFile
- reused the helper for provider pending-file detection
- removed unused sync imports from the mixin
Why:
- keep lifecycle orchestration separate from low-level control-file mechanics

What changed:
- read claimed JSON control files with no-follow handling
- reject symlink or non-regular request processing files without raw-body leakage
- added regressions for symlink request files in the helper and manager tests
Why:
- prevent app-writable control files from being used to read host files as CAR

What changed:
- require local opt-in before provider online capture is accepted or archived
- make consumer apply lifecycle mode come from local SYNC config
- added regressions that provider records cannot force online consumer apply
Why:
- prevent app/provider metadata from forcing unsafe local lifecycle behavior

What changed:
- validate manifest encryption against the published archive encryption value
- constrain apply-time extraction to manifest-declared archive paths
- added regressions for unsupported encryption and out-of-scope tar members
Why:
- prevent consumers from applying archives that do not match the advertised manifest contract

What changed:
- changed consumer hsync defaults to a 60s cadence with a 10s floor
- retry failed hsync attempts after a shorter failure window
- clear _sync_unavailable when system volume provisioning later succeeds
Why:
- avoid long stale-consumer windows and recover sync availability after transient provision failures

What changed:
- reject symlinked control directories and unsafe offline archive sources
- strip special mode bits during consumer extraction
- revalidate sync config on restart and run worker sync ticks before git checks
- tighten manifest archive_paths validation and clean leaked CIDs on publish failure
Why:
- close PR399 security and lifecycle review blockers with regression coverage

What changed:
- document SYNC config knobs and defaults
- describe request, response, last_apply, and manifest contract
- clarify local consumer apply policy and online provider capture opt-in
Why:
- make the PR399 app/operator contract reviewable from the repo docs

What changed:
- keep the sync system volume root-owned and reject unsafe UTF-8 control files
- require confirmed container stops before offline publish/apply and prevalidate consumer records
- require positive ChainStore publish ack, restore files with target volume ownership, and keep consumer CID cleanup local
- add regression coverage for the PR399 review blockers
Why:
- prevent unsafe control-plane ownership, live-volume mutation, unconfirmed publishes, and non-writable restored state

What changed:
- document root-owned sync control-plane behavior
- clarify offline stop/prevalidation, ChainStore ack, CID cleanup, and restored ownership semantics
Why:
- keep PR399 reviewers and CAR app authors aligned with the stricter sync behavior

What changed:
- Treat successful Docker remove as a stopped container even if stop reported an error.
- Mark runtime stop failures as degraded and skip fixed-volume cleanup when remove fails.
- Cover the stop/remove edge cases in lifecycle tests.
Why:
- Sync and shutdown paths need to distinguish a removable container from one that may still be running before mutating volumes.

What changed:
- Reject non-regular request control files before claim/read and open with no-follow/nonblocking flags.
- Safely remove or quarantine unsafe processing entries without recursive deletion.
- Make CAR-owned response/status files app-readable but not app-writable.
- Let the claim path own provider request validation.
Why:
- App-writable control directories must not let special files block CAR or let apps rewrite CAR-owned status outputs.

What changed:
- Treat ChainStore acknowledgement as the provider publish commit point.
- Preserve app-visible success and clear processing files when sent-history append fails.
- Skip prior CID retirement when the new history entry could not be written.
- Preserve symlink-specific control-file validation diagnostics.
Why:
- A post-ack local history failure must not strand request.json.processing or make an already-published snapshot look unprocessed.

What changed:
- Normalize consumer online apply modes to offline_restart while keeping config compatibility.
- Record requested/effective apply mode and warning reason for operator visibility.
- Update tests and README to reflect online consumer apply being disabled for now.
Why:
- Path-based extraction is not safe while the app can race filesystem paths, so consumer apply must stop the container until descriptor-safe extraction exists.

What changed:
- Split consumer apply into pre-stop preparation and stopped-container commit.
- Persist host-private apply state and quarantine failed CIDs with retry backoff.
- Commit touched paths with backups/rollback and leave uncertain applies stopped.
- Use durable applied state for dedupe while keeping app-visible last_apply.json informational.
Why:
- Consumers should not stop repeatedly for bad CIDs, and successful volume mutation needs a durable local state source independent of history/result-file writes.

What changed:
- Track directory metadata and created parent directories as rollback operations during consumer apply commits.
- Restore directory mode/ownership on failed apply and mark rollback as uncertain if metadata restoration fails.
- Add regression coverage for directory metadata rollback and created parent cleanup.
Why:
- Failed consumer applies must not report restart_safe when directory metadata mutations were left behind.
…-sync

# Conflicts:
#	ver.py
Summary
Adds volume-sync to the Container App Runner — a provider/consumer mechanism that lets one CAR instance snapshot its app's persistent state and have other CAR instances apply that snapshot, enabling cross-node state replication for containerized apps. The feature is implemented as a self-contained
sync/ subpackage and a _SyncMixin woven into ContainerAppRunnerPlugin.

Companion PR (UI side, deeploy-dapp): Ratio1/deeploy-dapp#96.
What it does
When an app deployed via CAR opts in, the runner orchestrates state replication around the container lifecycle:
Provider (the source-of-truth node)
- The app inside the container writes /r1en_system/volume-sync/request.json listing the container paths it wants snapshot-published.
- CAR polls (every SYNC.POLL_INTERVAL, default 10s), claims the request atomically, stops the container, archives the requested paths (resolving container-absolute paths to host bind-mount paths via self.volumes), uploads the bundle to R1FS, and publishes {cid, version, timestamp} to ChainStore under SYNC.KEY.
- The outcome is written back to volume-sync/response.json and the published metadata is persisted under <plugin_data>/sync_history/sent/.
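For illustration, a minimal request the app might write (field names inferred from the claim_request validation described in the commits; later commits add further fields such as the runtime policy):

```python
# Hypothetical contents of /r1en_system/volume-sync/request.json, shown as a Python dict:
request = {
    "archive_paths": ["/app/data"],       # container-absolute paths to snapshot
    "metadata": {"label": "nightly"},     # free-form app tags echoed back in response.json
}
```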
Consumer (a node that wants to mirror provider state)
- On each tick, CAR reads the latest record from ChainStore under SYNC.KEY.
- When a new CID appears it fetches the bundle from R1FS, validates the manifest, stops the container, extracts the archive into the mapped volumes, writes volume-sync/last_apply.json describing the applied state, restarts the container.

Both sides record a full audit trail per instance under
<plugin_data>/sync_history/{sent,received}/<version>__<short_cid>.json so operators can inspect any past replication event.

Always-on system volume
Independent of the SYNC role, every CAR instance now mounts a 10M ext4 loopback at
/r1en_system (provisioned via the same FIXED_SIZE_VOLUMES machinery as user-declared fixed-size volumes). The mount carries the control-plane files (volume-sync/request.json, response.json, etc.). Apps that don't use SYNC simply don't write into it; the volume costs ~10MB per instance and never sees data traffic.

Env vars exported to the container:
- R1_SYSTEM_VOLUME — always
- R1_VOLUME_SYNC_DIR — always
- R1_SYNC_REQUEST_FILE — always (so apps can write the request unconditionally; CAR just won't act on it without SYNC.ENABLED)
- R1_SYNC_TYPE, R1_SYNC_KEY — only when SYNC is enabled, for apps that want to branch on role

CAR config block
Added under ContainerAppRunnerPlugin._CONFIG; a sketch of the block follows. Default-off, wire-compatible with existing pipelines.
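A sketch of the SYNC block pieced together from the commit messages (the field names appear in the commits; the default values shown are assumptions except where stated, e.g. POLL_INTERVAL=10s):

```python
_CONFIG = {
    # ... existing Container App Runner config ...
    "SYNC": {
        "ENABLED": False,            # default-off: existing pipelines are unaffected
        "TYPE": None,                # 'provider' or 'consumer'
        "KEY": None,                 # shared ChainStore key pairing a provider with its consumers
        "POLL_INTERVAL": 10,         # seconds between sync ticks
        "HSYNC_POLL_INTERVAL": 60,   # consumer-only network hsync cadence (temporary debug knob)
    },
}
```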
Code layout
The whole feature lives in
extensions/business/container_apps/sync/:
- constants.py — file names, ChainStore hkey, manifest schema version, failure-stage labels. No code, just data — cat sync/constants.py gives the entire data-plane vocabulary in one place.
- manager.py — SyncManager class. Pure I/O orchestration: claim → archive → upload → publish (provider), fetch → validate → extract → record (consumer). Delegates network/storage to owner.r1fs / owner.chainstore_* so it stays testable. ~860 lines.
- mixin.py — _SyncMixin. Plugin-class integration: provisions the system volume in on_init, recovers any orphaned request.json.processing from a prior crash, drives provider/consumer ticks from _perform_periodic_monitoring. Both ticks call stop_container → manager_work → start_container inline (must NOT route through _restart_container, which unmounts the system volume before the sync slice can read from it).
- __init__.py — re-exports the public surface so callers can from extensions.business.container_apps.sync import SyncManager.

Tests:
extensions/business/container_apps/tests/test_sync_manager.py (866 lines) and test_sync_mixin.py (390 lines) cover happy paths, every documented failure stage, and the crash-recovery flow.

Design notes worth flagging for review
1. Identity is the CID, not the version. The consumer compares ChainStore's cid against the last applied CID. version is informational (timestamp-derived, kept for filename ordering and human-readable logs). Comparing CIDs eliminates a whole class of clock-skew failure modes — a provider with a wonky timestamp can never make a consumer permanently ignore a corrected snapshot — and makes multi-provider sync sets coherent without ordering assumptions. (Final commit 75e48a7 made this switch; earlier commits compared on version.)

2. Sync slices bypass _restart_container. _restart_container calls _cleanup_fixed_size_volumes, which would unmount the /r1en_system loopback before the sync code can read or write into it. The mixin ticks inline stop_container() → work → start_container() and never set a StopReason that would route through the restart path. This is documented at the top of mixin.py.

3. The system volume is always-on, even with SYNC disabled. The mount is cheap (10M ext4 loopback) and lets us add future CAR↔app control-plane features without re-architecting volume provisioning. The volume-sync/ subdirectory naming convention reserves space for those future features.

4. INITIAL_SYNC_TIMEOUT was removed. Earlier iterations blocked the consumer's first boot until a snapshot arrived. Final design (0882dd4) starts consumers immediately on an empty volume and lets the next tick apply whatever ChainStore has — apps that strictly require state at startup must implement their own poll-and-retry in their entrypoint. This matches how every other CAR feature degrades.

5. Permissions: /r1en_system is chmod 0o777. The system volume is per-CAR-instance and the app already owns the rest of its container; restricting the control-plane subdir to root would just mean every non-root app silently can't write the request file. See bcf3193 / 4201532 for context.

Test plan
Unit tests cover the full SyncManager and
_SyncMixin surfaces (over 1250 lines of tests across the two files). For integration:
- SYNC.ENABLED=False (or no SYNC block) → confirm /r1en_system mounts and CAR runs to completion with no SYNC log lines
- SYNC.ENABLED=True, TYPE='provider', KEY=<uuid> → write a request.json from inside the container, observe the stop → publish → start log sequence, confirm the record appears in ChainStore under the KEY and in sync_history/sent/
- same KEY with TYPE='consumer' on another node → confirm sync_history/received/ reflects the apply
- kill CAR mid-publish → on restart it recovers the .processing rename and republishes successfully

Commits
a5bc2bd skeleton → 9cc09a4 path resolver + atomic JSON + history I/O → cc2df3e claim_request + archive helpers → 2fe9ffe orchestrators → d9948b3 mixin → ff9e73b plugin wiring → 4201532 + bcf3193 permissions → 01e7aa9 cleanup-mount fix → 75e48a7 CID identity → 4aed83f collapse into sync/ subpackage → 0882dd4 remove INITIAL_SYNC_TIMEOUT → bbe20a4 version bump.