feat(service-automation): persist suspended flow runs for durable resume across restarts #1520
Merged
…ume across restarts (#1518)
Suspended runs lived only in an in-memory Map, so a flow paused at an approval / wait / screen node could never resume after a process restart, blocking durable-pause flows on hibernating/serverless hosts.
Back the in-memory map with a pluggable SuspendedRunStore (ADR-0019):
- SuspendedRunStore interface + InMemorySuspendedRunStore (default, JSON round-trips) and ObjectStoreSuspendedRunStore, which persists to a new sys_automation_run system object via the ObjectQL engine.
- AutomationServicePlugin registers the object and auto-enables the DB-backed store when an ObjectQL engine is present (opt out with suspendedRunStore: 'memory').
- Persist on suspend, delete on terminal completion; resume() rehydrates from the store on a cold boot and continues down the correct branch.
- Idempotent resume: the suspension is consumed before downstream work, with an in-process guard against concurrent duplicate resumes.
- Process-unique run ids so they don't collide with runs persisted by a previous process lifetime.
Existing service-automation and plugin-approvals tests pass unchanged.
https://claude.ai/code/session_01SGp45AQZzBp1ftWgDzTRqy
Closes #1518.
Problem
`service-automation` kept suspended flow runs in memory only (`engine.ts`'s `private suspendedRuns = new Map()`). A flow paused at a long-lived node (`approval`, `wait`, `screen`, …) could not be resumed after the process restarted: the engine registered no `sys_*` objects, so run state was never persisted.
This blocks durable-pause flows on serverless / hibernating hosts: on the Cloudflare Workers control plane, a marketplace-review flow that suspends at an approval node (minutes → days) has its in-memory run evicted; `sys_approval_request` persists, but `resume(runId)` then has nothing to continue, so post-approval side effects never run.
Solution (ADR-0019)
Make suspended-run state durable and rehydratable, behind a pluggable store:
- `SuspendedRunStore` interface + two implementations:
  - `InMemorySuspendedRunStore`: the default; JSON round-trips on save/load so it faithfully mirrors a DB serialization boundary.
  - `ObjectStoreSuspendedRunStore`: persists to a new `sys_automation_run` system object via the ObjectQL engine (so it migrates like other `sys_*` tables and correlates to `sys_approval_request.flow_run_id`).
- `AutomationServicePlugin` registers `sys_automation_run` via the manifest and auto-enables the DB-backed store when an ObjectQL engine is present (opt out with `suspendedRunStore: 'memory'`). No engine ⇒ in-memory default stands.
- `resume(runId)` rehydrates from the store when the run isn't in the in-memory cache (cold boot) and continues from the paused node down the correct branch.
- Idempotent resume: the suspension is consumed before downstream work in `resume`, so a repeated resume after a partial restart can't double-run side effects.
Acceptance criteria
- Resume continues down the correct `approve`/`reject` branch and runs downstream nodes.
- `variables` round-trip correctly, including nested objects/arrays.
Tests
- `engine.test.ts`: suspend on one engine, resume on a brand-new engine sharing one store (simulated restart); nested-variable round-trip; idempotent duplicate resume; `listSuspendedRunsDurable` fallback.
- `suspended-run-store.test.ts`: `ObjectStoreSuspendedRunStore` serialize/deserialize round-trip, upsert, delete/list, and an end-to-end suspend → restart → resume through the DB store.
New exports
`SuspendedRun`, `SuspendedRunStore`, `StepLogEntry`, `InMemorySuspendedRunStore`, `ObjectStoreSuspendedRunStore`, `SuspendedRunStoreEngine`, `SysAutomationRun`, plus `AutomationEngine.setSuspendedRunStore()` and `listSuspendedRunsDurable()`.
https://claude.ai/code/session_01SGp45AQZzBp1ftWgDzTRqy
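
For readers who want a feel for the mechanism, the store-backed resume described under Solution can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the field names on the run record, the `save`/`load`/`delete` method names, and the `InMemoryStoreSketch` and `EngineSketch` classes are all assumptions made for the example. It shows three ideas from the description: a JSON round-trip at the store boundary, rehydration from the store when the in-memory cache is empty, and consuming the suspension before downstream work so that a duplicate resume is a no-op.

```typescript
// Hypothetical shapes: names and fields are illustrative assumptions,
// not the package's actual exports.
interface RunRecord {
  runId: string;
  flowId: string;
  nodeId: string; // node the flow is paused at
  variables: Record<string, unknown>; // flow variables at suspension time
}

interface RunStore {
  save(run: RunRecord): Promise<void>;
  load(runId: string): Promise<RunRecord | undefined>;
  delete(runId: string): Promise<void>;
}

// Serializes on save and parses on load, so callers never share object
// references with the store (the same boundary a database would impose).
class InMemoryStoreSketch implements RunStore {
  private rows = new Map<string, string>();
  async save(run: RunRecord): Promise<void> {
    this.rows.set(run.runId, JSON.stringify(run));
  }
  async load(runId: string): Promise<RunRecord | undefined> {
    const raw = this.rows.get(runId);
    return raw === undefined ? undefined : (JSON.parse(raw) as RunRecord);
  }
  async delete(runId: string): Promise<void> {
    this.rows.delete(runId);
  }
}

type Branch = "approve" | "reject";

class EngineSketch {
  private store: RunStore;
  private cache = new Map<string, RunRecord>(); // lost when the process restarts
  private resuming = new Set<string>(); // in-process guard against concurrent resumes
  sideEffects: string[] = [];

  constructor(store: RunStore) {
    this.store = store;
  }

  async suspend(run: RunRecord): Promise<void> {
    this.cache.set(run.runId, run);
    await this.store.save(run); // persist on suspend
  }

  // Returns the resumed run, or undefined if there was nothing left to resume.
  async resume(runId: string, branch: Branch): Promise<RunRecord | undefined> {
    if (this.resuming.has(runId)) return undefined;
    this.resuming.add(runId);
    try {
      // Cold boot: fall back to the store when the cache has no entry.
      const run = this.cache.get(runId) ?? (await this.store.load(runId));
      if (!run) return undefined;
      // Consume the suspension before any downstream work runs.
      this.cache.delete(runId);
      await this.store.delete(runId);
      this.sideEffects.push(`${run.flowId}:${run.nodeId}:${branch}`);
      return run;
    } finally {
      this.resuming.delete(runId);
    }
  }
}

// Simulated restart: suspend on one engine, resume on a fresh one sharing the store.
async function demo() {
  const store = new InMemoryStoreSketch();
  const before = new EngineSketch(store);
  await before.suspend({
    runId: "run-1",
    flowId: "review",
    nodeId: "approval",
    variables: { tags: ["a", { n: 1 }] },
  });
  const after = new EngineSketch(store);
  const first = await after.resume("run-1", "approve");
  const second = await after.resume("run-1", "approve");
  return { first, second, effects: after.sideEffects };
}
```

Because the sketch serializes with `JSON.stringify`, only JSON-representable values survive the round-trip; a `Date` comes back as a string and `undefined` properties are dropped. Whether the real stores add handling for such values is not stated in this description.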
Generated by Claude Code