Workflow checkpoints are not restorable across SDK upgrades — TypeId uses Assembly.FullName (incl. version) for executor type matching
Labels: bug, workflows, checkpointing
Summary
Workflow checkpoint restore fails with:
System.IO.InvalidDataException: The specified checkpoint is not compatible with the workflow associated with this runner.
…whenever the Microsoft.Agents.AI.Workflows (or other executor/port type-owning) assembly version changes between the run that wrote the checkpoint and the run that restores it — e.g. after any package upgrade and redeploy. The workflow topology, executor IDs, and state shape are all unchanged; only the assembly version differs.
Because the agent SDK is in fast-moving preview (we've taken 1.3.0 → 1.6.1 → 1.6.2 → 1.8.0 → 1.9.0 over ~5 weeks), every upgrade silently invalidates all previously persisted checkpoints, breaking in-flight conversations/workflows that resume after a deploy.
Root cause
Checkpoint/workflow compatibility is gated by WorkflowInfo.IsMatch, which compares each executor's type via ExecutorInfo → TypeId. TypeId identity uses Assembly.FullName, which embeds Version, Culture, and PublicKeyToken:
```csharp
// TypeId
public TypeId(Type type)
    : this(type.Assembly.FullName, type.FullName) { } // AssemblyName = "...Version=1.8.0.0, Culture=..., PublicKeyToken=..."

public bool IsMatch(Type type)
{
    if (AssemblyName == type.Assembly.FullName) // <-- version-sensitive comparison
        return TypeName == type.FullName;
    return false;
}
```
The runner serializes these TypeIds into the checkpoint and, on restore, re-derives them from the currently loaded assemblies:
```csharp
// InProcessRunner.RestoreCheckpointCoreAsync
Checkpoint checkpoint = await CheckpointManager.LookupCheckpointAsync(SessionId, checkpointInfo);
if (!CheckWorkflowMatch(checkpoint)) // checkpoint.Workflow.IsMatch(Workflow)
{
    throw new InvalidDataException(
        "The specified checkpoint is not compatible with the workflow associated with this runner.");
}
```
So a checkpoint written under ...Version=1.8.0.0 can never match a runner whose executor/port types now resolve to ...Version=1.9.0.0, even though the types (namespace + name) and serialized state are identical.
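The mismatch can be reproduced without the SDK. The following is a minimal sketch that assumes a persisted type id is just two strings. PersistedTypeId and TypeIdVersionCheck are hypothetical stand-ins, not the SDK's TypeId. The sketch round-trips an id through System.Text.Json to mimic a stored checkpoint, applies the same whole-string comparison quoted above, and adds a helper that recognizes the "same type, only the Version changed" case.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text.Json;

// Hypothetical stand-in for a persisted TypeId (assembly display name + type name); not the SDK's class.
public sealed record PersistedTypeId(string AssemblyName, string TypeName);

public static class TypeIdVersionCheck
{
    // Same rule as the quoted TypeId.IsMatch: full display names must be equal, Version included.
    public static bool ExactMatch(PersistedTypeId stored, PersistedTypeId current) =>
        stored.AssemblyName == current.AssemblyName && stored.TypeName == current.TypeName;

    // True when both ids name the same type in the same assembly (simple name, culture,
    // public key token) and differ only in assembly Version.
    public static bool DiffersOnlyByVersion(PersistedTypeId stored, PersistedTypeId current)
    {
        var a = new AssemblyName(stored.AssemblyName);
        var b = new AssemblyName(current.AssemblyName);
        byte[] tokenA = a.GetPublicKeyToken() ?? Array.Empty<byte>();
        byte[] tokenB = b.GetPublicKeyToken() ?? Array.Empty<byte>();
        return stored.TypeName == current.TypeName
            && a.Name == b.Name
            && a.CultureName == b.CultureName
            && tokenA.SequenceEqual(tokenB)
            && a.Version != b.Version;
    }

    // Simulates writing the id into a JSON checkpoint and reading it back in a later process.
    public static PersistedTypeId RoundTrip(PersistedTypeId id) =>
        JsonSerializer.Deserialize<PersistedTypeId>(JsonSerializer.Serialize(id))!;
}
```

For example, an id stored as "Microsoft.Agents.AI.Workflows, Version=1.8.0.0, Culture=neutral, PublicKeyToken=null" and one derived as "..., Version=1.9.0.0, ..." with the same type name fail ExactMatch but pass DiffersOnlyByVersion. The PublicKeyToken=null value is a placeholder, not the package's real token.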
Steps to reproduce
- Build a workflow whose executors are framework-provided (e.g. any agent bound via AsAIAgent(...).WithCheckpointing(...)), run a turn, and persist a checkpoint via an ICheckpointStore/JsonCheckpointStore.
- Upgrade Microsoft.Agents.AI.Workflows to any different version (patch/minor/major) — or otherwise change the assembly version.
- Reconstruct the same workflow and call RestoreCheckpointAsync (or resume the workflow agent) with the previously stored CheckpointInfo.
Expected: Restore succeeds, since the workflow shape and state are unchanged.
Actual: InvalidDataException: The specified checkpoint is not compatible with the workflow associated with this runner.
Impact
- Any host that persists workflow checkpoints across process restarts/deploys (the intended durability use case) loses all existing checkpoints on every SDK bump.
- For interactive multi-turn agents, this surfaces as a hard, unrecoverable error on the first turn after a deploy — the conversation is effectively bricked unless the app detects the string and resets.
- The failure is opaque: it's a generic InvalidDataException carrying only a message string, with no indication of whether an assembly version mismatch or a genuine topology change caused it, and no machine-readable detail about which executor/type diverged.
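Until a typed exception exists, the only workaround a host has is to match the message text. The following is a minimal sketch of that fallback. CheckpointRestoreFallback and RestoreOrResetAsync are hypothetical names. The restore and reset steps are passed in as delegates, so no SDK method signature is assumed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical host-side helper; not part of Microsoft.Agents.AI.Workflows.
public static class CheckpointRestoreFallback
{
    // The only signal currently available: the exact message text of the InvalidDataException.
    private const string IncompatibleCheckpointMessage =
        "The specified checkpoint is not compatible with the workflow associated with this runner.";

    public static bool IsIncompatibleCheckpoint(Exception ex) =>
        ex is InvalidDataException
        && string.Equals(ex.Message, IncompatibleCheckpointMessage, StringComparison.Ordinal);

    // Runs the restore; if it fails with the incompatibility message, runs the reset path instead.
    // Returns true when the checkpoint was restored, false when the host fell back to a reset.
    // Any other exception propagates unchanged.
    public static async Task<bool> RestoreOrResetAsync(Func<Task> restore, Func<Task> reset)
    {
        try
        {
            await restore().ConfigureAwait(false);
            return true;
        }
        catch (Exception ex) when (IsIncompatibleCheckpoint(ex))
        {
            await reset().ConfigureAwait(false);
            return false;
        }
    }
}
```

This treats a version-only mismatch and a real topology change identically, because both discard the conversation state. Any rewording of the message would also silently defeat it.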
Suggested fixes (in rough priority order)
- Don't include assembly version in type identity for matching. Match on Type.FullName (namespace + type name), and optionally Assembly.GetName().Name (simple name) — not Assembly.FullName. This makes checkpoints portable across version-only changes while still distinguishing genuinely different types.
- Make type compatibility pluggable. Allow callers to supply an ITypeCompatibilityResolver/comparer (or a TypeId matching policy: Exact vs NameOnly vs NameAndSimpleAssembly) so hosts can opt into version-tolerant restore.
- Add a checkpoint compatibility/version envelope with a documented forward/backward-compatibility contract, instead of relying on Assembly.FullName equality as an implicit schema check.
- At minimum, fail better. Throw a typed, catchable exception (e.g. WorkflowCheckpointMismatchException) that includes the specific diff (expected vs. actual TypeId/executor id), so hosts can distinguish "incompatible SDK version" from "topology actually changed" and react deterministically rather than string-matching the message.
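The matching-policy and typed-exception suggestions could look roughly like the following sketch. TypeIdMatchPolicy, TypeIdMatcher, and WorkflowCheckpointMismatchException are hypothetical. The policy member names come from the suggestions in this issue, and none of these types exist in the SDK today. The stored id is modeled as two plain strings rather than the SDK's TypeId.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical matching policies, named as proposed in this issue.
public enum TypeIdMatchPolicy
{
    Exact,                 // current behavior: Assembly.FullName (incl. Version) + Type.FullName
    NameOnly,              // Type.FullName only
    NameAndSimpleAssembly, // Type.FullName + Assembly.GetName().Name
}

// Hypothetical typed exception that carries the diff instead of only a message string.
public sealed class WorkflowCheckpointMismatchException : InvalidDataException
{
    public WorkflowCheckpointMismatchException(string executorId, string expectedTypeId, string actualTypeId)
        : base($"Checkpoint for executor '{executorId}' expects type '{expectedTypeId}', but the workflow provides '{actualTypeId}'.")
    {
        ExecutorId = executorId;
        ExpectedTypeId = expectedTypeId;
        ActualTypeId = actualTypeId;
    }

    public string ExecutorId { get; }
    public string ExpectedTypeId { get; }
    public string ActualTypeId { get; }
}

public static class TypeIdMatcher
{
    // Compares a persisted (assembly display name, type name) pair with a live Type under a policy.
    public static bool IsMatch(string storedAssemblyName, string storedTypeName, Type type, TypeIdMatchPolicy policy)
    {
        // Namespace + type name must always agree; policies differ only in how the assembly is compared.
        if (storedTypeName != type.FullName)
            return false;

        return policy switch
        {
            TypeIdMatchPolicy.Exact => storedAssemblyName == type.Assembly.FullName,
            TypeIdMatchPolicy.NameOnly => true,
            TypeIdMatchPolicy.NameAndSimpleAssembly =>
                new AssemblyName(storedAssemblyName).Name == type.Assembly.GetName().Name,
            _ => throw new ArgumentOutOfRangeException(nameof(policy)),
        };
    }

    // Throws the typed exception, with expected vs. actual ids, when the policy rejects the pair.
    public static void EnsureMatch(string executorId, string storedAssemblyName, string storedTypeName, Type type, TypeIdMatchPolicy policy)
    {
        if (!IsMatch(storedAssemblyName, storedTypeName, type, policy))
        {
            throw new WorkflowCheckpointMismatchException(
                executorId,
                $"{storedTypeName}, {storedAssemblyName}",
                $"{type.FullName}, {type.Assembly.FullName}");
        }
    }
}
```

Deriving the exception from InvalidDataException keeps existing catch blocks working. New code can catch the specific type and read ExecutorId, ExpectedTypeId, and ActualTypeId instead of parsing the message.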
Environment
- Package: Microsoft.Agents.AI.Workflows 1.9.0 (also observed on 1.6.x/1.8.x)
- Runtime: .NET 10
- Checkpoint store: custom JsonCheckpointStore (Azure Blob), but the matching logic is store-agnostic
- OS/host: Linux containers (Azure Container Apps), one process per tenant; fails specifically on the first resume after a deploy that bumps the SDK