Add InvokeAsync for chained durable function calls (DOTNET-8661) #2374
Draft
GarrettBeatty wants to merge 12 commits into
COPY bin/publish/ ${LAMBDA_TASK_ROOT}
ENTRYPOINT ["/var/task/bootstrap"]
2e91110 to a7868f3
464c591 to d308c3b
a7868f3 to 797d920
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution SDK: a workflow can run StepAsync and WaitAsync against a real Lambda, with replay-aware checkpointing wired through to the AWS service.

Public API surface introduced:
- DurableFunction.WrapAsync: entry point that handles the durable execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with serializer hook (retry deferred to follow-up PR)
- ICheckpointSerializer interface
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException

Internals:
- DurableExecutionHandler: Task.WhenAny race between user code and the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState: replay-aware operation lookup and pending checkpoint buffer
- OperationIdGenerator: deterministic, replay-stable IDs
- TerminationManager: TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient: wraps AWSSDK.Lambda's checkpoint and state APIs

Tests:
- 86 unit tests covering enums, exceptions, models, configs, ID generation, termination, execution state, the handler race, the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly, LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails

Out of scope (follow-up PRs):
- IRetryStrategy, ExponentialRetryStrategy, retry decision factories
- DefaultJsonCheckpointSerializer
- DurableLogger replay-suppression (currently returns NullLogger)
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync, WaitForConditionAsync (the interface intentionally does not declare them)
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint
stack-info: PR: #2360, branch: GarrettBeatty/stack/2 remove update update update update
Match the Python / Java / JavaScript reference SDKs' replay-mode model: the workflow is "replaying" iff it has not yet revisited every checkpointed completed user-replayable operation. A single global flag flipped on the first fresh op (the prior model) misclassified workflow-body code that runs before the first step and would not generalize to Map/Parallel/Callback later.

ExecutionState changes:
- Replace `Mode`/`ExecutionMode`/`EnterExecutionMode()` with `IsReplaying` + `TrackReplay(operationId)`.
- Initial replay decision: any non-EXECUTION op present means we're replaying. The service always sends an EXECUTION-type op carrying the input payload; that's bookkeeping, not user history, so it does not count toward replay (matches Python execution.py:258, Java ExecutionManager:81, JS execution-context.ts:62).
- TrackReplay flips IsReplaying false once every checkpointed terminal-status non-EXECUTION op has been visited. Terminal set matches Python's: SUCCEEDED, FAILED, CANCELLED, STOPPED.

Operation changes:
- DurableOperation.ExecuteAsync calls TrackReplay(OperationId) at the top, so every operation participates in visit accounting without each subclass needing to remember.
- StepOperation/WaitOperation drop their manual EnterExecutionMode calls.

Tests:
- ExecutionStateTests rewritten around IsReplaying/TrackReplay, including pinning regressions: only-EXECUTION-op ⇒ NotReplaying, all-visited ⇒ flips out of replay, PENDING ops do not block transition, idempotency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
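For reviewers, the visit-accounting rule in this commit message can be sketched roughly as below. This is a simplified illustration only: `ReplayTracker` and its tuple-based constructor are hypothetical stand-ins, not the SDK's actual `ExecutionState`, and the status and type strings are assumed to match the wire values named above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for ExecutionState's replay tracking.
public sealed class ReplayTracker
{
    // Terminal set as listed in the commit message.
    private static readonly HashSet<string> TerminalStatuses =
        new HashSet<string> { "SUCCEEDED", "FAILED", "CANCELLED", "STOPPED" };

    // Checkpointed, terminal, non-EXECUTION operations not yet revisited.
    private readonly HashSet<string> _unvisited;

    public bool IsReplaying { get; private set; }

    public ReplayTracker(IEnumerable<(string Id, string Type, string Status)> checkpointed)
    {
        // The EXECUTION op is bookkeeping, so it never counts toward replay.
        var userOps = checkpointed.Where(op => op.Type != "EXECUTION").ToList();
        IsReplaying = userOps.Count > 0;
        _unvisited = new HashSet<string>(
            userOps.Where(op => TerminalStatuses.Contains(op.Status)).Select(op => op.Id));
    }

    // Idempotent: revisiting an ID or calling after the flip has no effect.
    public void TrackReplay(string operationId)
    {
        if (!IsReplaying) return;
        _unvisited.Remove(operationId);
        if (_unvisited.Count == 0) IsReplaying = false;
    }
}
```

In this sketch a PENDING operation never enters the unvisited set, which is one way to get the "PENDING ops do not block transition" behavior the tests pin.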
…Serializer

DurableExecution now reads the registered ILambdaSerializer from the per-invocation ILambdaContext (added in the prior PR) for both step-result checkpointing and workflow input/output. AOT-safety is now determined entirely by which serializer the user registers with LambdaBootstrapBuilder.Create; there is no longer a forked path between reflection-based and AOT-safe APIs.

Removed:
- ICheckpointSerializer<T> + SerializationContext record
- ReflectionJsonCheckpointSerializer<T>
- The four JsonSerializerContext-taking overloads of DurableFunction.WrapAsync
- The IDurableContext.StepAsync overload that took ICheckpointSerializer<T>
- All [RequiresUnreferencedCode]/[RequiresDynamicCode] attributes and their related [UnconditionalSuppressMessage] shims

Net result: 8 WrapAsync overloads → 4, 3 StepAsync overloads → 2, zero trim attributes in the public API. The AOT smoke test continues to publish with zero IL2026/IL3050 warnings.
- Wrap LambdaDurableServiceClient SDK calls in DurableExecutionException with
durable-execution context (which call, which ARN). User logs no longer show
bare AWSSDK stack traces. Update IsTerminalCheckpointError to unwrap the
inner AmazonServiceException for classification.
- Move public-API files out of Models/, Config/, Exceptions/ into the project
root so folder layout matches the Amazon.Lambda.DurableExecution namespace.
- Replace string action literals ("SUCCEED", "FAIL", "START") with the
Amazon.Lambda.OperationAction enum constants.
- Replace hand-rolled ToHex with Amazon.Util.AWSSDKUtils.ToHex. Drop the
netstandard2.0 SHA-256 fallback now that DurableExecution targets net8+.
- Spell "iff" as "if and only if" in ExecutionState replay-mode docs.
Tests updated for the new wrapping shape: terminal classification asserts on
DurableExecutionException with the inner SDK exception preserved; transient
and hydration paths assert ThrowsAsync<DurableExecutionException> with
InnerException set to the original AmazonServiceException.
Adds child-context support to the .NET Durable Execution SDK. A child context is a logical sub-workflow with its own deterministic operation-ID space, persisted as a CONTEXT operation so subsequent invocations replay the cached value without re-executing the function.

Public surface:
- IDurableContext.RunInChildContextAsync<T> (reflection + AOT-safe ICheckpointSerializer<T> overloads, plus a void overload).
- ChildContextConfig with SubType (observability label) and ErrorMapping (transform exceptions before they surface to the caller).
- ChildContextException for failure surfacing.

Used as a building block for upcoming WaitForCallbackAsync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lays down shared types/constants for the upcoming durable-execution context operations (Callbacks, Invoke, Parallel, Map, WaitForCondition) and updates the design doc to match decisions reached after comparing against the Python, JS, and Java reference SDKs.

SDK changes:
- OperationSubTypes constants class (Step, Wait, Callback, WaitForCallback, Invoke, WaitForCondition, Parallel, ParallelBranch, Map, MapIteration). Replaces hard-coded SubType literals in StepOperation and WaitOperation.
- OperationStatuses.TimedOut for callback/invoke timeout handling.

Design-doc alignment:
- Drop Serializer field from CallbackConfig, InvokeConfig, ChildContextConfig. Custom serializers flow through AOT-safe ICheckpointSerializer<T> overloads (matches the existing StepConfig pattern documented at line 1247).
- InvokeConfig gains TenantId (matches Python/JS/Java); drops PayloadSerializer / ResultSerializer.
- BatchItemStatus.Cancelled -> Started. The SDK does not synchronously cancel branches; the wire state of items still in flight when the batch resolves (e.g., FirstSuccessful short-circuit) is STARTED. Matches Python and JS.
- IBatchResult<T> expanded to the full JS/Python surface: adds Started, GetErrors(), HasFailure, SuccessCount, FailureCount, StartedCount, TotalCount.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d308c3b to be4c3ad
Adds chained-function invocation to the .NET Durable Execution SDK. InvokeAsync calls another durable function as a separate execution; the caller suspends until the chained function completes, with the result checkpointed for replay.

Public surface:
- IDurableContext.InvokeAsync<TPayload, TResult> (reflection + AOT-safe overloads with positional ICheckpointSerializer<TPayload> and ICheckpointSerializer<TResult>)
- InvokeConfig with Timeout (currently [Obsolete] - reserved for a future ChainedInvokeOptions wire field) and TenantId for tenant propagation
- Exception subclass tree: InvokeException base + InvokeFailedException, InvokeTimedOutException, InvokeStoppedException

Internal:
- InvokeOperation<TPayload, TResult> mirrors WaitOperation's sync-flush START + SuspendAndAwait pattern. Replay maps STARTED/PENDING to re-suspend; SUCCEEDED to cached deserialize; FAILED/TIMED_OUT/STOPPED to typed exception subclasses.
- ExecutionState.IsTerminalStatus now includes TimedOut (was missing).
- LambdaDurableServiceClient.MapFromSdkOperation now preserves ErrorData and StackTrace fields across all operation types (Step, ChildContext, ChainedInvoke). Pre-existing data-loss bug fix.
- Reuses TerminationReason.InvokePending and OperationSubTypes.Invoke from Wave 0.

Adds 38 unit tests + 4 integration tests covering happy path, failure propagation, tenant-ID propagation, and replay determinism. Extends DurableFunctionDeployment to support a downstream callee function with cross-Lambda IAM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
797d920 to a659aa8
ad4d208 to 3acbed5
Base automatically changed from gcbeatty/durable-wave0 to gcbeatty/durable-child-context May 20, 2026 17:46
#2216
Summary
Adds chained-function invocation to the .NET Durable Execution SDK. `InvokeAsync` calls another durable function as a separate execution; the caller suspends until the chained function completes, with the result checkpointed for replay.

Stacked on top of #2372 (Wave 0 cross-cutting types).

Fixes DOTNET-8661.

Public surface
- `IDurableContext.InvokeAsync<TPayload, TResult>`: single overload. Payload and result are serialized using the `ILambdaSerializer` registered on `ILambdaContext.Serializer` (typically configured via `LambdaBootstrapBuilder.Create(handler, serializer)`). AOT and reflection-based scenarios share this single overload; the AOT story is determined by the registered serializer (e.g., `SourceGeneratorLambdaJsonSerializer<TContext>`).
- `InvokeConfig` with `Timeout` (currently `[Obsolete]`; reserved for a future `ChainedInvokeOptions` wire field) and `TenantId` for tenant propagation
- `InvokeException` base + `InvokeFailedException`, `InvokeTimedOutException`, `InvokeStoppedException`

Internal
- `InvokeOperation<TPayload, TResult>` mirrors `WaitOperation`'s sync-flush START + `SuspendAndAwait` pattern. Replay maps STARTED/PENDING to re-suspend; SUCCEEDED to cached deserialize; FAILED/TIMED_OUT/STOPPED to typed exception subclasses.
- `ILambdaSerializer` resolved from `ILambdaContext.Serializer`; throws `InvalidOperationException` with a guidance message if no serializer is registered (mirrors the StepAsync/RunInChildContextAsync pattern).
- `ExecutionState.IsTerminalStatus` now includes TimedOut (was missing).
- `LambdaDurableServiceClient.MapFromSdkOperation` now preserves `ErrorData` and `StackTrace` fields across all operation types (Step, ChildContext, ChainedInvoke). Pre-existing data-loss bug fix.
- Reuses `TerminationReason.InvokePending` and `OperationSubTypes.Invoke` from Wave 0.

Test plan
- Extends `DurableFunctionDeployment` to support a downstream callee function with cross-Lambda IAM.

Generated with Claude Code
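As a reading aid for the replay mapping described under Internal, here is a minimal sketch of how a checkpointed status could translate into an action or a typed exception. Everything in it is an assumption for illustration: the exception classes are local stand-ins that only borrow the names from this PR, and `ReplayAction` and `InvokeReplay.Decide` are hypothetical helpers, not part of the SDK.

```csharp
using System;

// Local stand-ins that borrow the PR's exception names; not the SDK's types.
public class InvokeException : Exception
{
    public InvokeException(string message) : base(message) { }
}

public sealed class InvokeFailedException : InvokeException
{
    public InvokeFailedException(string message) : base(message) { }
}

public sealed class InvokeTimedOutException : InvokeException
{
    public InvokeTimedOutException(string message) : base(message) { }
}

public sealed class InvokeStoppedException : InvokeException
{
    public InvokeStoppedException(string message) : base(message) { }
}

// Hypothetical: what the caller does when it finds a checkpointed invoke op.
public enum ReplayAction { Suspend, ReturnCached }

public static class InvokeReplay
{
    // Maps a checkpointed status string to a replay action, or throws the
    // typed exception for a terminal failure status.
    public static ReplayAction Decide(string status) => status switch
    {
        "STARTED" or "PENDING" => ReplayAction.Suspend,   // callee still running
        "SUCCEEDED" => ReplayAction.ReturnCached,         // deserialize cached result
        "FAILED" => throw new InvokeFailedException("Chained invoke failed."),
        "TIMED_OUT" => throw new InvokeTimedOutException("Chained invoke timed out."),
        "STOPPED" => throw new InvokeStoppedException("Chained invoke was stopped."),
        _ => throw new InvalidOperationException($"Unexpected status '{status}'."),
    };
}
```

Because all three failure types derive from `InvokeException` in this sketch, a caller could catch the base type to handle any chained-invoke failure, or a specific subclass to treat timeouts differently.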