Add ParallelAsync for concurrent branch execution (DOTNET-8662) #2375
Draft
GarrettBeatty wants to merge 12 commits into
19c0128 to fa13eef
464c591 to d308c3b
fa13eef to b7a06b4
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution SDK: a workflow can run StepAsync and WaitAsync against a real Lambda, with replay-aware checkpointing wired through to the AWS service.
Public API surface introduced:
- DurableFunction.WrapAsync: entry point that handles the durable execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with serializer hook (retry deferred to follow-up PR)
- ICheckpointSerializer interface
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException
Internals:
- DurableExecutionHandler: Task.WhenAny race between user code and the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState: replay-aware operation lookup and pending checkpoint buffer
- OperationIdGenerator: deterministic, replay-stable IDs
- TerminationManager: TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient: wraps AWSSDK.Lambda's checkpoint and state APIs
Tests:
- 86 unit tests covering enums, exceptions, models, configs, ID generation, termination, execution state, the handler race, the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly, LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails
Out of scope (follow-up PRs):
- IRetryStrategy, ExponentialRetryStrategy, retry decision factories
- DefaultJsonCheckpointSerializer
- DurableLogger replay-suppression (currently returns NullLogger)
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync, WaitForConditionAsync (the interface intentionally does not declare them)
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint
stack-info: PR: #2360, branch: GarrettBeatty/stack/2 remove update update update update
Match the Python / Java / JavaScript reference SDKs' replay-mode model: the workflow is "replaying" iff it has not yet revisited every checkpointed completed user-replayable operation. A single global flag flipped on the first fresh op (the prior model) misclassified workflow-body code that runs before the first step and would not generalize to Map/Parallel/Callback later.
ExecutionState changes:
- Replace `Mode`/`ExecutionMode`/`EnterExecutionMode()` with `IsReplaying` + `TrackReplay(operationId)`.
- Initial replay decision: any non-EXECUTION op present means we're replaying. The service always sends an EXECUTION-type op carrying the input payload. That is bookkeeping, not user history, so it does not count toward replay (matches Python execution.py:258, Java ExecutionManager:81, JS execution-context.ts:62).
- TrackReplay flips IsReplaying false once every checkpointed terminal-status non-EXECUTION op has been visited. Terminal set matches Python's: SUCCEEDED, FAILED, CANCELLED, STOPPED.
Operation changes:
- DurableOperation.ExecuteAsync calls TrackReplay(OperationId) at the top, so every operation participates in visit accounting without each subclass needing to remember.
- StepOperation/WaitOperation drop their manual EnterExecutionMode calls.
Tests:
- ExecutionStateTests rewritten around IsReplaying/TrackReplay, including pinning regressions: only-EXECUTION-op ⇒ NotReplaying, all-visited ⇒ flips out of replay, PENDING ops do not block transition, idempotency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
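The replay accounting described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the SDK's ExecutionState: the type names `CheckpointedOp` and `ReplayTrackerSketch`, the string-typed `Type`/`Status` fields, and the choice to guard state with a single lock are all assumptions made for illustration. It only models the three stated rules: EXECUTION ops do not count, only terminal-status ops must be visited, and the flag never flips back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a checkpointed operation returned by the service.
public sealed record CheckpointedOp(string Id, string Type, string Status);

// Hypothetical sketch of IsReplaying/TrackReplay visit accounting.
public sealed class ReplayTrackerSketch
{
    // Terminal set taken from the commit message.
    private static readonly HashSet<string> TerminalStatuses =
        new() { "SUCCEEDED", "FAILED", "CANCELLED", "STOPPED" };

    private readonly object _gate = new();
    private readonly HashSet<string> _unvisitedTerminal;
    private bool _isReplaying;

    public ReplayTrackerSketch(IEnumerable<CheckpointedOp> checkpointed)
    {
        // The EXECUTION op is bookkeeping, not user history, so it is ignored.
        var userOps = checkpointed.Where(op => op.Type != "EXECUTION").ToList();

        // Any non-EXECUTION op present means the workflow starts in replay.
        _isReplaying = userOps.Count > 0;

        // Only terminal-status ops must be revisited; PENDING ops never block.
        _unvisitedTerminal = userOps
            .Where(op => TerminalStatuses.Contains(op.Status))
            .Select(op => op.Id)
            .ToHashSet();
    }

    public bool IsReplaying
    {
        get { lock (_gate) { return _isReplaying; } }
    }

    // Idempotent: visiting the same ID twice, or an unknown ID, is harmless.
    public void TrackReplay(string operationId)
    {
        lock (_gate)
        {
            if (!_isReplaying) return;
            _unvisitedTerminal.Remove(operationId);
            if (_unvisitedTerminal.Count == 0) _isReplaying = false;
        }
    }
}
```

In this sketch a state containing only non-terminal ops starts in replay and leaves it on the first TrackReplay call; whether the real ExecutionState behaves the same way in that edge case is not stated in the commit message.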
…Serializer
DurableExecution now reads the registered ILambdaSerializer from the per-invocation ILambdaContext (added in the prior PR) for both step-result checkpointing and workflow input/output. AOT-safety is now determined entirely by which serializer the user registers with LambdaBootstrapBuilder.Create; there is no longer a forked path between reflection-based and AOT-safe APIs.
Removed:
- ICheckpointSerializer<T> + SerializationContext record
- ReflectionJsonCheckpointSerializer<T>
- The four JsonSerializerContext-taking overloads of DurableFunction.WrapAsync
- The IDurableContext.StepAsync overload that took ICheckpointSerializer<T>
- All [RequiresUnreferencedCode]/[RequiresDynamicCode] attributes and their related [UnconditionalSuppressMessage] shims
Net result: 8 WrapAsync overloads → 4, 3 StepAsync overloads → 2, zero trim attributes in the public API. The AOT smoke test continues to publish with zero IL2026/IL3050 warnings.
- Wrap LambdaDurableServiceClient SDK calls in DurableExecutionException with
durable-execution context (which call, which ARN). User logs no longer show
bare AWSSDK stack traces. Update IsTerminalCheckpointError to unwrap the
inner AmazonServiceException for classification.
- Move public-API files out of Models/, Config/, Exceptions/ into the project
root so folder layout matches the Amazon.Lambda.DurableExecution namespace.
- Replace string action literals ("SUCCEED", "FAIL", "START") with the
Amazon.Lambda.OperationAction enum constants.
- Replace hand-rolled ToHex with Amazon.Util.AWSSDKUtils.ToHex. Drop the
netstandard2.0 SHA-256 fallback now that DurableExecution targets net8+.
- Spell "iff" as "if and only if" in ExecutionState replay-mode docs.
Tests updated for the new wrapping shape: terminal classification asserts on
DurableExecutionException with the inner SDK exception preserved; transient
and hydration paths assert ThrowsAsync<DurableExecutionException> with
InnerException set to the original AmazonServiceException.
Adds child-context support to the .NET Durable Execution SDK. A child context is a logical sub-workflow with its own deterministic operation-ID space, persisted as a CONTEXT operation so subsequent invocations replay the cached value without re-executing the function.
Public surface:
- IDurableContext.RunInChildContextAsync<T> (reflection + AOT-safe ICheckpointSerializer<T> overloads, plus a void overload).
- ChildContextConfig with SubType (observability label) and ErrorMapping (transform exceptions before they surface to the caller).
- ChildContextException for failure surfacing.
Used as a building block for upcoming WaitForCallbackAsync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lays down shared types/constants for the upcoming durable-execution context operations (Callbacks, Invoke, Parallel, Map, WaitForCondition) and updates the design doc to match decisions reached after comparing against the Python, JS, and Java reference SDKs.
SDK changes:
- OperationSubTypes constants class (Step, Wait, Callback, WaitForCallback, Invoke, WaitForCondition, Parallel, ParallelBranch, Map, MapIteration). Replaces hard-coded SubType literals in StepOperation and WaitOperation.
- OperationStatuses.TimedOut for callback/invoke timeout handling.
Design-doc alignment:
- Drop Serializer field from CallbackConfig, InvokeConfig, ChildContextConfig. Custom serializers flow through AOT-safe ICheckpointSerializer<T> overloads (matches the existing StepConfig pattern documented at line 1247).
- InvokeConfig gains TenantId (matches Python/JS/Java); drops PayloadSerializer / ResultSerializer.
- BatchItemStatus.Cancelled -> Started. The SDK does not synchronously cancel branches; the wire state of items still in flight when the batch resolves (e.g., FirstSuccessful short-circuit) is STARTED. Matches Python and JS.
- IBatchResult<T> expanded to the full JS/Python surface: adds Started, GetErrors(), HasFailure, SuccessCount, FailureCount, StartedCount, TotalCount.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d308c3b to be4c3ad
Adds parallel branch execution to the .NET Durable Execution SDK.
ParallelAsync runs N branches concurrently with configurable concurrency
limits and completion policies, returning an IBatchResult<T> with
per-branch status and error information.
Per-branch checkpoint payloads are serialized via the ILambdaSerializer
registered on ILambdaContext.Serializer (typically configured through
LambdaBootstrapBuilder.Create(handler, serializer)), matching the
StepAsync / RunInChildContextAsync pattern. There are no separate
reflection / AOT-safe overload pairs: the AOT story is determined
entirely by which serializer the user registers with the runtime.
Public surface:
- IDurableContext.ParallelAsync<T> (2 overloads: Func[] vs
DurableBranch<T>[])
- DurableBranch<T> record (Name + Func)
- ParallelConfig (MaxConcurrency, CompletionConfig, NestingType)
- CompletionConfig with factories AllSuccessful() / FirstSuccessful() /
AllCompleted(); ToleratedFailureCount / ToleratedFailurePercentage
(validated 0.0-1.0)
- IBatchResult<T> with All / Succeeded / Failed / Started accessors,
GetResults, GetErrors, ThrowIfError, HasFailure, CompletionReason,
count properties
- IBatchItem<T> with Index, Name, Status, Result, Error
- BatchItemStatus { Succeeded, Failed, Started }
- CompletionReason { AllCompleted, MinSuccessfulReached,
FailureToleranceExceeded }
- NestingType (Nested default; Flat throws NotSupportedException - reserved)
- ParallelException (carries IBatchResult; future-subclassable)
Internal:
- ParallelOperation<T> orchestrator dispatches branches with optional
semaphore-bounded concurrency. Each branch runs as a
ChildContextOperation<T> with deterministic ID via
OperationIdGenerator.CreateChild.
- Branch failures aggregated as IBatchItem<T> entries; orchestrator
throws ParallelException only when CompletionConfig signals
FailureToleranceExceeded.
- Parent CONTEXT checkpoint records summary (CompletionReason +
per-branch index/name/status); branch results live on per-branch
CONTEXT checkpoints.
- ExecutionState now thread-safe (lock around reads/writes of
_operations, _visitedOperations, _isReplaying). Required for
concurrent branch replay; affects all operations but no regressions.
- ParallelOperation awaits Task.WhenAll(inFlight) before disposing
the semaphore so cancellation/exception during dispatch lets
in-flight branches settle cleanly.
- Reuses OperationSubTypes.Parallel / OperationSubTypes.ParallelBranch
from Wave 0.
Adds 31 unit tests + 6 integration tests covering CompletionConfig
matrix, MaxConcurrency, FirstSuccessful short-circuit, replay
determinism, mixed-status replay, cancellation, and concurrency
stress.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
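The dispatch pattern named in the Internal notes above (semaphore-bounded concurrency, with Task.WhenAll awaited before the semaphore is disposed) can be sketched using only the base class library. This is an illustration, not the SDK's ParallelOperation<T>: `BoundedDispatchSketch`, `RunBoundedAsync`, and `RunOneAsync` are hypothetical names, and checkpointing, completion policies, and per-branch error aggregation are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating the dispatch pattern only.
public static class BoundedDispatchSketch
{
    public static async Task<T[]> RunBoundedAsync<T>(
        IReadOnlyList<Func<CancellationToken, Task<T>>> branches,
        int maxConcurrency,
        CancellationToken cancellationToken = default)
    {
        var semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        var inFlight = new List<Task<T>>(branches.Count);
        try
        {
            foreach (var branch in branches)
            {
                // Waits for a free slot before starting the next branch.
                await semaphore.WaitAsync(cancellationToken).ConfigureAwait(false);
                inFlight.Add(RunOneAsync(branch, semaphore, cancellationToken));
            }

            // Results come back in dispatch order, not completion order.
            return await Task.WhenAll(inFlight).ConfigureAwait(false);
        }
        finally
        {
            // If dispatch was cancelled or threw, branches that already
            // started still hold the semaphore. Let them settle before
            // disposing it, so their Release() calls never hit a disposed
            // instance. Their exceptions are ignored here; on the normal
            // path the await above has already rethrown them.
            try { await Task.WhenAll(inFlight).ConfigureAwait(false); }
            catch { }
            semaphore.Dispose();
        }
    }

    private static async Task<T> RunOneAsync<T>(
        Func<CancellationToken, Task<T>> branch,
        SemaphoreSlim semaphore,
        CancellationToken cancellationToken)
    {
        try
        {
            return await branch(cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            // Always free the slot, whether the branch succeeded or failed.
            semaphore.Release();
        }
    }
}
```

Unlike the orchestrator described above, this sketch rethrows the first branch failure instead of collecting failures into a batch result; it is only meant to show why the in-flight tasks are awaited before the semaphore is disposed.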
b7a06b4 to 08b2095
ad4d208 to 3acbed5
Base automatically changed from gcbeatty/durable-wave0 to gcbeatty/durable-child-context May 20, 2026 17:46
#2216
Summary
Adds parallel branch execution to the .NET Durable Execution SDK. ParallelAsync runs N branches concurrently with configurable concurrency limits and completion policies, returning an IBatchResult<T> with per-branch status and error information.
Per-branch checkpoint payloads are serialized via the ILambdaSerializer registered on ILambdaContext.Serializer (typically configured through LambdaBootstrapBuilder.Create(handler, serializer)), matching the StepAsync / RunInChildContextAsync pattern. There are no separate reflection / AOT-safe overload pairs: the AOT story is determined entirely by which serializer the user registers with the runtime (e.g., SourceGeneratorLambdaJsonSerializer<TContext> for AOT scenarios).
Stacked on top of #2372 (Wave 0 cross-cutting types).
Fixes DOTNET-8662.
The shared IBatchResult<T> family added here will be reused by MapAsync (Wave 2).
Public surface
- IDurableContext.ParallelAsync<T> (2 overloads: Func[] vs DurableBranch<T>[])
- DurableBranch<T> record (Name + Func)
- ParallelConfig (MaxConcurrency, CompletionConfig, NestingType)
- CompletionConfig with factories AllSuccessful() / FirstSuccessful() / AllCompleted(); ToleratedFailureCount / ToleratedFailurePercentage (validated 0.0-1.0)
- IBatchResult<T> with All / Succeeded / Failed / Started accessors, GetResults, GetErrors, ThrowIfError, HasFailure, CompletionReason, count properties
- IBatchItem<T> with Index, Name, Status, Result, Error
- BatchItemStatus { Succeeded, Failed, Started }
- CompletionReason { AllCompleted, MinSuccessfulReached, FailureToleranceExceeded }
- NestingType (Nested default; Flat throws NotSupportedException - reserved for a follow-up)
- ParallelException (carries IBatchResult; future-subclassable)
Internal
- ParallelOperation<T> orchestrator dispatches branches with optional semaphore-bounded concurrency. Each branch runs as a ChildContextOperation<T> with a deterministic ID via OperationIdGenerator.CreateChild.
- Branch failures aggregated as IBatchItem<T> entries; orchestrator throws ParallelException only when CompletionConfig signals FailureToleranceExceeded.
- ExecutionState now thread-safe (lock around reads/writes of _operations, _visitedOperations, _isReplaying). Required for concurrent branch replay; affects all operations but no regressions.
- ParallelOperation awaits Task.WhenAll(inFlight) before disposing the semaphore so cancellation/exception during dispatch lets in-flight branches settle cleanly.
- Reuses OperationSubTypes.Parallel / OperationSubTypes.ParallelBranch from Wave 0.
Test plan
Generated with Claude Code