diff --git a/.autover/changes/91693d62-b0c7-49b0-a74f-531aa1509864.json b/.autover/changes/91693d62-b0c7-49b0-a74f-531aa1509864.json new file mode 100644 index 000000000..41fab0859 --- /dev/null +++ b/.autover/changes/91693d62-b0c7-49b0-a74f-531aa1509864.json @@ -0,0 +1,11 @@ +{ + "Projects": [ + { + "Name": "Amazon.Lambda.DurableExecution", + "Type": "Patch", + "ChangelogMessages": [ + "Initial preview release of the Durable Execution SDK for .NET. Build long-running Lambda workflows with automatic checkpointing via `StepAsync`, `WaitAsync`, `RunInChildContextAsync`, `CreateCallbackAsync`, and `WaitForCallbackAsync` on `IDurableContext`." + ] + } + ] +} \ No newline at end of file diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/README.md b/Libraries/src/Amazon.Lambda.DurableExecution/README.md new file mode 100644 index 000000000..3117ea139 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/README.md @@ -0,0 +1,106 @@ +# AWS Lambda Durable Execution SDK for .NET + +> **Preview.** `Amazon.Lambda.DurableExecution` is in active development (0.x). Public APIs may change before 1.0. + +`Amazon.Lambda.DurableExecution` is the .NET SDK for building resilient, long-running AWS Lambda functions that automatically checkpoint progress and resume after failures. Workflows can run for up to one year, with charges only for active compute time. + +## Key Features +gv +- **Automatic checkpointing** — progress is saved after each step; failures resume from the last checkpoint. +- **Cost-effective waits** — suspend execution for minutes, hours, or days without compute charges. +- **Configurable retries** — built-in retry strategies with exponential backoff and jitter. +- **Replay safety** — functions deterministically resume from checkpoints after interruptions. +- **Type safety** — full generic type support for step results. +- **AOT-friendly** — pluggable `ILambdaSerializer` so you can register `SourceGeneratorLambdaJsonSerializer` for trimmed / Native AOT functions. + +## How It Works + +Your handler delegates to `DurableFunction.WrapAsync`, which gives your workflow function an `IDurableContext`. The context is your interface to durable operations: + +- `ctx.StepAsync` — run code and checkpoint the result. ([docs](docs/core/steps.md)) +- `ctx.WaitAsync` — suspend execution without compute charges. ([docs](docs/core/wait.md)) +- `ctx.CreateCallbackAsync` / `ctx.WaitForCallbackAsync` — wait for external events (approvals, webhooks). ([docs](docs/core/callbacks.md)) +- `ctx.RunInChildContextAsync` — run an isolated child context with its own checkpoint log. ([docs](docs/core/child-contexts.md)) + +## Quick Start + +### Installation + +```bash +dotnet add package Amazon.Lambda.DurableExecution +``` + +### Your first durable function + +> **Programming model:** the preview only supports the **executable programming model** — your function is an executable assembly that hosts its own bootstrap loop and passes the serializer to the runtime in code. Class-library handlers on the managed runtime will be supported once the changes made to Amazon.Lambda.RuntimeSupport to support durable functions has been deployed to the managed runtime. This README will be updated then. + +A complete order-processing workflow with two steps and a wait, deployed as an executable assembly on the `dotnet10` runtime. `Main` builds a `LambdaBootstrap` with your handler and an `ILambdaSerializer`, and `DurableFunction.WrapAsync` uses that serializer to checkpoint step inputs and outputs. + +```csharp +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace OrderProcessor; + +public class OrderProcessor +{ + public static async Task Main() + { + var handler = new OrderProcessor(); + var serializer = new DefaultLambdaJsonSerializer(); + using var wrapper = HandlerWrapper.GetHandlerWrapper( + handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(wrapper); + await bootstrap.RunAsync(); + } + + public Task Handler( + DurableExecutionInvocationInput input, ILambdaContext context) + => DurableFunction.WrapAsync(Workflow, input, context); + + private async Task Workflow(Order order, IDurableContext ctx) + { + var reservation = await ctx.StepAsync( + async _ => await InventoryService.ReserveAsync(order.Items), + name: "reserve-inventory"); + + var payment = await ctx.StepAsync( + async _ => await PaymentService.ChargeAsync(order.PaymentMethod, order.Total), + name: "process-payment"); + + await ctx.WaitAsync(TimeSpan.FromHours(2), name: "warehouse-processing"); + + var shipment = await ctx.StepAsync( + async _ => await ShippingService.ShipAsync(reservation, order.Address), + name: "confirm-shipment"); + + return new OrderResult(order.Id, shipment.TrackingNumber); + } +} + +public record Order(string Id, IReadOnlyList Items, PaymentMethod PaymentMethod, decimal Total, Address Address); +public record OrderResult(string OrderId, string TrackingNumber); +``` + +For AOT or trim-friendly serialization, swap `DefaultLambdaJsonSerializer` for `SourceGeneratorLambdaJsonSerializer` and register your `JsonSerializerContext`. + +## Documentation + +**Core operations** + +- [Steps](docs/core/steps.md) — execute code with automatic checkpointing, retry strategies, and at-least/at-most-once semantics. +- [Wait](docs/core/wait.md) — pause execution without compute charges. +- [Callbacks](docs/core/callbacks.md) — wait for external systems to respond. +- [Child Contexts](docs/core/child-contexts.md) — group related operations into isolated, checkpointed units. + +**Examples** + +End-to-end test functions (each paired with an integration test) live under `Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/`. + +## Related SDKs + +- [aws-durable-execution-sdk-java](https://github.com/aws/aws-durable-execution-sdk-java) — Java SDK +- [aws-durable-execution-sdk-js](https://github.com/aws/aws-durable-execution-sdk-js) — JavaScript / TypeScript SDK +- [aws-durable-execution-sdk-python](https://github.com/aws/aws-durable-execution-sdk-python) — Python SDK diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/callbacks.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/callbacks.md new file mode 100644 index 000000000..573ad17e3 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/callbacks.md @@ -0,0 +1,185 @@ +# Callbacks + +Callbacks let a workflow suspend until an external system (a human approver, a webhook, another service) delivers a result. The external system completes the callback by calling `SendDurableExecutionCallbackSuccess`, `SendDurableExecutionCallbackFailure`, or `SendDurableExecutionCallbackHeartbeat` with the `callbackId` you handed it. + +Two APIs are available: + +- `WaitForCallbackAsync` — composite operation; create the callback, hand it to the external system inside a submitter delegate, and suspend until the result arrives. +- `CreateCallbackAsync` — lower-level; allocate the callback yourself, hand the ID out in your own steps, and `await` the result separately. + +## `WaitForCallbackAsync` + +```csharp +Task WaitForCallbackAsync( + Func submitter, + string? name = null, + WaitForCallbackConfig? config = null, + CancellationToken cancellationToken = default); +``` + +The submitter receives the freshly allocated `callbackId` and an `IWaitForCallbackContext` (logger-only). Submitter failures (after retries are exhausted) surface as `CallbackSubmitterException`; callback failures and timeouts surface as `CallbackFailedException` / `CallbackTimeoutException`. + +## `CreateCallbackAsync` + +```csharp +Task> CreateCallbackAsync( + string? name = null, + CallbackConfig? config = null, + CancellationToken cancellationToken = default); +``` + +The returned `ICallback` exposes: + +- `string CallbackId` — give this to the external system. +- `Task GetResultAsync(CancellationToken)` — `await` to suspend until the external system completes the callback. + +The result is deserialized using the registered `ILambdaSerializer`. Throws `CallbackFailedException` or `CallbackTimeoutException` on failure. + +## End-to-end example + +Two Lambdas: a workflow that suspends on a callback, and a separate approver Lambda that resolves it. The workflow hands its `callbackId` to the approver via `Event` invocation (fire-and-forget), then suspends. The approver runs in its own Lambda and signals completion by calling `SendDurableExecutionCallbackSuccessAsync`. + +### 1. Workflow Lambda — `WaitForCallbackAsync` + +```csharp +using Amazon.Lambda; +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.Model; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace OrderApprovalWorkflow; + +public class Function +{ + private static readonly IAmazonLambda LambdaClient = new AmazonLambdaClient(); + + public static async Task Main() + { + var handler = new Function(); + var serializer = new DefaultLambdaJsonSerializer(); + using var wrapper = HandlerWrapper.GetHandlerWrapper( + handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(wrapper); + await bootstrap.RunAsync(); + } + + public Task Handler( + DurableExecutionInvocationInput input, ILambdaContext context) + => DurableFunction.WrapAsync(Workflow, input, context); + + private async Task Workflow(OrderInput input, IDurableContext ctx) + { + var approverFunctionName = Environment.GetEnvironmentVariable("APPROVER_FUNCTION_NAME") + ?? throw new InvalidOperationException("APPROVER_FUNCTION_NAME env var not set"); + + // Suspend until the approver Lambda calls SendDurableExecutionCallbackSuccessAsync + // with this callback ID. The submitter is invoked once with a freshly-allocated + // ID; it hands the ID to the approver and returns immediately. + var result = await ctx.WaitForCallbackAsync( + submitter: async (callbackId, cbCtx) => + { + var payload = $$"""{"callbackId":"{{callbackId}}","orderId":"{{input.OrderId}}"}"""; + await LambdaClient.InvokeAsync(new InvokeRequest + { + FunctionName = approverFunctionName, + InvocationType = InvocationType.Event, // fire-and-forget + Payload = payload + }); + }, + name: "approve"); + + return result; + } +} + +public record OrderInput(string OrderId); +public record ApprovalResult(string Status, string ApprovedBy); +``` + +### 2. Approver Lambda — completes the callback + +A plain Lambda — no durable execution wrapper. It receives the callback ID, performs whatever logic the external system needs, and calls `SendDurableExecutionCallbackSuccessAsync` to resume the workflow. + +```csharp +using System.Text; +using Amazon.Lambda; +using Amazon.Lambda.Core; +using Amazon.Lambda.Model; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace OrderApprovalWorkflow; + +public class ApproverFunction +{ + private static readonly IAmazonLambda LambdaClient = new AmazonLambdaClient(); + + public static async Task Main() + { + var handler = new ApproverFunction(); + var serializer = new DefaultLambdaJsonSerializer(); + using var wrapper = HandlerWrapper.GetHandlerWrapper( + handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(wrapper); + await bootstrap.RunAsync(); + } + + public async Task Handler(ApproverInput input, ILambdaContext context) + { + // The result JSON must match the T in WaitForCallbackAsync — here, ApprovalResult. + var resultJson = $$"""{"Status":"approved","ApprovedBy":"{{input.OrderId}}"}"""; + await LambdaClient.SendDurableExecutionCallbackSuccessAsync( + new SendDurableExecutionCallbackSuccessRequest + { + CallbackId = input.CallbackId, + Result = new MemoryStream(Encoding.UTF8.GetBytes(resultJson)) + }); + return null; + } +} + +public record ApproverInput(string CallbackId, string OrderId); +``` + +To signal failure instead, call `SendDurableExecutionCallbackFailureAsync` — the workflow throws `CallbackFailedException`. To extend the heartbeat deadline (when `HeartbeatTimeout` is configured), call `SendDurableExecutionCallbackHeartbeatAsync`. + +### `CreateCallbackAsync` variant + +When you need to allocate the ID before deciding how to hand it out — e.g. several steps run between callback creation and submission — use `CreateCallbackAsync` and a separate `StepAsync` for the submission. Wrapping the hand-off in a step prevents replays from re-invoking the approver. + +```csharp +private async Task Workflow(OrderInput input, IDurableContext ctx) +{ + var cb = await ctx.CreateCallbackAsync(name: "approve"); + + await ctx.StepAsync(async _ => + { + var payload = $$"""{"callbackId":"{{cb.CallbackId}}","orderId":"{{input.OrderId}}"}"""; + await LambdaClient.InvokeAsync(new InvokeRequest + { + FunctionName = approverFunctionName, + InvocationType = InvocationType.Event, + Payload = payload + }); + }, name: "submit"); + + return await cb.GetResultAsync(); +} +``` + +## Configuration + +```csharp +public class CallbackConfig +{ + public TimeSpan Timeout { get; set; } // overall callback timeout, ≥ 1s or Zero (default = no timeout) + public TimeSpan HeartbeatTimeout { get; set; } // heartbeat-gap timeout, ≥ 1s or Zero (default = no timeout) +} + +public class WaitForCallbackConfig : CallbackConfig +{ + public IRetryStrategy? RetryStrategy { get; set; } // applied to the submitter step only +} +``` diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/child-contexts.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/child-contexts.md new file mode 100644 index 000000000..4a664e11e --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/child-contexts.md @@ -0,0 +1,46 @@ +# Child Contexts + +`RunInChildContextAsync` runs a sub-workflow inside its own deterministic operation-ID space. The child's return value is checkpointed as a single `CONTEXT` operation, so subsequent invocations replay the cached value without re-executing the contained operations. Use to group related steps under a shared error/observability boundary. + +## Signatures + +```csharp +Task RunInChildContextAsync( + Func> func, + string? name = null, + ChildContextConfig? config = null, + CancellationToken cancellationToken = default); + +Task RunInChildContextAsync( + Func func, + string? name = null, + ChildContextConfig? config = null, + CancellationToken cancellationToken = default); +``` + +## Example + +```csharp +var phaseResult = await ctx.RunInChildContextAsync( + async childCtx => + { + var validated = await childCtx.StepAsync(async _ => Validate(input), name: "validate"); + await childCtx.WaitAsync(TimeSpan.FromSeconds(2), name: "short_wait"); + var processed = await childCtx.StepAsync(async _ => Process(validated), name: "process"); + return processed; + }, + name: "phase", + config: new ChildContextConfig { SubType = "OrderProcessing" }); +``` + +## Configuration + +```csharp +public sealed class ChildContextConfig +{ + public string? SubType { get; set; } // observability label + public Func? ErrorMapping { get; set; } // remap thrown exceptions +} +``` + +`ErrorMapping` lets you translate exceptions thrown inside the child context into a domain-specific exception type before they propagate to the parent. diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/steps.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/steps.md new file mode 100644 index 000000000..c7f9e9f22 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/steps.md @@ -0,0 +1,148 @@ +# Steps + +`StepAsync` runs a unit of work whose result is checkpointed. On replay, completed steps return their cached result without re-executing. + +## Signatures + +```csharp +Task StepAsync( + Func> func, + string? name = null, + StepConfig? config = null, + CancellationToken cancellationToken = default); + +Task StepAsync( + Func func, + string? name = null, + StepConfig? config = null, + CancellationToken cancellationToken = default); +``` + +The `IStepContext` parameter exposes the current `AttemptNumber`, the deterministic `OperationId`, and a scoped `Logger`. Returned values are serialized via the `ILambdaSerializer` registered on `ILambdaContext.Serializer`. + +## Basic step + +```csharp +var user = await ctx.StepAsync( + async _ => await userService.GetUserAsync(userId), + name: "fetch-user"); +``` + +## Multiple steps + +```csharp +var a = await ctx.StepAsync(async _ => $"a-{input.OrderId}", name: "step_1"); +var b = await ctx.StepAsync(async _ => $"{a}-b", name: "step_2"); +var c = await ctx.StepAsync(async _ => $"{b}-c", name: "step_3"); +``` + +## Step configuration + +Configure step behavior with `StepConfig`: + +```csharp +public sealed class StepConfig +{ + public IRetryStrategy? RetryStrategy { get; set; } // null = no retry + public StepSemantics Semantics { get; set; } = StepSemantics.AtLeastOncePerRetry; +} +``` + +### Retry strategies + +When a step throws, the configured `IRetryStrategy` decides whether to retry and after what delay. + +```csharp +public interface IRetryStrategy +{ + RetryDecision ShouldRetry(Exception exception, int attemptNumber); +} + +public readonly struct RetryDecision +{ + public bool ShouldRetry { get; } + public TimeSpan Delay { get; } + + public static RetryDecision DoNotRetry(); + public static RetryDecision RetryAfter(TimeSpan delay); +} +``` + +Built-in strategies on the `RetryStrategy` static class: + +| Member | Behavior | +| --- | --- | +| `RetryStrategy.Default` | 6 attempts, 2× backoff, 5s initial, 60s max, Full jitter. | +| `RetryStrategy.Transient` | 3 attempts, 2× backoff, 1s initial, 5s max, Half jitter. | +| `RetryStrategy.None` | 1 attempt only — no retry. | +| `RetryStrategy.Exponential(...)` | Builder for custom exponential strategies. | +| `RetryStrategy.FromDelegate(Func)` | Wrap a custom decision function. | + +`Exponential` parameters: + +```csharp +public static IRetryStrategy Exponential( + int maxAttempts = 3, + TimeSpan? initialDelay = null, // default 5s + TimeSpan? maxDelay = null, // default 300s + double backoffRate = 2.0, + JitterStrategy jitter = JitterStrategy.Full, + Type[]? retryableExceptions = null, + string[]? retryableMessagePatterns = null); + +public enum JitterStrategy { None, Full, Half } +``` + +When `retryableExceptions` and `retryableMessagePatterns` are both null (default), every exception is retried up to `maxAttempts`. If either is set, only matching exceptions are retried. + +#### Step with retries + +```csharp +var result = await ctx.StepAsync( + async stepCtx => + { + if (stepCtx.AttemptNumber < 3) + throw new InvalidOperationException($"flake on attempt {stepCtx.AttemptNumber}"); + return $"ok on attempt {stepCtx.AttemptNumber}"; + }, + name: "flaky_step", + config: new StepConfig + { + RetryStrategy = RetryStrategy.Exponential( + maxAttempts: 3, + initialDelay: TimeSpan.FromSeconds(2), + maxDelay: TimeSpan.FromSeconds(10), + backoffRate: 2.0, + jitter: JitterStrategy.None) + }); +``` + +### Step semantics + +Control how a step behaves when interrupted mid-execution: + +```csharp +public enum StepSemantics +{ + AtLeastOncePerRetry, // default — body may re-execute if Lambda is re-invoked mid-attempt + AtMostOncePerRetry // body executes at most once per retry attempt +} +``` + +| Semantic | Behavior | Use case | +| --- | --- | --- | +| `AtLeastOncePerRetry` (default) | Re-executes the step if interrupted before completion. | Idempotent operations (database upserts, API calls with idempotency keys). | +| `AtMostOncePerRetry` | Never re-executes; throws if interrupted. | Non-idempotent operations (sending email, charging payments). | + +These semantics apply *per retry attempt*, not per overall execution. To achieve true at-most-once across the whole workflow, combine with `RetryStrategy.None`: + +```csharp +var result = await ctx.StepAsync( + async _ => await paymentService.ChargeAsync(amount), + name: "charge-payment", + config: new StepConfig + { + Semantics = StepSemantics.AtMostOncePerRetry, + RetryStrategy = RetryStrategy.None + }); +``` diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/wait.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/wait.md new file mode 100644 index 000000000..d7d2679f4 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/wait.md @@ -0,0 +1,28 @@ +# Wait + +`WaitAsync` suspends the workflow for a duration. The Lambda terminates and is re-invoked when the timer fires — you pay for compute time only on the resume side. + +## Signature + +```csharp +Task WaitAsync( + TimeSpan duration, + string? name = null, + CancellationToken cancellationToken = default); +``` + +`duration` must be at least 1 second and at most 31,622,400 seconds (~1 year). + +## Example + +```csharp +await ctx.WaitAsync(TimeSpan.FromHours(2), name: "warehouse-processing"); +``` + +## Step + Wait + Step + +```csharp +var validated = await ctx.StepAsync(async _ => Validate(input), name: "validate"); +await ctx.WaitAsync(TimeSpan.FromSeconds(3), name: "short_wait"); +var processed = await ctx.StepAsync(async _ => Process(validated), name: "process"); +``` diff --git a/README.md b/README.md index 405e952a5..afd2c11e3 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,7 @@ For a history of releases view the [release change log](CHANGELOG.md) - [Amazon.Lambda.Annotations](#amazonlambdaannotations) - [Amazon.Lambda.AspNetCoreServer](#amazonlambdaaspnetcoreserver) - [Amazon.Lambda.TestUtilities](#amazonlambdatestutilities) + - [Amazon.Lambda.DurableExecution](#amazonlambdadurableexecution) - [Blueprints](#blueprints) - [Dotnet CLI Templates](#dotnet-cli-templates) - [Yeoman (Deprecated)](#yeoman-deprecated) @@ -113,6 +114,11 @@ For more information see the [README.md](Libraries/src/Amazon.Lambda.AspNetCoreS Package includes test implementation of the interfaces from Amazon.Lambda.Core and helper methods to help in locally testing. For more information see the [README.md](Libraries/src/Amazon.Lambda.TestUtilities/README.md) file for Amazon.Lambda.TestUtilities. +### Amazon.Lambda.DurableExecution + +The Durable Execution SDK lets you write multi-step Lambda workflows that automatically checkpoint progress and resume after failures. +For more information see the [README.md](Libraries/src/Amazon.Lambda.DurableExecution/README.md) file for Amazon.Lambda.DurableExecution. + ## Blueprints Blueprints in this repository are .NET Core Lambda functions that can used to get started. In Visual Studio the Blueprints are available when creating a new project and selecting the AWS Lambda Project.