feat(saga-pattern): Add Lambda durable functions saga pattern impleme… #3014

Open
tafaman wants to merge 1 commit into aws-samples:main from tafaman:saga-pattern-durable-functions

tafaman commented Apr 8, 2026

SAGA Pattern using AWS Lambda durable functions

This pattern demonstrates AWS Lambda durable functions with a saga orchestrator for distributed transaction coordination using Python 3.13, DynamoDB, and CDK. Key features include:

  • Saga orchestrator durable function for coordinating distributed transactions
  • Service functions for flight, hotel, and car reservation/cancellation
  • DynamoDB tables for storing reservation state across services
  • Automatic rollback with compensation logic on failures
  • CDK TypeScript infrastructure as code for automated deployment
  • aws-durable-execution-sdk integration for checkpoint execution

The pattern showcases how to implement a travel booking system where Lambda functions coordinate multiple service reservations and automatically compensate (rollback) completed steps when any downstream service fails.

…ntation

- Add saga orchestrator durable function for distributed transaction coordination
- Add service functions for flight, hotel, and car reservation and cancellation
- Add DynamoDB tables for storing reservation state across services
- Add CDK infrastructure as code for automated deployment
- Add comprehensive README with architecture diagrams and deployment instructions
- Add test scenarios for success and failure paths with compensation logic
- Add Python Lambda functions with aws-durable-execution-sdk integration
- Implements saga pattern for travel booking system with automatic rollback on failures

@bfreiberg (Contributor)

1. Security Analysis

Finding 1.1: Saga Orchestrator Has ReadWriteData on All Three Tables

  • Severity: MEDIUM

  • File: saga-pattern-cdk/lib/saga-pattern-cdk-stack.ts (lines 196-198)

  • Issue: The saga orchestrator durable function is granted grantReadWriteData on all three DynamoDB tables (hotel, flight, car), but the orchestrator never directly reads or writes to these tables — it delegates all DynamoDB operations to the individual service Lambda functions via context.invoke(). These permissions are unnecessary and violate least-privilege.

  • Current Code/Configuration:

    hotelTable.grantReadWriteData(sagaDurableFunction);
    flightTable.grantReadWriteData(sagaDurableFunction);
    carTable.grantReadWriteData(sagaDurableFunction);
  • Recommended Fix:

    // Remove these three lines — the orchestrator only invokes service functions,
    // it does not access DynamoDB directly. The service functions already have
    // their own table permissions.

2. Schema Validation

Finding 2.1: Missing Pattern Metadata JSON File

  • Severity: HIGH
  • File: example-pattern.json (missing file)
  • Issue: No pattern metadata JSON file exists in the pattern root directory. Without this file, the pattern cannot be listed on serverlessland.com.

3. Build Artifacts & Cleanup

Finding 3.1: saga-layer.zip Binary Blob Without Build Instructions

  • Severity: MEDIUM
  • File: saga-pattern-cdk/lambda/saga-workflow/saga-layer.zip
  • Issue: A pre-built ZIP archive is committed to the repository without any instructions on how to reproduce it. Consumers cannot verify the contents, update dependencies, or rebuild the layer for a different architecture. In general, do not add compressed files to the repo. Instead, add instructions or scripts that build the necessary artifact.

4. README Documentation

Finding 4.1: README References Nonexistent Files

  • Severity: MEDIUM

  • File: README.md (lines 220-221)

  • Issue: The "Additional Test Files" section references saga-pattern-cdk/lambda/test-events.json and saga-pattern-cdk/TESTING.md, but neither file exists in the repository.

  • Current Code/Configuration:

    See `saga-pattern-cdk/lambda/test-events.json` for more test scenarios and `saga-pattern-cdk/TESTING.md` for comprehensive testing documentation.
  • Recommended Fix: Remove this section or create the referenced files.

Finding 4.2: Typo in Stack Description

  • Severity: LOW
  • File: saga-pattern-cdk/bin/saga-pattern-cdk.ts (line 7)
  • Issue: The stack description says "This templates deploys" — should be "This template deploys".

5. Code Integrity & Quality

Finding 5.1: Unused random Import in Orchestrator

  • Severity: LOW
  • File: saga-pattern-cdk/lambda/saga-workflow/index.py (line 4)
  • Issue: The random module is imported but never used.
  • Recommended Fix: Remove import random.

Finding 5.2: datetime.utcnow() Deprecated in Python 3.12+

  • Severity: MEDIUM

  • File: All six service Lambda functions (reserve_flight.py, cancel_flight.py, reserve_hotel.py, cancel_hotel.py, reserve_car.py, cancel_car.py)

  • Issue: All service functions use datetime.utcnow(), which has been deprecated since Python 3.12. The pattern targets the Python 3.14 runtime, where this call emits a DeprecationWarning.

  • Current Code:

    'createdAt': datetime.utcnow().isoformat(),
    'updatedAt': datetime.utcnow().isoformat()
  • Recommended Fix:

    from datetime import datetime, timezone
    
    'createdAt': datetime.now(timezone.utc).isoformat(),
    'updatedAt': datetime.now(timezone.utc).isoformat()
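
One side effect worth checking when applying this fix: isoformat() on a timezone-aware datetime appends a +00:00 offset, so the stored strings change shape compared with utcnow(). The sketch below shows this and reads the clock once so both fields match. The helper names utc_timestamp and build_timestamps are hypothetical and do not come from the PR.

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return the current UTC time as an ISO 8601 string with an explicit offset."""
    return datetime.now(timezone.utc).isoformat()


def build_timestamps() -> dict:
    """Build createdAt/updatedAt from a single clock read so the two values match."""
    now = utc_timestamp()
    return {"createdAt": now, "updatedAt": now}
```

Any consumer that parses these fields with a fixed format string would need to accept the offset suffix.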

Finding 5.3: Service Functions Return Error Responses Instead of Raising Exceptions

  • Severity: MEDIUM
  • File: reserve_flight.py, reserve_hotel.py, reserve_car.py
  • Issue: When the failBook* flag is set, the service functions raise an Exception, which the generic handler catches and returns as {statusCode: 500, body: ...}. The orchestrator then parses this response and checks statusCode != 200. Because these functions are invoked directly (not via API Gateway), they should let exceptions propagate so the durable SDK can handle them.
  • Recommended Fix: For direct Lambda invocation, let exceptions propagate rather than wrapping them in API Gateway-style responses.
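
A minimal sketch of a service handler that raises instead of returning a 500-style body. The names ReservationError and reserve_flight, and the concrete flag name failBookFlight, are assumptions made for illustration; the table write is omitted so the example stays self-contained.

```python
class ReservationError(Exception):
    """Raised when a reservation cannot be completed."""


def reserve_flight(event: dict, context: object = None) -> dict:
    """Reserve a flight, raising on failure instead of returning a 500-style body."""
    if event.get("failBookFlight"):
        # Let the error propagate so the caller sees a failed invocation.
        raise ReservationError("Flight booking failed (simulated)")

    # A real handler would write to its table here; omitted in this sketch.
    return {"bookingId": event["bookingId"], "status": "RESERVED"}
```

With this shape the orchestrator no longer needs to parse a statusCode field; a failed invocation surfaces as an exception at the call site.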

6. AWS Naming Conventions

Finding 6.1: Incorrect Capitalization in Stack Description

  • Severity: LOW

  • File: saga-pattern-cdk/bin/saga-pattern-cdk.ts (line 7)

  • Issue: The stack description uses "Lambda Durable Function" (capital D/F) and "SAGA" (all caps). Correct forms are "Lambda durable functions" and "saga pattern".

  • Recommended Fix:

    "This template deploys an AWS Lambda durable function workflow that implements the saga pattern"
    

Finding 6.2: Missing Full Service Name on First Reference

  • Severity: MEDIUM
  • File: README.md (lines 15-17)
  • Issue: "DynamoDB" used without "Amazon" prefix on first reference. Should be "Amazon DynamoDB".

Finding 6.3: Inconsistent Service Naming Throughout README

  • Severity: MEDIUM
  • File: README.md (multiple locations)
  • Issue: "CloudWatch" used without "Amazon" prefix on first reference (line 237). Should be "Amazon CloudWatch".

7. Architecture Analysis

Finding 7.1: No DLQ or OnFailure Destination on Durable Function

  • Severity: MEDIUM

  • File: saga-pattern-cdk/lib/saga-pattern-cdk-stack.ts (lines 172-195)

  • Issue: The saga orchestrator has no Dead Letter Queue or OnFailure destination. If the durable function exhausts retries or encounters an unrecoverable error, the failure is silently lost. For a saga orchestrator handling booking transactions, this could leave reservations in an inconsistent state.

  • Recommended Fix:

    const sagaDLQ = new aws_sqs.Queue(this, 'SagaDurableFunctionDLQ', {
      queueName: 'saga-durable-function-dlq',
      retentionPeriod: Duration.days(14),
    });
    
    const sagaDurableFunction = new aws_lambda.Function(this, 'SagaDurableFunction', {
      // ... existing config ...
      deadLetterQueue: sagaDLQ,
    });

8. Durable Functions Analysis

Finding 8.1: Non-Deterministic uuid.uuid4() Calls Outside Steps

  • Severity: CRITICAL

  • File: saga-pattern-cdk/lambda/saga-workflow/index.py (lines 35, 42, 44, 82, 100, 118)

  • Issue: Multiple uuid.uuid4() calls occur outside any context.step() or context.invoke() block. The handler re-executes from the beginning on every replay, so these UUID values will differ on each replay — causing orphaned records in DynamoDB and data inconsistency.

  • Current Code:

    transaction_id = str(uuid.uuid4())  # Different on every replay
    flight_data = {
        "bookingId": str(uuid.uuid4()),  # Different on every replay
        "flightNumber": event.get("flightNumber", f"FL{uuid.uuid4().hex[:6].upper()}"),
    }
    # Similar for hotel_data and car_data
  • Recommended Fix:

    @durable_execution
    def lambda_handler(event: dict, context: DurableContext) -> dict:
        transaction_id = context.step(lambda _: str(uuid.uuid4()), name='generate-transaction-id')
        flight_data = {
            "bookingId": context.step(lambda _: str(uuid.uuid4()), name='generate-booking-id'),
            # ...
        }
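
To illustrate why the step wrapper matters, here is a toy simulation of replay. ToyContext is a hypothetical stand-in that caches step results by name; it is not the durable execution SDK and only mimics the idea of checkpointing.

```python
import uuid
from typing import Any, Callable, Dict


class ToyContext:
    """Toy stand-in for a durable context: caches step results by name."""

    def __init__(self, checkpoints: Dict[str, Any]) -> None:
        self.checkpoints = checkpoints  # shared store that survives across replays

    def step(self, func: Callable[[Any], Any], name: str) -> Any:
        # On replay, return the stored result instead of re-running func.
        if name not in self.checkpoints:
            self.checkpoints[name] = func(None)
        return self.checkpoints[name]


def handler(context: ToyContext) -> dict:
    unsafe_id = str(uuid.uuid4())  # regenerated on every replay
    safe_id = context.step(lambda _: str(uuid.uuid4()), name="generate-transaction-id")
    return {"unsafe": unsafe_id, "safe": safe_id}


store: Dict[str, Any] = {}
first = handler(ToyContext(store))
replay = handler(ToyContext(store))  # simulates a replay with the same checkpoints
```

In this simulation the value produced inside the step is identical across both runs, while the value generated directly in the handler body changes.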

Finding 8.2: print() and context.logger Mixed Usage

  • Severity: MEDIUM

  • File: saga-pattern-cdk/lambda/saga-workflow/index.py (line 31)

  • Issue: print() in a durable function handler produces duplicate log entries on every replay. Additionally, datetime.datetime.now() inside the print is non-deterministic. The durable SDK provides context.logger which is replay-aware and deduplicates automatically.

  • Current Code:

    print(f"Saga workflow started at: {datetime.datetime.now()}")
  • Recommended Fix:

    context.logger.info("Saga workflow started")

Finding 8.3: datetime.datetime.now() and datetime.datetime.utcnow() Outside Steps

  • Severity: CRITICAL

  • File: saga-pattern-cdk/lambda/saga-workflow/index.py (lines 31, 86, 87, 104, 105)

  • Issue: datetime.datetime.now() and datetime.datetime.utcnow().date().isoformat() are called outside steps for hotel checkIn/checkOut and car pickup/dropoff defaults. These produce different values on each replay, potentially sending different default dates to service functions on retry.

  • Recommended Fix:

    defaults = context.step(lambda _: {
        "today": datetime.datetime.now(datetime.timezone.utc).date().isoformat()
    }, name='generate-defaults')
    
    hotel_data = {
        "checkIn": event.get("checkIn", defaults["today"]),
        "checkOut": event.get("checkOut", defaults["today"]),
    }
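
The fix works because, once the date is captured in a step, the rest of the handler depends only on the event and the checkpointed value. A small sketch of that separation follows; build_hotel_data is a hypothetical helper name, and the date passed in stands for whatever the step returned.

```python
from datetime import date


def build_hotel_data(event: dict, today_iso: str) -> dict:
    """Fill date defaults from a value that was captured once, outside this function."""
    return {
        "checkIn": event.get("checkIn", today_iso),
        "checkOut": event.get("checkOut", today_iso),
    }


# Example: the captured value is passed in, so repeated calls give the same result.
captured = date(2026, 4, 8).isoformat()
hotel_data = build_hotel_data({"checkOut": "2026-04-10"}, captured)
```

Keeping the clock read out of this function also makes the default logic easy to unit test with fixed dates.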

Finding 8.5: Compensation Pattern Could Use Child Context

  • Severity: HIGH

  • File: saga-pattern-cdk/lambda/saga-workflow/index.py (lines 145-190)

  • Issue: The compensation logic in the except block uses context.invoke() which is correct (it's a durable primitive that gets checkpointed). However, compensation_results is a local list rebuilt on every replay. The pattern would be cleaner and more maintainable wrapped in a child context.

  • Recommended Fix:

    # Consider wrapping the entire compensation sequence in a child context:
    # context.run_in_child_context(func=compensate, name='compensation')
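
As a rough sketch of what such a compensate function might contain, assuming compensations run in reverse order of completion (the usual saga convention, not confirmed for this PR). The invoke parameter is a plain callable standing in for context.invoke, and the cancel function names are placeholders.

```python
from typing import Callable, Dict, List

# Maps each completed reservation step to the function that undoes it.
# The cancel function names are placeholders for this sketch.
COMPENSATIONS: Dict[str, str] = {
    "flight": "cancel-flight",
    "hotel": "cancel-hotel",
    "car": "cancel-car",
}


def compensate(completed_steps: List[str], invoke: Callable[[str], dict]) -> List[dict]:
    """Undo completed steps in reverse order and collect each result."""
    results = []
    for step in reversed(completed_steps):
        results.append({"step": step, "result": invoke(COMPENSATIONS[step])})
    return results
```

Returning the results from one function, rather than appending to a list in the except block, keeps the compensation outcome in a single value that a child context could checkpoint as a unit.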

Finding 8.6: Unused dynamodb Client at Module Scope

  • Severity: MEDIUM
  • File: saga-pattern-cdk/lambda/saga-workflow/index.py (line 12)
  • Issue: boto3.client('dynamodb') is instantiated at module scope but never used, because the orchestrator delegates all DynamoDB operations to service functions via context.invoke(). The unused client adds initialization work on cold start and is confusing to readers.
  • Recommended Fix: Remove dynamodb = boto3.client('dynamodb').
