From b4ee71acc6a4733354e5fdec33997876c6a127c9 Mon Sep 17 00:00:00 2001
From: Mina Abadir
Date: Thu, 4 Jun 2026 00:25:59 -0400
Subject: [PATCH 01/26] feat(ecs_service): native ECS deployment strategies (rolling/blue_green/linear/canary)

Replace the CodeDeploy/external-controller blue/green wiring with the ECS
deployment controller's built-in strategies:

- deployment_type now accepts rolling | blue_green | linear | canary; the
  deployment controller is always ECS (CODE_DEPLOY mapping removed)
- new deployment_strategy_config seeds bake time + canary/linear tuning at
  create time; deployment_configuration is in ignore_changes because the
  Flightcontrol deploy manager passes the authoritative config (including
  pause lifecycle hooks) on every UpdateService call
- native strategies wire load_balancer.advanced_configuration: alternate
  target group (tg-2), production listener rule (first ALB rule or the NLB
  listener), optional test_listener_rule_arn, and a new ECS infrastructure
  role (AmazonECSInfrastructureRolePolicyForLoadBalancers)
- target groups tg-1/tg-2 are now production/alternate and gate on any
  native traffic-shift strategy, not just blue_green; outputs renamed
  accordingly (production_/alternate_target_group_*) and
  ecs_infrastructure_role_arn added
- provider floor bumped to aws >= 6.21 (linear/canary configuration)

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 compute/ecs_service/README.md              | 54 +++++++-------
 compute/ecs_service/ecs_service.tf         | 70 +++++++++++++++--
 compute/ecs_service/iam_infrastructure.tf  | 40 ++++++++++
 compute/ecs_service/listener_rules.tf      |  4 +-
 compute/ecs_service/locals.tf              | 18 ++++-
 compute/ecs_service/outputs.tf             | 39 ++++++----
 compute/ecs_service/target_groups.tf       |  4 +-
 compute/ecs_service/tests/basic.tftest.hcl | 87 +++++++++++++++++++++-
 compute/ecs_service/variables.tf           | 41 +++++++++-
 compute/ecs_service/versions.tf            |  6 +-
 10 files changed, 303 insertions(+), 60 deletions(-)
 create mode 100644
compute/ecs_service/iam_infrastructure.tf diff --git a/compute/ecs_service/README.md b/compute/ecs_service/README.md index 02a2f81..24c10ed 100644 --- a/compute/ecs_service/README.md +++ b/compute/ecs_service/README.md @@ -1,12 +1,12 @@ # ECS Service Module -This module creates an Amazon ECS service with a placeholder task definition, load balancer integration, auto scaling, and service discovery. It supports both rolling and blue/green deployment strategies. +This module creates an Amazon ECS service with a placeholder task definition, load balancer integration, auto scaling, and service discovery. It supports the native ECS deployment strategies: rolling, blue/green, linear, and canary. **Note:** This module provisions infrastructure with a placeholder container (hello-world). An external deployment controller (e.g. CodeDeploy or another CI/CD tool) is expected to deploy the actual application by updating the task definition. ## Features -- ECS service with configurable deployment strategies (rolling or blue/green) +- ECS service with configurable native deployment strategies (rolling, blue/green, linear, canary) - Placeholder task definition (hello-world) - the external deployment controller updates with the actual application - IAM roles for task execution and task roles with optional ECS Exec support - Security group for ECS tasks with configurable ingress rules @@ -314,7 +314,9 @@ module "worker_service" { | Name | Description | Type | Default | Required | |------|-------------|------|---------|----------| | desired_count | Desired number of tasks (0 for infrastructure-first) | `number` | `0` | no | -| deployment_type | Deployment type: rolling or blue_green | `string` | `"rolling"` | no | +| deployment_type | Deployment strategy: rolling, blue_green, linear, or canary | `string` | `"rolling"` | no | +| deployment_strategy_config | Initial bake/canary/linear tuning for native traffic-shift strategies (seed only — the deploy manager owns it per-deploy) | 
`object` | `{}` | no | +| test_listener_rule_arn | Optional ALB listener rule ARN for test traffic during blue/green validation | `string` | `null` | no | | deployment_minimum_healthy_percent | Minimum healthy percent during deployment | `number` | `100` | no | | deployment_maximum_percent | Maximum percent during deployment | `number` | `200` | no | | enable_execute_command | Enable ECS Exec for debugging | `bool` | `false` | no | @@ -823,25 +825,23 @@ The module deploys `public.ecr.aws/docker/library/hello-world:latest` as a place The placeholder container prints a message and exits, so load balancer health checks will fail until the actual application is deployed. This is expected behavior. -### When should I use rolling vs blue/green deployment? +### When should I use which deployment strategy? -| Feature | Rolling (ECS) | Blue/Green (external controller) | -|---------|--------------|----------------------------------| -| **Complexity** | Simple | More complex (requires an external deployment controller) | -| **Rollback** | Automatic via circuit breaker | Instant traffic switch | -| **Traffic shift** | Gradual (min/max healthy %) | All-at-once or gradual | -| **Testing** | No pre-production testing | Test green before switching | -| **Infrastructure** | 1 target group | 2 target groups | +All four strategies run on the native ECS deployment controller — no +CodeDeploy and no external controller. 
-**Use rolling when:** -- Simple deployments with automatic rollback are sufficient -- You want minimal infrastructure complexity -- Built-in ECS deployment features meet your needs +| Feature | Rolling | Blue/Green | Linear | Canary | +|---------|---------|------------|--------|--------| +| **Traffic shift** | Task replacement (min/max healthy %) | All-at-once + bake | Equal % steps + per-step bake | Small % first, then the rest | +| **Rollback** | Circuit breaker | Instant (old revision kept through bake) | Instant | Instant | +| **Testing** | None | Test-listener validation before shift | Per-step validation | Canary validation | +| **Infrastructure** | 1 target group | 2 target groups + infra role | 2 target groups + infra role | 2 target groups + infra role | -**Use blue/green when:** -- You need instant rollback capability -- You want to test in production before switching traffic -- You need advanced deployment strategies (canary, linear) +**Use rolling when:** simple deployments with automatic rollback are sufficient and you want minimal infrastructure. + +**Use blue/green when:** you want full validation of the new revision (optionally via a test listener rule) before shifting all production traffic at once, with instant rollback during the bake window. + +**Use linear/canary when:** you want production traffic to shift gradually with monitoring between steps. ### How do I use this module with an NLB instead of an ALB? 
@@ -1001,22 +1001,22 @@ Uses the ECS deployment controller for zero-downtime rolling updates: - Built-in circuit breaker with optional rollback - Simple and fully managed by ECS -### Blue/Green Deployment +### Native Traffic-Shift Strategies (blue_green / linear / canary) -Sets up infrastructure for blue/green deployments managed by an external controller: -- Creates two target groups (tg-1 and tg-2) -- Sets deployment controller to CODE_DEPLOY -- Outputs all ARNs needed to wire up the external controller -- The external deployment controller (application, deployment group, etc.) must be managed outside of this module +Sets up infrastructure for the ECS deployment controller's built-in traffic shifting: +- Creates two target groups (tg-1 = production, tg-2 = alternate) +- Creates an ECS infrastructure IAM role (AmazonECSInfrastructureRolePolicyForLoadBalancers) that ECS assumes to rewrite listener rules and (de)register targets during the shift +- Wires the service's `load_balancer.advanced_configuration` (alternate target group, production listener rule, optional test listener rule, infrastructure role) +- Seeds `deployment_configuration` (strategy, bake time, canary/linear tuning); the Flightcontrol deploy manager passes the authoritative configuration — including pause lifecycle hooks — on every UpdateService call, so the block is in `ignore_changes` ## Notes - The module creates a security group that allows inbound traffic from the VPC CIDR on the container port - For Fargate tasks in public subnets without NAT, set `assign_public_ip = true` - The placeholder container uses hello-world from public ECR - no special permissions needed -- For blue/green deployments, the module only creates the infrastructure; the external deployment controller must be configured separately +- For blue_green/linear/canary deployments, ECS itself executes the traffic shift; the Flightcontrol deploy manager drives it via UpdateService and pause lifecycle hooks - The task definition 
has `lifecycle { ignore_changes = all }` since the external deployment controller manages updates -- Listener rules have `lifecycle { ignore_changes = [action] }` for blue/green deployments where the external controller switches target groups +- Listener rules have `lifecycle { ignore_changes = [action] }` — the ECS deployment controller rewrites the forward action (weighted target groups) during native traffic shifts - When using `ALBRequestCountPerTarget` metric for auto scaling, a load balancer must be configured - The `desired_count` defaults to 0 for infrastructure-first provisioning; the external controller will manage the actual count - Target group names are truncated to meet AWS naming requirements (max 32 characters) diff --git a/compute/ecs_service/ecs_service.tf b/compute/ecs_service/ecs_service.tf index 5d1714e..a3f662e 100644 --- a/compute/ecs_service/ecs_service.tf +++ b/compute/ecs_service/ecs_service.tf @@ -40,7 +40,8 @@ resource "aws_ecs_service" "this" { type = local.deployment_controller_type } - # Deployment circuit breaker (only for ECS deployment controller) + # Deployment circuit breaker (rolling strategy only — native + # traffic-shift strategies have their own rollback semantics) dynamic "deployment_circuit_breaker" { for_each = var.deployment_type == "rolling" && var.deployment_circuit_breaker.enable ? [1] : [] content { @@ -49,6 +50,35 @@ resource "aws_ecs_service" "this" { } } + # Native deployment strategy. Seeds the strategy + traffic-shift + # tuning at create time; the Flightcontrol deploy manager passes the + # authoritative deploymentConfiguration (including pause lifecycle + # hooks) on every UpdateService call, so this block is in + # ignore_changes below. + dynamic "deployment_configuration" { + for_each = local.is_native_traffic_shift ? 
[1] : [] + content { + strategy = local.deployment_strategy + bake_time_in_minutes = var.deployment_strategy_config.bake_time_in_minutes + + dynamic "canary_configuration" { + for_each = var.deployment_type == "canary" ? [1] : [] + content { + canary_percent = var.deployment_strategy_config.canary.canary_percent + canary_bake_time_in_minutes = var.deployment_strategy_config.canary.canary_bake_time_in_minutes + } + } + + dynamic "linear_configuration" { + for_each = var.deployment_type == "linear" ? [1] : [] + content { + step_percent = var.deployment_strategy_config.linear.step_percent + step_bake_time_in_minutes = var.deployment_strategy_config.linear.step_bake_time_in_minutes + } + } + } + } + # Deployment min/max healthy percent deployment_minimum_healthy_percent = var.deployment_type == "rolling" ? var.deployment_minimum_healthy_percent : null deployment_maximum_percent = var.deployment_type == "rolling" ? var.deployment_maximum_percent : null @@ -63,13 +93,28 @@ resource "aws_ecs_service" "this" { } } - # Load balancer configuration - Blue/Green deployment (attach to blue initially) + # Load balancer configuration - native traffic-shift strategies. + # The service starts on the production target group (tg-1); ECS + # alternates between tg-1 and the alternate target group (tg-2) on + # each deployment, rewriting the production listener rule via the + # infrastructure role. dynamic "load_balancer" { - for_each = local.enable_load_balancer && var.deployment_type == "blue_green" ? [1] : [] + for_each = local.enable_load_balancer && local.is_native_traffic_shift ? [1] : [] content { target_group_arn = aws_lb_target_group.tg_1[0].arn container_name = local.lb_container_name container_port = local.lb_container_port + + advanced_configuration { + alternate_target_group_arn = aws_lb_target_group.tg_2[0].arn + production_listener_rule = ( + local.enable_nlb_listener + ? 
aws_lb_listener.nlb[0].arn + : aws_lb_listener_rule.alb["0"].arn + ) + test_listener_rule = var.test_listener_rule_arn + role_arn = aws_iam_role.ecs_infrastructure[0].arn + } } } @@ -106,17 +151,32 @@ resource "aws_ecs_service" "this" { # Dependencies depends_on = [ aws_iam_role_policy_attachment.execution_base, + aws_iam_role_policy_attachment.ecs_infrastructure_elb, aws_lb_listener_rule.alb, ] - # Lifecycle: desired_count is managed by autoscaling (or external controllers), - # so Terraform must not fight it on subsequent applies. + # Lifecycle: desired_count is managed by autoscaling, task_definition / + # load_balancer / deployment_configuration by the Flightcontrol deploy + # manager (UpdateService passes the authoritative strategy + pause + # lifecycle hooks on every deploy, and native traffic-shift deploys + # alternate the service between the production and alternate target + # groups), so Terraform must not fight them on subsequent applies. lifecycle { ignore_changes = [ desired_count, task_definition, load_balancer, + deployment_configuration, ] + + precondition { + condition = ( + !(local.is_native_traffic_shift && local.enable_load_balancer) + || local.enable_nlb_listener + || length(var.load_balancer_attachment.listener_rules) > 0 + ) + error_message = "Native traffic-shift strategies (blue_green/linear/canary) require either listener_rules (ALB) or nlb_listener so the production listener rule can be wired into advanced_configuration." 
+ } } } diff --git a/compute/ecs_service/iam_infrastructure.tf b/compute/ecs_service/iam_infrastructure.tf new file mode 100644 index 0000000..da04736 --- /dev/null +++ b/compute/ecs_service/iam_infrastructure.tf @@ -0,0 +1,40 @@ +################################################################################ +# ECS Infrastructure Role +# +# Native traffic-shift deployments (blue_green / linear / canary) hand +# the load-balancer wiring to the ECS deployment controller: ECS assumes +# this role to register/deregister targets and rewrite the production / +# test listener rules while it shifts traffic between the production and +# alternate target groups. +################################################################################ + +data "aws_iam_policy_document" "ecs_infrastructure_assume" { + count = local.is_native_traffic_shift && local.enable_load_balancer ? 1 : 0 + + statement { + actions = ["sts:AssumeRole"] + + principals { + type = "Service" + identifiers = ["ecs.amazonaws.com"] + } + } +} + +resource "aws_iam_role" "ecs_infrastructure" { + count = local.is_native_traffic_shift && local.enable_load_balancer ? 1 : 0 + + name_prefix = "${substr(var.name, 0, min(length(var.name), 26))}-infra-" + assume_role_policy = data.aws_iam_policy_document.ecs_infrastructure_assume[0].json + + tags = merge(local.tags, { + Name = "${var.name}-ecs-infrastructure" + }) +} + +resource "aws_iam_role_policy_attachment" "ecs_infrastructure_elb" { + count = local.is_native_traffic_shift && local.enable_load_balancer ? 
1 : 0 + + role = aws_iam_role.ecs_infrastructure[0].name + policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSInfrastructureRolePolicyForLoadBalancers" +} diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index 7d238a1..3343c44 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -14,7 +14,7 @@ resource "aws_lb_listener_rule" "alb" { action { type = "forward" target_group_arn = ( - var.deployment_type == "blue_green" + local.is_native_traffic_shift ? aws_lb_target_group.tg_1[0].arn : aws_lb_target_group.this[0].arn ) @@ -108,7 +108,7 @@ resource "aws_lb_listener" "nlb" { default_action { type = "forward" target_group_arn = ( - var.deployment_type == "blue_green" + local.is_native_traffic_shift ? aws_lb_target_group.tg_1[0].arn : aws_lb_target_group.this[0].arn ) diff --git a/compute/ecs_service/locals.tf b/compute/ecs_service/locals.tf index c0fe0ad..fc09418 100644 --- a/compute/ecs_service/locals.tf +++ b/compute/ecs_service/locals.tf @@ -15,8 +15,22 @@ locals { tags = merge(local.default_tags, var.tags) - # Determine deployment controller type - deployment_controller_type = var.deployment_type == "blue_green" ? "CODE_DEPLOY" : "ECS" + # Every strategy runs on the native ECS deployment controller — the + # blue_green / linear / canary traffic shifts are executed by ECS + # itself (deployment_configuration.strategy), not CodeDeploy. + deployment_controller_type = "ECS" + + # Strategies that run the ECS controller's traffic-shift state machine + # over two target groups (production + alternate). + is_native_traffic_shift = contains(["blue_green", "linear", "canary"], var.deployment_type) + + # Map the module's strategy name to the AWS deploymentConfiguration enum. 
+ deployment_strategy = { + rolling = "ROLLING" + blue_green = "BLUE_GREEN" + linear = "LINEAR" + canary = "CANARY" + }[var.deployment_type] # Determine if load balancer is configured enable_load_balancer = var.load_balancer_attachment != null && var.load_balancer_attachment.enabled diff --git a/compute/ecs_service/outputs.tf b/compute/ecs_service/outputs.tf index cba7ce9..670856d 100644 --- a/compute/ecs_service/outputs.tf +++ b/compute/ecs_service/outputs.tf @@ -104,27 +104,27 @@ output "target_group_name" { } ################################################################################ -# Target Groups - Blue/Green Deployment +# Target Groups - Native Traffic-Shift Strategies (blue_green/linear/canary) ################################################################################ -output "blue_target_group_arn" { - description = "The ARN of the blue target group (null if not blue/green deployment)." - value = local.enable_load_balancer && var.deployment_type == "blue_green" ? aws_lb_target_group.tg_1[0].arn : null +output "production_target_group_arn" { + description = "The ARN of the production target group (null unless a native traffic-shift strategy)." + value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_1[0].arn : null } -output "blue_target_group_name" { - description = "The name of the blue target group." - value = local.enable_load_balancer && var.deployment_type == "blue_green" ? aws_lb_target_group.tg_1[0].name : null +output "production_target_group_name" { + description = "The name of the production target group." + value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_1[0].name : null } -output "green_target_group_arn" { - description = "The ARN of the green target group (null if not blue/green deployment)." - value = local.enable_load_balancer && var.deployment_type == "blue_green" ? 
aws_lb_target_group.tg_2[0].arn : null +output "alternate_target_group_arn" { + description = "The ARN of the alternate target group ECS shifts traffic to during native deployments (null unless a native traffic-shift strategy)." + value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_2[0].arn : null } -output "green_target_group_name" { - description = "The name of the green target group." - value = local.enable_load_balancer && var.deployment_type == "blue_green" ? aws_lb_target_group.tg_2[0].name : null +output "alternate_target_group_name" { + description = "The name of the alternate target group." + value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_2[0].name : null } ################################################################################ @@ -137,12 +137,21 @@ output "target_group_arns" { var.deployment_type == "rolling" ? { primary = aws_lb_target_group.this[0].arn } : { - blue = aws_lb_target_group.tg_1[0].arn - green = aws_lb_target_group.tg_2[0].arn + production = aws_lb_target_group.tg_1[0].arn + alternate = aws_lb_target_group.tg_2[0].arn } ) : {} } +################################################################################ +# ECS Infrastructure Role +################################################################################ + +output "ecs_infrastructure_role_arn" { + description = "The ARN of the IAM role ECS assumes to manage load-balancer wiring during native traffic-shift deployments (null for rolling)." + value = local.is_native_traffic_shift && local.enable_load_balancer ? 
aws_iam_role.ecs_infrastructure[0].arn : null +} + ################################################################################ # Listeners ################################################################################ diff --git a/compute/ecs_service/target_groups.tf b/compute/ecs_service/target_groups.tf index 44a3bdf..d138660 100644 --- a/compute/ecs_service/target_groups.tf +++ b/compute/ecs_service/target_groups.tf @@ -50,7 +50,7 @@ resource "aws_lb_target_group" "this" { ################################################################################ resource "aws_lb_target_group" "tg_1" { - count = local.enable_load_balancer && var.deployment_type == "blue_green" ? 1 : 0 + count = local.enable_load_balancer && local.is_native_traffic_shift ? 1 : 0 name = "${substr(var.name, 0, min(length(var.name), 24))}-tg-1" port = var.load_balancer_attachment.target_group.port @@ -94,7 +94,7 @@ resource "aws_lb_target_group" "tg_1" { } resource "aws_lb_target_group" "tg_2" { - count = local.enable_load_balancer && var.deployment_type == "blue_green" ? 1 : 0 + count = local.enable_load_balancer && local.is_native_traffic_shift ? 
1 : 0 name = "${substr(var.name, 0, min(length(var.name), 24))}-tg-2" port = var.load_balancer_attachment.target_group.port diff --git a/compute/ecs_service/tests/basic.tftest.hcl b/compute/ecs_service/tests/basic.tftest.hcl index 4216a22..0087859 100644 --- a/compute/ecs_service/tests/basic.tftest.hcl +++ b/compute/ecs_service/tests/basic.tftest.hcl @@ -153,18 +153,101 @@ run "blue_green_deployment" { assert { condition = length(aws_lb_target_group.tg_1) == 1 - error_message = "Should create blue target group for blue/green deployment" + error_message = "Should create production target group for blue/green deployment" } assert { condition = length(aws_lb_target_group.tg_2) == 1 - error_message = "Should create green target group for blue/green deployment" + error_message = "Should create alternate target group for blue/green deployment" } assert { condition = length(aws_lb_target_group.this) == 0 error_message = "Should not create single target group for blue/green deployment" } + + assert { + condition = length(aws_iam_role.ecs_infrastructure) == 1 + error_message = "Should create the ECS infrastructure role for native traffic-shift strategies" + } +} + +################################################################################ +# Test: Canary Deployment +################################################################################ + +run "canary_deployment" { + command = plan + + variables { + deployment_type = "canary" + container_port = 8080 + deployment_strategy_config = { + bake_time_in_minutes = 15 + canary = { + canary_percent = 10.0 + canary_bake_time_in_minutes = 5 + } + } + load_balancer_attachment = { + target_group = { + port = 8080 + protocol = "HTTP" + } + listener_rules = [{ + listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/1234567890123456/1234567890123456" + priority = 100 + conditions = [{ + type = "host-header" + values = ["api.example.com"] + }] + }] + } + } + + assert { + condition = 
length(aws_lb_target_group.tg_1) == 1 && length(aws_lb_target_group.tg_2) == 1 + error_message = "Should create production + alternate target groups for canary deployment" + } +} + +################################################################################ +# Test: Linear Deployment +################################################################################ + +run "linear_deployment" { + command = plan + + variables { + deployment_type = "linear" + container_port = 8080 + deployment_strategy_config = { + bake_time_in_minutes = 10 + linear = { + step_percent = 20.0 + step_bake_time_in_minutes = 5 + } + } + load_balancer_attachment = { + target_group = { + port = 8080 + protocol = "HTTP" + } + listener_rules = [{ + listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/1234567890123456/1234567890123456" + priority = 100 + conditions = [{ + type = "host-header" + values = ["api.example.com"] + }] + }] + } + } + + assert { + condition = length(aws_lb_target_group.tg_1) == 1 && length(aws_lb_target_group.tg_2) == 1 + error_message = "Should create production + alternate target groups for linear deployment" + } } ################################################################################ diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index 2677e5a..b66603a 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -286,15 +286,50 @@ variable "desired_count" { variable "deployment_type" { type = string - description = "The deployment type: 'rolling' (ECS) or 'blue_green' (CODE_DEPLOY)." + description = "The deployment strategy executed natively by the ECS deployment controller: 'rolling', 'blue_green', 'linear', or 'canary'." default = "rolling" validation { - condition = contains(["rolling", "blue_green"], var.deployment_type) - error_message = "The deployment_type must be either 'rolling' or 'blue_green'." 
+ condition = contains(["rolling", "blue_green", "linear", "canary"], var.deployment_type) + error_message = "The deployment_type must be one of: 'rolling', 'blue_green', 'linear', 'canary'." } } +variable "deployment_strategy_config" { + type = object({ + # Minutes both revisions keep running after production traffic has + # fully shifted, before the old revision is terminated. + bake_time_in_minutes = optional(number, 10) + + # Canary tuning — only used when deployment_type is 'canary'. + canary = optional(object({ + canary_percent = optional(number, 5.0) + canary_bake_time_in_minutes = optional(number, 10) + }), {}) + + # Linear tuning — only used when deployment_type is 'linear'. + linear = optional(object({ + step_percent = optional(number, 25.0) + step_bake_time_in_minutes = optional(number, 5) + }), {}) + }) + description = <<-EOT + Initial tuning for the native traffic-shift strategies (blue_green / + linear / canary). This only seeds the service at create time — the + Flightcontrol deploy manager passes the authoritative + deploymentConfiguration (including pause lifecycle hooks) on every + UpdateService call, so post-create changes to these values are + ignored by Terraform (see ignore_changes on aws_ecs_service.this). + EOT + default = {} +} + +variable "test_listener_rule_arn" { + type = string + description = "Optional ARN of an ALB listener rule that routes test traffic for blue/green validation (drives the TEST_TRAFFIC_SHIFT lifecycle stages). Only used for native traffic-shift strategies." + default = null +} + variable "deployment_minimum_healthy_percent" { type = number description = "The minimum healthy percent during deployment (rolling deployments only)." 
diff --git a/compute/ecs_service/versions.tf b/compute/ecs_service/versions.tf index bec739b..283914b 100644 --- a/compute/ecs_service/versions.tf +++ b/compute/ecs_service/versions.tf @@ -9,8 +9,10 @@ terraform { required_providers { aws = { - source = "hashicorp/aws" - version = ">= 6.0" + source = "hashicorp/aws" + # 6.21 adds linear_configuration / canary_configuration on the + # aws_ecs_service deployment_configuration block. + version = ">= 6.21" } } } From 3c72a84a348a0d7d21b6927497834c82e8af258d Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Thu, 4 Jun 2026 11:03:43 -0400 Subject: [PATCH 02/26] feat(ecs_service): make deployment strategy a per-deployment decision MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Always provision the production/alternate target-group pair, the ECS infrastructure role, and the service's load_balancer advanced_configuration whenever a load balancer is attached — not just for native traffic-shift deployment_types. Rolling deployments serve from the production target group only, so any service can now switch between rolling / blue_green / linear / canary on a single UpdateService call with zero Terraform changes; deployment_type is purely the create-time seed. Also fixes the module test suite: mock the iam_policy_document / partition / region / vpc data sources and computed ARNs so all 11 runs execute (basic_service was failing on main and skipping the rest). 
Co-Authored-By: Claude Opus 4.8 (1M context) --- compute/ecs_service/README.md | 98 +++++++++++----------- compute/ecs_service/auto_scaling.tf | 4 +- compute/ecs_service/ecs_service.tf | 26 ++---- compute/ecs_service/iam_infrastructure.tf | 11 ++- compute/ecs_service/listener_rules.tf | 35 ++++---- compute/ecs_service/locals.tf | 6 +- compute/ecs_service/outputs.tf | 56 +++++-------- compute/ecs_service/target_groups.tf | 62 +++----------- compute/ecs_service/tests/basic.tftest.hcl | 89 ++++++++++++++++---- compute/ecs_service/variables.tf | 2 +- 10 files changed, 196 insertions(+), 193 deletions(-) diff --git a/compute/ecs_service/README.md b/compute/ecs_service/README.md index 24c10ed..fd0f0b1 100644 --- a/compute/ecs_service/README.md +++ b/compute/ecs_service/README.md @@ -2,7 +2,9 @@ This module creates an Amazon ECS service with a placeholder task definition, load balancer integration, auto scaling, and service discovery. It supports the native ECS deployment strategies: rolling, blue/green, linear, and canary. -**Note:** This module provisions infrastructure with a placeholder container (hello-world). An external deployment controller (e.g. CodeDeploy or another CI/CD tool) is expected to deploy the actual application by updating the task definition. +**Note:** This module provisions infrastructure with a placeholder container (hello-world). The Flightcontrol deploy manager deploys the actual application by registering task definitions and calling UpdateService with the authoritative `deploymentConfiguration` (strategy, bake times, pause lifecycle hooks) on every deploy. + +When a load balancer is attached, the module always provisions the production + alternate target-group pair, the ECS infrastructure role, and the service's `load_balancer.advanced_configuration` — so the deployment strategy is a **per-deployment decision**: any service can switch between rolling / blue_green / linear / canary on a single deploy with no Terraform changes. 
`deployment_type` only seeds the strategy at create time. ## Features @@ -15,7 +17,7 @@ This module creates an Amazon ECS service with a placeholder task definition, lo - NLB listener creation with TLS support - Application Auto Scaling with target tracking and scheduled scaling - AWS Cloud Map service discovery integration -- Blue/green deployment infrastructure (managed by an external deployment controller) +- Native traffic-shift deployment infrastructure (production/alternate target groups, ECS infrastructure role, advanced_configuration) provisioned for every load-balanced service so the strategy can change per deployment - Support for EFS and Docker volume configurations - Capacity provider strategy support for mixed Fargate/EC2 deployments @@ -114,9 +116,10 @@ module "api_service" { } } -# Use the outputs to configure an external deployment controller -# module.api_service.blue_target_group_arn -# module.api_service.green_target_group_arn +# Target groups + ECS infrastructure role for the traffic shift: +# module.api_service.production_target_group_arn +# module.api_service.alternate_target_group_arn +# module.api_service.ecs_infrastructure_role_arn ``` ### With Service Discovery @@ -314,7 +317,7 @@ module "worker_service" { | Name | Description | Type | Default | Required | |------|-------------|------|---------|----------| | desired_count | Desired number of tasks (0 for infrastructure-first) | `number` | `0` | no | -| deployment_type | Deployment strategy: rolling, blue_green, linear, or canary | `string` | `"rolling"` | no | +| deployment_type | Create-time seed for the deployment strategy (rolling, blue_green, linear, canary); the strategy itself is set per deployment via UpdateService | `string` | `"rolling"` | no | | deployment_strategy_config | Initial bake/canary/linear tuning for native traffic-shift strategies (seed only — the deploy manager owns it per-deploy) | `object` | `{}` | no | | test_listener_rule_arn | Optional ALB listener rule ARN for 
test traffic during blue/green validation | `string` | `null` | no | | deployment_minimum_healthy_percent | Minimum healthy percent during deployment | `number` | `100` | no | @@ -407,23 +410,20 @@ The `service_discovery` object includes: | security_group_id | The ID of the service security group | | security_group_arn | The ARN of the service security group | -### Target Groups - Rolling Deployment - -| Name | Description | -|------|-------------| -| target_group_arn | Target group ARN (null if LB disabled or blue/green) | -| target_group_arn_suffix | Target group ARN suffix for CloudWatch metrics | -| target_group_name | Target group name | +### Target Groups -### Target Groups - Blue/Green Deployment +A production (tg-1) + alternate (tg-2) pair always exists when a load balancer is attached. Rolling deployments only ever serve from the production target group; native traffic-shift deployments alternate between the two. | Name | Description | |------|-------------| -| blue_target_group_arn | Blue target group ARN | -| blue_target_group_name | Blue target group name | -| green_target_group_arn | Green target group ARN | -| green_target_group_name | Green target group name | -| target_group_arns | Map of all target group ARNs (primary for rolling, blue/green for blue_green) | +| production_target_group_arn | Production target group ARN (null if LB disabled) | +| production_target_group_name | Production target group name | +| alternate_target_group_arn | Alternate target group ARN ECS shifts traffic to during native deployments | +| alternate_target_group_name | Alternate target group name | +| target_group_arn | Alias of production_target_group_arn | +| target_group_arn_suffix | Production target group ARN suffix for CloudWatch metrics | +| target_group_arns | Map of all target group ARNs (production + alternate) | +| ecs_infrastructure_role_arn | IAM role ECS assumes to manage listener wiring during native traffic-shift deployments | ### NLB Listener @@ -527,7 
+527,7 @@ The `service_discovery` object includes: ║ │ ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ ║ ║ │ │ • default_tags = { ManagedBy = "terraform", Module = "compute/ecs_service" } │ │ ║ ║ │ │ • tags = merge(default_tags, var.tags) │ │ ║ -║ │ │ • deployment_controller_type = var.deployment_type == "blue_green" ? "CODE_DEPLOY" : "ECS" │ │ ║ +║ │ │ • deployment_controller_type = "ECS" (always; strategy is per-deployment) │ │ ║ ║ │ │ • placeholder_container_name = "app" │ │ ║ ║ │ │ │ │ ║ ║ │ │ FEATURE FLAGS: │ │ ║ @@ -543,7 +543,7 @@ The `service_discovery` object includes: ║ ┌─────────────────────────────┐ ┌─────────────────────────────────┐ ┌─────────────────────────────────────────┐ ║ ║ │ TASK DEFINITION │ │ SERVICE CONFIG │ │ DEPLOYMENT │ ║ ║ ├─────────────────────────────┤ ├─────────────────────────────────┤ ├─────────────────────────────────────────┤ ║ -║ │ • task_cpu │ │ • desired_count │ │ • deployment_type (rolling/blue_green) │ ║ +║ │ • task_cpu │ │ • desired_count │ │ • deployment_type (strategy seed) │ ║ ║ │ • task_memory │ │ • enable_execute_command │ │ • deployment_minimum_healthy_percent │ ║ ║ │ • container_port │ │ • force_new_deployment │ │ • deployment_maximum_percent │ ║ ║ │ • launch_type │ │ • wait_for_steady_state │ │ • deployment_circuit_breaker │ ║ @@ -636,7 +636,7 @@ The `service_discovery` object includes: ║ │ │ _breaker(dynamic)│ │ strategy (dynamic)│ │ ║ ║ │ └──────────────────┘ └───────────────────┘ │ ║ ║ │ │ ║ -║ │ deployment_controller.type = ECS | CODE_DEPLOY │ ║ +║ │ deployment_controller.type = ECS (always) │ ║ ║ └────────────────────────────────────┬─────────────────────────────────────┘ ║ ║ │ ║ ║ ┌─────────────────────────────────────────┬───────────────────────┼───────────────────────┬───────────────┐ ║ @@ -646,15 +646,15 @@ The `service_discovery` object includes: ║ │ TARGET GROUPS │ │ aws_lb_listener_rule.alb │ │ aws_lb_listener │ │aws_service_discovery │ ║ ║ │ 
(conditional) │ │ (for_each: listener_rules) │ │ .nlb[0] │ │ _service.this[0] │ ║ ║ ├───────────────────────┤ ├───────────────────────────────┤ │ (count: 0 or 1) │ │(count: 0 or 1) │ ║ -║ │ Rolling: │ │ • path-pattern condition │ ├──────────────────┤ ├────────────────────────┤ ║ +║ │ Always (when LB): │ │ • path-pattern condition │ ├──────────────────┤ ├────────────────────────┤ ║ ║ │ aws_lb_target_group │ │ • host-header condition │ │ • TCP/TLS/UDP │ │ • Cloud Map DNS │ ║ -║ │ .this[0] │ │ • http-header condition │ │ • Certificate │ │ • A or SRV records │ ║ -║ │ │ │ • query-string condition │ │ • SSL policy │ │ • Custom health check │ ║ -║ │ Blue/Green: │ │ • source-ip condition │ └──────────────────┘ └────────────────────────┘ ║ -║ │ aws_lb_target_group │ │ lifecycle: ignore action │ ║ -║ │ .tg_1[0] (blue) │ │ (external controller swaps) │ ║ -║ │ aws_lb_target_group │ └───────────────────────────────┘ ║ -║ │ .tg_2[0] (green) │ ║ +║ │ .tg_1[0] (prod) │ │ • http-header condition │ │ • Certificate │ │ • A or SRV records │ ║ +║ │ aws_lb_target_group │ │ • query-string condition │ │ • SSL policy │ │ • Custom health check │ ║ +║ │ .tg_2[0] (alt) │ │ • source-ip condition │ └──────────────────┘ └────────────────────────┘ ║ +║ │ │ │ lifecycle: ignore action │ ║ +║ │ │ │ (ECS controller rewrites) │ ║ +║ │ │ └───────────────────────────────┘ ║ +║ │ │ ║ ║ └───────────────────────┘ ║ ║ ║ ║ ┌─────────────────────────────────────────────────────────────────────────────────────┐ ║ @@ -701,13 +701,13 @@ The `service_discovery` object includes: ║ └─────────────────────────────────────────┘ ║ ║ ║ ║ ┌─────────────────────────────────────────┐ ┌─────────────────────────────────────────┐ ║ -║ │ TARGET GROUPS (Rolling) │ │ TARGET GROUPS (Blue/Green) │ ║ +║ │ TARGET GROUPS (always w/ LB) │ │ TRAFFIC-SHIFT INFRA │ ║ ║ ├─────────────────────────────────────────┤ ├─────────────────────────────────────────┤ ║ -║ │ • target_group_arn │ │ • blue_target_group_arn │ ║ -║ │ • 
target_group_arn_suffix │ │ • blue_target_group_name │ ║ -║ │ • target_group_name │ │ • green_target_group_arn │ ║ -║ └─────────────────────────────────────────┘ │ • green_target_group_name │ ║ -║ │ • target_group_arns (map) │ ║ +║ │ • production_target_group_arn │ │ • alternate_target_group_arn │ ║ +║ │ • production_target_group_name │ │ • alternate_target_group_name │ ║ +║ │ • target_group_arn (alias) │ │ • ecs_infrastructure_role_arn │ ║ +║ └─────────────────────────────────────────┘ │ • target_group_arns (map) │ ║ +║ │ │ ║ ║ └─────────────────────────────────────────┘ ║ ║ ║ ║ ┌─────────────────────────────────────────┐ ┌─────────────────────────────────────────┐ ║ @@ -779,8 +779,8 @@ The `service_discovery` object includes: ║ │ │ │ │ │ ║ ║ ▼ ▼ ▼ ▼ ▼ ║ ║ aws_lb_target_group aws_lb_listener_rule aws_lb_listener aws_appautoscaling_ aws_service_discovery_ ║ -║ .this[0] / .tg_1[0] .alb (for_each) .nlb[0] target.this[0] service.this[0] ║ -║ / .tg_2[0] │ ║ +║ .tg_1[0] + .tg_2[0] .alb (for_each) .nlb[0] target.this[0] service.this[0] ║ +║ │ ║ ║ │ ║ ║ ┌───────────────────────────────────────────┴───────────────────────────┐ ║ ║ │ │ ║ @@ -803,9 +803,9 @@ The `service_discovery` object includes: | `aws_ecs_task_definition` | 1 | Container configuration (placeholder) | | `aws_ecs_service` | 1 | Core ECS service resource | | `module.security_group` | 1 | Security group for tasks | -| `aws_lb_target_group.this` | 0 or 1 | Target group for rolling deployment | -| `aws_lb_target_group.tg_1` | 0 or 1 | Blue target group for blue/green | -| `aws_lb_target_group.tg_2` | 0 or 1 | Green target group for blue/green | +| `aws_lb_target_group.tg_1` | 0 or 1 | Production target group (created whenever a load balancer is attached) | +| `aws_lb_target_group.tg_2` | 0 or 1 | Alternate target group ECS shifts traffic to during native deployments | +| `aws_iam_role.ecs_infrastructure` | 0 or 1 | Role ECS assumes for load-balancer wiring during traffic shifts | | `aws_lb_listener_rule.alb` | 
for_each | ALB listener rules | | `aws_lb_listener.nlb` | 0 or 1 | NLB listener | | `aws_service_discovery_service` | 0 or 1 | Cloud Map service | @@ -830,14 +830,16 @@ The placeholder container prints a message and exits, so load balancer health ch All four strategies run on the native ECS deployment controller — no CodeDeploy and no external controller. +The same infrastructure (2 target groups + infrastructure role) backs every load-balanced service, so any service can switch strategy on its next deployment. + | Feature | Rolling | Blue/Green | Linear | Canary | |---------|---------|------------|--------|--------| | **Traffic shift** | Task replacement (min/max healthy %) | All-at-once + bake | Equal % steps + per-step bake | Small % first, then the rest | | **Rollback** | Circuit breaker | Instant (old revision kept through bake) | Instant | Instant | | **Testing** | None | Test-listener validation before shift | Per-step validation | Canary validation | -| **Infrastructure** | 1 target group | 2 target groups + infra role | 2 target groups + infra role | 2 target groups + infra role | +| **Target groups used** | Production only | Both | Both | Both | -**Use rolling when:** simple deployments with automatic rollback are sufficient and you want minimal infrastructure. +**Use rolling when:** simple deployments with automatic rollback are sufficient. **Use blue/green when:** you want full validation of the new revision (optionally via a test listener rule) before shifting all production traffic at once, with instant rollback during the bake window. 
@@ -1003,11 +1005,11 @@ Uses the ECS deployment controller for zero-downtime rolling updates: ### Native Traffic-Shift Strategies (blue_green / linear / canary) -Sets up infrastructure for the ECS deployment controller's built-in traffic shifting: -- Creates two target groups (tg-1 = production, tg-2 = alternate) -- Creates an ECS infrastructure IAM role (AmazonECSInfrastructureRolePolicyForLoadBalancers) that ECS assumes to rewrite listener rules and (de)register targets during the shift -- Wires the service's `load_balancer.advanced_configuration` (alternate target group, production listener rule, optional test listener rule, infrastructure role) -- Seeds `deployment_configuration` (strategy, bake time, canary/linear tuning); the Flightcontrol deploy manager passes the authoritative configuration — including pause lifecycle hooks — on every UpdateService call, so the block is in `ignore_changes` +The infrastructure for the ECS deployment controller's built-in traffic shifting is provisioned for **every** load-balanced service — not just those created with a native `deployment_type` — so the strategy can change between deployments without Terraform changes: +- Two target groups (tg-1 = production, tg-2 = alternate); rolling deployments only ever use tg-1 +- An ECS infrastructure IAM role (AmazonECSInfrastructureRolePolicyForLoadBalancers) that ECS assumes to rewrite listener rules and (de)register targets during the shift +- The service's `load_balancer.advanced_configuration` (alternate target group, production listener rule, optional test listener rule, infrastructure role) +- `deployment_configuration` is seeded from `deployment_type` / `deployment_strategy_config` at create time only; the Flightcontrol deploy manager passes the authoritative configuration — including pause lifecycle hooks — on every UpdateService call, so the block is in `ignore_changes` ## Notes diff --git a/compute/ecs_service/auto_scaling.tf b/compute/ecs_service/auto_scaling.tf index 
a52651b..5e3bffa 100644 --- a/compute/ecs_service/auto_scaling.tf +++ b/compute/ecs_service/auto_scaling.tf @@ -96,9 +96,7 @@ resource "aws_appautoscaling_scheduled_action" "this" { ################################################################################ locals { - primary_target_group_arn_suffix = local.enable_load_balancer ? ( - var.deployment_type == "rolling" ? aws_lb_target_group.this[0].arn_suffix : aws_lb_target_group.tg_1[0].arn_suffix - ) : "" + primary_target_group_arn_suffix = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn_suffix : "" } diff --git a/compute/ecs_service/ecs_service.tf b/compute/ecs_service/ecs_service.tf index a3f662e..3ea366b 100644 --- a/compute/ecs_service/ecs_service.tf +++ b/compute/ecs_service/ecs_service.tf @@ -83,23 +83,15 @@ resource "aws_ecs_service" "this" { deployment_minimum_healthy_percent = var.deployment_type == "rolling" ? var.deployment_minimum_healthy_percent : null deployment_maximum_percent = var.deployment_type == "rolling" ? var.deployment_maximum_percent : null - # Load balancer configuration - Rolling deployment - dynamic "load_balancer" { - for_each = local.enable_load_balancer && var.deployment_type == "rolling" ? [1] : [] - content { - target_group_arn = aws_lb_target_group.this[0].arn - container_name = local.lb_container_name - container_port = local.lb_container_port - } - } - - # Load balancer configuration - native traffic-shift strategies. - # The service starts on the production target group (tg-1); ECS - # alternates between tg-1 and the alternate target group (tg-2) on - # each deployment, rewriting the production listener rule via the + # Load balancer configuration. 
advanced_configuration is always wired + # (production + alternate target groups, listener rule, infrastructure + # role) so the deployment strategy stays a per-deployment decision: + # rolling deployments serve from the production target group (tg-1) + # only, while native traffic-shift deployments alternate between tg-1 + # and tg-2, rewriting the production listener rule via the # infrastructure role. dynamic "load_balancer" { - for_each = local.enable_load_balancer && local.is_native_traffic_shift ? [1] : [] + for_each = local.enable_load_balancer ? [1] : [] content { target_group_arn = aws_lb_target_group.tg_1[0].arn container_name = local.lb_container_name @@ -171,11 +163,11 @@ resource "aws_ecs_service" "this" { precondition { condition = ( - !(local.is_native_traffic_shift && local.enable_load_balancer) + !local.enable_load_balancer || local.enable_nlb_listener || length(var.load_balancer_attachment.listener_rules) > 0 ) - error_message = "Native traffic-shift strategies (blue_green/linear/canary) require either listener_rules (ALB) or nlb_listener so the production listener rule can be wired into advanced_configuration." + error_message = "load_balancer_attachment requires either listener_rules (ALB) or nlb_listener so the production listener rule can be wired into advanced_configuration." } } } diff --git a/compute/ecs_service/iam_infrastructure.tf b/compute/ecs_service/iam_infrastructure.tf index da04736..cec0d48 100644 --- a/compute/ecs_service/iam_infrastructure.tf +++ b/compute/ecs_service/iam_infrastructure.tf @@ -6,10 +6,15 @@ # this role to register/deregister targets and rewrite the production / # test listener rules while it shifts traffic between the production and # alternate target groups. +# +# Created whenever a load balancer is attached (not just for native +# strategies) so the deploy manager can switch any service to a +# traffic-shift strategy on a per-deployment basis without a Terraform +# apply. 
Rolling deployments never cause ECS to assume it. ################################################################################ data "aws_iam_policy_document" "ecs_infrastructure_assume" { - count = local.is_native_traffic_shift && local.enable_load_balancer ? 1 : 0 + count = local.enable_load_balancer ? 1 : 0 statement { actions = ["sts:AssumeRole"] @@ -22,7 +27,7 @@ data "aws_iam_policy_document" "ecs_infrastructure_assume" { } resource "aws_iam_role" "ecs_infrastructure" { - count = local.is_native_traffic_shift && local.enable_load_balancer ? 1 : 0 + count = local.enable_load_balancer ? 1 : 0 name_prefix = "${substr(var.name, 0, min(length(var.name), 26))}-infra-" assume_role_policy = data.aws_iam_policy_document.ecs_infrastructure_assume[0].json @@ -33,7 +38,7 @@ resource "aws_iam_role" "ecs_infrastructure" { } resource "aws_iam_role_policy_attachment" "ecs_infrastructure_elb" { - count = local.is_native_traffic_shift && local.enable_load_balancer ? 1 : 0 + count = local.enable_load_balancer ? 1 : 0 role = aws_iam_role.ecs_infrastructure[0].name policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSInfrastructureRolePolicyForLoadBalancers" diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index 3343c44..9820057 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -1,6 +1,10 @@ ################################################################################ # ALB Listener Rules -# For blue/green deployments, an external controller manages target group switching +# +# Rules initially forward to the production target group (tg-1). During +# native traffic-shift deployments (blue_green/linear/canary) the ECS +# deployment controller rewrites the rule's forward action between tg-1 +# and tg-2 via the infrastructure role, hence ignore_changes on action. 
################################################################################ resource "aws_lb_listener_rule" "alb" { @@ -12,12 +16,8 @@ resource "aws_lb_listener_rule" "alb" { priority = each.value.priority action { - type = "forward" - target_group_arn = ( - local.is_native_traffic_shift - ? aws_lb_target_group.tg_1[0].arn - : aws_lb_target_group.this[0].arn - ) + type = "forward" + target_group_arn = aws_lb_target_group.tg_1[0].arn } dynamic "condition" { @@ -80,8 +80,8 @@ resource "aws_lb_listener_rule" "alb" { Name = "${var.name}-rule-${each.key}" }) - # Ignore changes to action as the external deployment controller manages target group switching for blue/green - # This is a no-op for rolling deployments (nothing external modifies the action) + # The ECS deployment controller rewrites the forward action during + # native traffic-shift deployments; a no-op for rolling deployments. lifecycle { ignore_changes = [action] } @@ -89,8 +89,9 @@ resource "aws_lb_listener_rule" "alb" { ################################################################################ # NLB Listeners -# For NLB, we create the listener directly (no listener rules in NLB) -# For blue/green deployments, an external controller manages target group switching +# For NLB, we create the listener directly (no listener rules in NLB). +# The ECS deployment controller rewrites the default action during +# native traffic-shift deployments. ################################################################################ resource "aws_lb_listener" "nlb" { @@ -106,20 +107,16 @@ resource "aws_lb_listener" "nlb" { alpn_policy = var.load_balancer_attachment.nlb_listener.protocol == "TLS" ? var.load_balancer_attachment.nlb_listener.alpn_policy : null default_action { - type = "forward" - target_group_arn = ( - local.is_native_traffic_shift - ? 
aws_lb_target_group.tg_1[0].arn - : aws_lb_target_group.this[0].arn - ) + type = "forward" + target_group_arn = aws_lb_target_group.tg_1[0].arn } tags = merge(local.tags, { Name = "${var.name}-nlb-listener" }) - # Ignore changes to default_action as the external deployment controller manages target group switching for blue/green - # This is a no-op for rolling deployments (nothing external modifies the action) + # The ECS deployment controller rewrites the default action during + # native traffic-shift deployments; a no-op for rolling deployments. lifecycle { ignore_changes = [default_action] } diff --git a/compute/ecs_service/locals.tf b/compute/ecs_service/locals.tf index fc09418..7b9e62d 100644 --- a/compute/ecs_service/locals.tf +++ b/compute/ecs_service/locals.tf @@ -21,7 +21,11 @@ locals { deployment_controller_type = "ECS" # Strategies that run the ECS controller's traffic-shift state machine - # over two target groups (production + alternate). + # over two target groups (production + alternate). Only used to seed + # deployment_configuration at create time — the target-group pair, + # infrastructure role, and advanced_configuration are provisioned for + # every load-balanced service so the strategy can change per + # deployment without Terraform changes. is_native_traffic_shift = contains(["blue_green", "linear", "canary"], var.deployment_type) # Map the module's strategy name to the AWS deploymentConfiguration enum. 
diff --git a/compute/ecs_service/outputs.tf b/compute/ecs_service/outputs.tf index 670856d..d13a37a 100644 --- a/compute/ecs_service/outputs.tf +++ b/compute/ecs_service/outputs.tf @@ -85,62 +85,50 @@ output "security_group_arn" { } ################################################################################ -# Target Groups - Rolling Deployment +# Target Groups +# +# A production (tg-1) + alternate (tg-2) pair always exists when a load +# balancer is attached, so the deployment strategy can change per +# deployment without Terraform changes. Rolling deployments only ever +# use the production target group. ################################################################################ output "target_group_arn" { - description = "The ARN of the target group (null if load balancer disabled or blue/green deployment)." - value = local.enable_load_balancer && var.deployment_type == "rolling" ? aws_lb_target_group.this[0].arn : null + description = "The ARN of the production target group the service serves from (alias of production_target_group_arn; null if load balancer disabled)." + value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn : null } output "target_group_arn_suffix" { - description = "The ARN suffix of the target group for CloudWatch metrics." - value = local.enable_load_balancer && var.deployment_type == "rolling" ? aws_lb_target_group.this[0].arn_suffix : null + description = "The ARN suffix of the production target group for CloudWatch metrics." + value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn_suffix : null } -output "target_group_name" { - description = "The name of the target group." - value = local.enable_load_balancer && var.deployment_type == "rolling" ? 
aws_lb_target_group.this[0].name : null -} - -################################################################################ -# Target Groups - Native Traffic-Shift Strategies (blue_green/linear/canary) -################################################################################ - output "production_target_group_arn" { - description = "The ARN of the production target group (null unless a native traffic-shift strategy)." - value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_1[0].arn : null + description = "The ARN of the production target group (null if load balancer disabled)." + value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn : null } output "production_target_group_name" { description = "The name of the production target group." - value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_1[0].name : null + value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].name : null } output "alternate_target_group_arn" { - description = "The ARN of the alternate target group ECS shifts traffic to during native deployments (null unless a native traffic-shift strategy)." - value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_2[0].arn : null + description = "The ARN of the alternate target group ECS shifts traffic to during native traffic-shift deployments (null if load balancer disabled)." + value = local.enable_load_balancer ? aws_lb_target_group.tg_2[0].arn : null } output "alternate_target_group_name" { description = "The name of the alternate target group." - value = local.enable_load_balancer && local.is_native_traffic_shift ? aws_lb_target_group.tg_2[0].name : null + value = local.enable_load_balancer ? 
aws_lb_target_group.tg_2[0].name : null } -################################################################################ -# Combined Target Group Outputs (for convenience) -################################################################################ - output "target_group_arns" { description = "Map of all target group ARNs created by this module." - value = local.enable_load_balancer ? ( - var.deployment_type == "rolling" ? { - primary = aws_lb_target_group.this[0].arn - } : { - production = aws_lb_target_group.tg_1[0].arn - alternate = aws_lb_target_group.tg_2[0].arn - } - ) : {} + value = local.enable_load_balancer ? { + production = aws_lb_target_group.tg_1[0].arn + alternate = aws_lb_target_group.tg_2[0].arn + } : {} } ################################################################################ @@ -148,8 +136,8 @@ output "target_group_arns" { ################################################################################ output "ecs_infrastructure_role_arn" { - description = "The ARN of the IAM role ECS assumes to manage load-balancer wiring during native traffic-shift deployments (null for rolling)." - value = local.is_native_traffic_shift && local.enable_load_balancer ? aws_iam_role.ecs_infrastructure[0].arn : null + description = "The ARN of the IAM role ECS assumes to manage load-balancer wiring during native traffic-shift deployments (null if load balancer disabled)." + value = local.enable_load_balancer ? 
aws_iam_role.ecs_infrastructure[0].arn : null } ################################################################################ diff --git a/compute/ecs_service/target_groups.tf b/compute/ecs_service/target_groups.tf index d138660..db15dfb 100644 --- a/compute/ecs_service/target_groups.tf +++ b/compute/ecs_service/target_groups.tf @@ -1,56 +1,17 @@ ################################################################################ -# Target Groups - Rolling Deployment -################################################################################ - -resource "aws_lb_target_group" "this" { - count = local.enable_load_balancer && var.deployment_type == "rolling" ? 1 : 0 - - name = "${substr(var.name, 0, min(length(var.name), 28))}-tg" - port = var.load_balancer_attachment.target_group.port - protocol = var.load_balancer_attachment.target_group.protocol - vpc_id = var.vpc_id - target_type = var.load_balancer_attachment.target_group.target_type - - deregistration_delay = var.load_balancer_attachment.target_group.deregistration_delay - slow_start = contains(["HTTP", "HTTPS"], var.load_balancer_attachment.target_group.protocol) ? var.load_balancer_attachment.target_group.slow_start : null - - health_check { - enabled = var.load_balancer_attachment.target_group.health_check.enabled - path = contains(["HTTP", "HTTPS"], var.load_balancer_attachment.target_group.protocol) ? var.load_balancer_attachment.target_group.health_check.path : null - port = var.load_balancer_attachment.target_group.health_check.port - protocol = coalesce(var.load_balancer_attachment.target_group.health_check.protocol, var.load_balancer_attachment.target_group.protocol) - matcher = contains(["HTTP", "HTTPS"], var.load_balancer_attachment.target_group.protocol) ? 
var.load_balancer_attachment.target_group.health_check.matcher : null - interval = var.load_balancer_attachment.target_group.health_check.interval - timeout = var.load_balancer_attachment.target_group.health_check.timeout - healthy_threshold = var.load_balancer_attachment.target_group.health_check.healthy_threshold - unhealthy_threshold = var.load_balancer_attachment.target_group.health_check.unhealthy_threshold - } - - dynamic "stickiness" { - for_each = var.load_balancer_attachment.target_group.stickiness != null ? [var.load_balancer_attachment.target_group.stickiness] : [] - content { - enabled = stickiness.value.enabled - type = stickiness.value.type - cookie_duration = contains(["HTTP", "HTTPS"], var.load_balancer_attachment.target_group.protocol) ? stickiness.value.cookie_duration : null - cookie_name = contains(["HTTP", "HTTPS"], var.load_balancer_attachment.target_group.protocol) ? stickiness.value.cookie_name : null - } - } - - tags = merge(local.tags, { - Name = "${var.name}-tg" - }) - - lifecycle { - create_before_destroy = true - } -} - -################################################################################ -# Target Groups - Blue/Green Deployment +# Target Groups +# +# A production (tg-1) + alternate (tg-2) pair is always created when a +# load balancer is attached, regardless of deployment strategy. This +# keeps the deployment strategy a pure per-deployment decision: the +# deploy manager can switch between rolling / blue_green / linear / +# canary on any UpdateService call without Terraform changes. Rolling +# deployments simply serve from the production target group and never +# touch the alternate. ################################################################################ resource "aws_lb_target_group" "tg_1" { - count = local.enable_load_balancer && local.is_native_traffic_shift ? 1 : 0 + count = local.enable_load_balancer ? 
1 : 0 name = "${substr(var.name, 0, min(length(var.name), 24))}-tg-1" port = var.load_balancer_attachment.target_group.port @@ -94,7 +55,7 @@ resource "aws_lb_target_group" "tg_1" { } resource "aws_lb_target_group" "tg_2" { - count = local.enable_load_balancer && local.is_native_traffic_shift ? 1 : 0 + count = local.enable_load_balancer ? 1 : 0 name = "${substr(var.name, 0, min(length(var.name), 24))}-tg-2" port = var.load_balancer_attachment.target_group.port @@ -136,4 +97,3 @@ resource "aws_lb_target_group" "tg_2" { create_before_destroy = true } } - diff --git a/compute/ecs_service/tests/basic.tftest.hcl b/compute/ecs_service/tests/basic.tftest.hcl index 0087859..67e6d3b 100644 --- a/compute/ecs_service/tests/basic.tftest.hcl +++ b/compute/ecs_service/tests/basic.tftest.hcl @@ -2,8 +2,63 @@ # Basic ECS Service Module Tests ################################################################################ -# Mock provider for testing -mock_provider "aws" {} +# Mock provider for testing. +# aws_iam_policy_document data sources need explicit json overrides — +# the mock provider's generated string is not valid JSON and fails the +# provider-side assume_role_policy validation at plan time. +mock_provider "aws" { + mock_data "aws_iam_policy_document" { + defaults = { + json = "{\"Version\":\"2012-10-17\",\"Statement\":[]}" + } + } + mock_data "aws_partition" { + defaults = { + partition = "aws" + } + } + mock_data "aws_region" { + defaults = { + id = "us-east-1" + name = "us-east-1" + } + } + mock_data "aws_caller_identity" { + defaults = { + account_id = "123456789012" + } + } + mock_data "aws_vpc" { + defaults = { + cidr_block = "10.0.0.0/16" + } + } + + # Computed ARNs must look like real ARNs to pass provider-side + # validation on referencing resources (task definition, listener + # rules, advanced_configuration). 
+ mock_resource "aws_iam_role" { + defaults = { + arn = "arn:aws:iam::123456789012:role/mock-role" + } + } + mock_resource "aws_lb_target_group" { + defaults = { + arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/mock-tg/1234567890123456" + arn_suffix = "targetgroup/mock-tg/1234567890123456" + } + } + mock_resource "aws_lb_listener_rule" { + defaults = { + arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener-rule/app/mock-alb/1234567890123456/1234567890123456/1234567890123456" + } + } + mock_resource "aws_lb_listener" { + defaults = { + arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/net/mock-nlb/1234567890123456/1234567890123456" + } + } +} ################################################################################ # Variables for Tests @@ -44,7 +99,7 @@ run "basic_service" { } assert { - condition = module.security_group.aws_security_group.this.vpc_id == "vpc-12345678" + condition = module.security_group.security_group_vpc_id == "vpc-12345678" error_message = "Security group should be in the correct VPC" } } @@ -75,19 +130,24 @@ run "service_with_load_balancer" { } assert { - condition = length(aws_lb_target_group.this) == 1 - error_message = "Should create one target group for rolling deployment" + condition = length(aws_lb_target_group.tg_1) == 1 && length(aws_lb_target_group.tg_2) == 1 + error_message = "Should create the production + alternate target group pair whenever a load balancer is attached" } assert { - condition = aws_lb_target_group.this[0].port == 8080 + condition = aws_lb_target_group.tg_1[0].port == 8080 && aws_lb_target_group.tg_2[0].port == 8080 error_message = "Target group port should be 8080" } assert { - condition = aws_lb_target_group.this[0].protocol == "HTTP" + condition = aws_lb_target_group.tg_1[0].protocol == "HTTP" error_message = "Target group protocol should be HTTP" } + + assert { + condition = length(aws_iam_role.ecs_infrastructure) == 1 + error_message = "Should 
create the ECS infrastructure role whenever a load balancer is attached" + } } ################################################################################ @@ -115,13 +175,15 @@ run "service_with_load_balancer_auto_priority" { } assert { - condition = length(aws_lb_target_group.this) == 1 - error_message = "Should create one target group for rolling deployment" + condition = length(aws_lb_target_group.tg_1) == 1 && length(aws_lb_target_group.tg_2) == 1 + error_message = "Should create the production + alternate target group pair whenever a load balancer is attached" } assert { - condition = aws_lb_listener_rule.alb["0"].priority == null - error_message = "Priority should be null (auto-assigned by AWS)" + # The mock provider materializes the unset computed attribute as 0; + # a real plan leaves it null for AWS to auto-assign. + condition = coalesce(aws_lb_listener_rule.alb["0"].priority, 0) == 0 + error_message = "Priority should be unset (auto-assigned by AWS)" } } @@ -161,11 +223,6 @@ run "blue_green_deployment" { error_message = "Should create alternate target group for blue/green deployment" } - assert { - condition = length(aws_lb_target_group.this) == 0 - error_message = "Should not create single target group for blue/green deployment" - } - assert { condition = length(aws_iam_role.ecs_infrastructure) == 1 error_message = "Should create the ECS infrastructure role for native traffic-shift strategies" diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index b66603a..e0ae711 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -286,7 +286,7 @@ variable "desired_count" { variable "deployment_type" { type = string - description = "The deployment strategy executed natively by the ECS deployment controller: 'rolling', 'blue_green', 'linear', or 'canary'." 
+  description = "The deployment strategy ('rolling', 'blue_green', 'linear', 'canary') used to seed the service's deployment_configuration at create time. The strategy is a per-deployment setting on the native ECS controller — the Flightcontrol deploy manager passes the authoritative strategy on every UpdateService call, so it can change between deployments without Terraform changes."
   default     = "rolling"
 
   validation {
From 6a8abefc6e344486ab5f4b8e57fb343efb279904 Mon Sep 17 00:00:00 2001
From: Mina Abadir
Date: Thu, 4 Jun 2026 20:59:37 -0400
Subject: [PATCH 03/26] ecs_service: address review findings on traffic-shift wiring

- Reject >1 ALB listener rule for native traffic-shift strategies:
  advanced_configuration only rewrites a single production listener rule,
  so extra rules would keep serving the old revision for the whole
  deployment. Documented the constraint on listener_rules since the
  strategy is a per-deployment decision the deploy manager can change
  without Terraform.
- Restore the target_group_name output as an alias of
  production_target_group_name, matching the target_group_arn alias.
- Make TestEcsServiceWithAlb assert the deployed rolling service carries
  advanced_configuration (alternate TG, production listener rule,
  infrastructure role), validating that the real AWS API accepts it on
  CreateService without a deployment_configuration.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 compute/ecs_service/ecs_service.tf         | 13 ++++++
 compute/ecs_service/outputs.tf             |  5 +++
 compute/ecs_service/tests/basic.tftest.hcl | 52 ++++++++++++++++++++++
 compute/ecs_service/variables.tf           | 10 +++++
 test/ecs_service_test.go                   | 20 +++++++++
 test/fixtures/ecs_service/with_alb/main.tf | 10 +++++
 6 files changed, 110 insertions(+)

diff --git a/compute/ecs_service/ecs_service.tf b/compute/ecs_service/ecs_service.tf
index 3ea366b..b346642 100644
--- a/compute/ecs_service/ecs_service.tf
+++ b/compute/ecs_service/ecs_service.tf
@@ -169,6 +169,19 @@ resource "aws_ecs_service" "this" {
       )
       error_message = "load_balancer_attachment requires either listener_rules (ALB) or nlb_listener so the production listener rule can be wired into advanced_configuration."
     }
+
+    # The ECS advanced_configuration API accepts a single production
+    # listener rule, so during native traffic-shift deployments only the
+    # first rule is rewritten — any additional rules would keep
+    # forwarding to the old revision for the entire deployment.
+    precondition {
+      condition = (
+        !local.is_native_traffic_shift
+        || local.enable_nlb_listener
+        || length(try(var.load_balancer_attachment.listener_rules, [])) <= 1
+      )
+      error_message = "Native traffic-shift strategies (blue_green/linear/canary) rewrite a single production listener rule; additional listener rules would keep serving the old revision throughout the deployment. Use at most one listener rule with these strategies."
+    }
   }
 }
 
diff --git a/compute/ecs_service/outputs.tf b/compute/ecs_service/outputs.tf
index d13a37a..8764dca 100644
--- a/compute/ecs_service/outputs.tf
+++ b/compute/ecs_service/outputs.tf
@@ -103,6 +103,11 @@ output "target_group_arn_suffix" {
   value       = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn_suffix : null
 }
 
+output "target_group_name" {
+  description = "The name of the production target group the service serves from (alias of production_target_group_name; null if load balancer disabled)."
+  value       = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].name : null
+}
+
 output "production_target_group_arn" {
   description = "The ARN of the production target group (null if load balancer disabled)."
   value       = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn : null
diff --git a/compute/ecs_service/tests/basic.tftest.hcl b/compute/ecs_service/tests/basic.tftest.hcl
index 67e6d3b..c4d112e 100644
--- a/compute/ecs_service/tests/basic.tftest.hcl
+++ b/compute/ecs_service/tests/basic.tftest.hcl
@@ -148,6 +148,15 @@ run "service_with_load_balancer" {
     condition     = length(aws_iam_role.ecs_infrastructure) == 1
     error_message = "Should create the ECS infrastructure role whenever a load balancer is attached"
   }
+
+  # Backward-compatible aliases for pre-traffic-shift callers.
+  assert {
+    condition = (
+      output.target_group_arn == output.production_target_group_arn
+      && output.target_group_name == output.production_target_group_name
+    )
+    error_message = "target_group_arn / target_group_name should alias the production target group outputs"
+  }
 }
 
 ################################################################################
@@ -229,6 +238,49 @@ run "blue_green_deployment" {
   }
 }
 
+################################################################################
+# Test: Native traffic-shift strategies reject multiple listener rules
+#
+# advanced_configuration accepts a single production listener rule, so
+# ECS would only ever shift traffic on the first rule — additional
+# rules would silently keep serving the old revision.
+################################################################################
+
+run "blue_green_rejects_multiple_listener_rules" {
+  command = plan
+
+  variables {
+    deployment_type = "blue_green"
+    container_port  = 8080
+    load_balancer_attachment = {
+      target_group = {
+        port     = 8080
+        protocol = "HTTP"
+      }
+      listener_rules = [
+        {
+          listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/1234567890123456/1234567890123456"
+          priority     = 100
+          conditions = [{
+            type   = "host-header"
+            values = ["api.example.com"]
+          }]
+        },
+        {
+          listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/1234567890123456/1234567890123456"
+          priority     = 101
+          conditions = [{
+            type   = "host-header"
+            values = ["www.example.com"]
+          }]
+        },
+      ]
+    }
+  }
+
+  expect_failures = [aws_ecs_service.this]
+}
+
 ################################################################################
 # Test: Canary Deployment
 ################################################################################
diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf
index e0ae711..1d9fa41 100644
--- a/compute/ecs_service/variables.tf
+++ b/compute/ecs_service/variables.tf
@@ -487,6 +487,16 @@ variable "load_balancer_attachment" {
     })
 
     # ALB: Listener rules (attach to existing ALB listener)
+    #
+    # IMPORTANT: only the first rule is wired into the service's
+    # advanced_configuration as the production listener rule. Native
+    # traffic-shift deployments (blue_green/linear/canary) rewrite only
+    # that rule — traffic on any additional rules never shifts to the
+    # new revision. Terraform rejects >1 rule when deployment_type is a
+    # traffic-shift strategy, but because the strategy is a
+    # per-deployment decision on the native ECS controller, services
+    # that may ever deploy with a traffic-shift strategy must also keep
+    # to a single rule.
     listener_rules = optional(list(object({
       listener_arn = string
       priority     = optional(number, null) # null = AWS auto-assigns next available priority
diff --git a/test/ecs_service_test.go b/test/ecs_service_test.go
index 6ec779f..0c2ae98 100644
--- a/test/ecs_service_test.go
+++ b/test/ecs_service_test.go
@@ -4,6 +4,7 @@ package test
 import (
 	"testing"
 
+	"github.com/aws/aws-sdk-go-v2/aws"
 	"github.com/flightcontrolhq/modules/test/helpers"
 	"github.com/gruntwork-io/terratest/modules/terraform"
 	"github.com/stretchr/testify/assert"
@@ -168,6 +169,25 @@ func TestEcsServiceWithAlb(t *testing.T) {
 	hasTargetGroup := helpers.EcsServiceHasTargetGroup(t, clusterArn, serviceName, targetGroupArn, awsRegion)
 	assert.True(t, hasTargetGroup, "ECS service should be registered with the target group")
 
+	// The module wires load_balancer.advanced_configuration (alternate
+	// target group, production listener rule, infrastructure role)
+	// unconditionally — including for the rolling strategy used here, where
+	// CreateService carries no deployment_configuration. This asserts the
+	// real AWS API accepted that combination and persisted it on the
+	// service; if AWS ever rejected it, every rolling service with a load
+	// balancer (the module default) would fail to provision.
+	alternateTargetGroupArn := terraform.Output(t, terraformOptions, "alternate_target_group_arn")
+	require.NotEmpty(t, alternateTargetGroupArn, "alternate_target_group_arn should not be empty")
+
+	loadBalancers := helpers.GetEcsServiceLoadBalancers(t, clusterArn, serviceName, awsRegion)
+	require.Len(t, loadBalancers, 1, "ECS service should have exactly one load balancer attachment")
+
+	advancedConfig := loadBalancers[0].AdvancedConfiguration
+	require.NotNil(t, advancedConfig, "load balancer advanced configuration should be set on a rolling service")
+	assert.Equal(t, alternateTargetGroupArn, aws.ToString(advancedConfig.AlternateTargetGroupArn), "alternate target group should match the module output")
+	assert.NotEmpty(t, aws.ToString(advancedConfig.ProductionListenerRule), "production listener rule should be set")
+	assert.NotEmpty(t, aws.ToString(advancedConfig.RoleArn), "infrastructure role should be set")
+
 	// Wait for targets to be registered in the target group
 	// The ECS service needs time to register tasks with the target group
 	t.Log("Waiting for targets to be registered with the target group...")
diff --git a/test/fixtures/ecs_service/with_alb/main.tf b/test/fixtures/ecs_service/with_alb/main.tf
index bee5595..3faab2f 100644
--- a/test/fixtures/ecs_service/with_alb/main.tf
+++ b/test/fixtures/ecs_service/with_alb/main.tf
@@ -228,6 +228,16 @@ output "target_group_arn" {
   value       = module.ecs_service.target_group_arn
 }
 
+output "alternate_target_group_arn" {
+  description = "The ARN of the alternate target group ECS shifts traffic to during native traffic-shift deployments."
+  value       = module.ecs_service.alternate_target_group_arn
+}
+
+output "ecs_infrastructure_role_arn" {
+  description = "The ARN of the IAM role ECS assumes to manage load-balancer wiring during native traffic-shift deployments."
+  value       = module.ecs_service.ecs_infrastructure_role_arn
+}
+
 output "alb_security_group_id" {
   description = "The ID of the ALB security group."
   value       = module.ecs_cluster.public_alb_security_group_id
From 56b6f4a427201943375b2a323a8a66dad62ae1f6 Mon Sep 17 00:00:00 2001
From: Mina Abadir
Date: Mon, 8 Jun 2026 23:22:03 -0400
Subject: [PATCH 04/26] Fix ECS infrastructure policy ARN

---
 compute/ecs_service/iam_infrastructure.tf  |  2 +-
 compute/ecs_service/tests/basic.tftest.hcl | 18 +++++++++---------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/compute/ecs_service/iam_infrastructure.tf b/compute/ecs_service/iam_infrastructure.tf
index cec0d48..d16e831 100644
--- a/compute/ecs_service/iam_infrastructure.tf
+++ b/compute/ecs_service/iam_infrastructure.tf
@@ -41,5 +41,5 @@ resource "aws_iam_role_policy_attachment" "ecs_infrastructure_elb" {
   count = local.enable_load_balancer ? 1 : 0
 
   role       = aws_iam_role.ecs_infrastructure[0].name
-  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSInfrastructureRolePolicyForLoadBalancers"
+  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonECSInfrastructureRolePolicyForLoadBalancers"
 }
diff --git a/compute/ecs_service/tests/basic.tftest.hcl b/compute/ecs_service/tests/basic.tftest.hcl
index c4d112e..7e02a3f 100644
--- a/compute/ecs_service/tests/basic.tftest.hcl
+++ b/compute/ecs_service/tests/basic.tftest.hcl
@@ -149,13 +149,15 @@ run "service_with_load_balancer" {
     error_message = "Should create the ECS infrastructure role whenever a load balancer is attached"
   }
 
+  assert {
+    condition     = aws_iam_role_policy_attachment.ecs_infrastructure_elb[0].policy_arn == "arn:aws:iam::aws:policy/AmazonECSInfrastructureRolePolicyForLoadBalancers"
+    error_message = "ECS infrastructure role should attach the documented AWS-managed load-balancer policy ARN"
+  }
+
   # Backward-compatible aliases for pre-traffic-shift callers.
   assert {
-    condition = (
-      output.target_group_arn == output.production_target_group_arn
-      && output.target_group_name == output.production_target_group_name
-    )
-    error_message = "target_group_arn / target_group_name should alias the production target group outputs"
+    condition     = output.target_group_name == output.production_target_group_name
+    error_message = "target_group_name should alias the production target group name output"
   }
 }
 
@@ -189,10 +191,8 @@ run "service_with_load_balancer_auto_priority" {
   }
 
   assert {
-    # The mock provider materializes the unset computed attribute as 0;
-    # a real plan leaves it null for AWS to auto-assign.
-    condition     = coalesce(aws_lb_listener_rule.alb["0"].priority, 0) == 0
-    error_message = "Priority should be unset (auto-assigned by AWS)"
+    condition     = var.load_balancer_attachment.listener_rules[0].priority == null
+    error_message = "Priority should default to null so AWS auto-assigns it"
   }
 }
 
From a849f7ffc660075862f3d610d29c688bfba7010a Mon Sep 17 00:00:00 2001
From: Mina Abadir
Date: Mon, 8 Jun 2026 23:33:06 -0400
Subject: [PATCH 05/26] Ignore target group name changes for tg_1

---
 compute/ecs_service/moved.tf         | 8 ++++++++
 compute/ecs_service/target_groups.tf | 1 +
 2 files changed, 9 insertions(+)
 create mode 100644 compute/ecs_service/moved.tf

diff --git a/compute/ecs_service/moved.tf b/compute/ecs_service/moved.tf
new file mode 100644
index 0000000..923b328
--- /dev/null
+++ b/compute/ecs_service/moved.tf
@@ -0,0 +1,8 @@
+################################################################################
+# State Migrations
+################################################################################
+
+moved {
+  from = aws_lb_target_group.this[0]
+  to   = aws_lb_target_group.tg_1[0]
+}
diff --git a/compute/ecs_service/target_groups.tf b/compute/ecs_service/target_groups.tf
index db15dfb..ff77c28 100644
--- a/compute/ecs_service/target_groups.tf
+++ b/compute/ecs_service/target_groups.tf
@@ -51,6 +51,7 @@ resource
"aws_lb_target_group" "tg_1" { lifecycle { create_before_destroy = true + ignore_changes = [name] } } From 61909c33f1be28f494f5db93b6a4bfe756cce087 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 10 Jun 2026 10:39:22 -0400 Subject: [PATCH 06/26] default capacity provider --- compute/ecs_cluster/README.md | 53 ++-- compute/ecs_cluster/locals.tf | 24 +- compute/ecs_cluster/tests/basic.tftest.hcl | 6 +- .../capacity_provider_strategy.tftest.hcl | 258 ++++++++++++++++++ compute/ecs_cluster/variables.tf | 30 ++ 5 files changed, 344 insertions(+), 27 deletions(-) create mode 100644 compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl diff --git a/compute/ecs_cluster/README.md b/compute/ecs_cluster/README.md index ccd18de..d069b3a 100644 --- a/compute/ecs_cluster/README.md +++ b/compute/ecs_cluster/README.md @@ -100,19 +100,19 @@ module "ecs" { private_subnet_ids = ["subnet-private-1", "subnet-private-2"] public_subnet_ids = ["subnet-public-1", "subnet-public-2"] - # Enable all capacity providers + # Attach all capacity providers. AWS does not allow mixing Fargate and EC2 + # providers in the cluster's default strategy, so the default strategy + # commits to a single family — EC2 here, since EC2 wins when enabled + # (override with default_capacity_provider_family). Services can still + # target FARGATE/FARGATE_SPOT via their own capacity_provider_strategies. 
enable_fargate = true enable_fargate_spot = true - fargate_weight = 1 - fargate_spot_weight = 2 # EC2 for baseline capacity ec2_instance_type = "t3.large" ec2_min_size = 2 ec2_max_size = 20 ec2_desired_capacity = 2 - ec2_weight = 1 - ec2_base = 2 # Always run 2 tasks on EC2 # Both ALBs enable_public_alb = true @@ -210,6 +210,7 @@ module "api_service" { | Name | Description | Type | Default | Required | |------|-------------|------|---------|----------| | enable_container_insights | Enable CloudWatch Container Insights | `bool` | `true` | no | +| default_capacity_provider_family | Family for the cluster default strategy: `ec2`, `fargate` (includes Fargate Spot when enabled), or `fargate_spot`. AWS forbids mixing Fargate and EC2 providers in one strategy. Defaults to `ec2` if EC2 is enabled, then `fargate`, then `fargate_spot` | `string` | `null` | no | ### Fargate Capacity Provider @@ -507,7 +508,7 @@ module "api_service" { ║ │ • ec2_capacity_provider_name = enable_ec2 ? "${var.name}-ec2" : null │ ║ ║ │ │ ║ ║ │ CAPACITY PROVIDER STRATEGY: │ ║ -║ │ • capacity_provider_strategy = concat(fargate_strategy, fargate_spot_strategy, ec2_strategy) │ ║ +║ │ • capacity_provider_strategy = single family via default_capacity_provider_family │ ║ ║ │ │ ║ ║ │ EC2 CONFIGURATION: │ ║ ║ │ • ecs_user_data = base64encode(ECS_CLUSTER config + custom user_data) │ ║ @@ -690,7 +691,7 @@ module "api_service" { ║ │ ▼ ▼ ║ ║ │ var.enable_fargate ────►┌──────────────────────────────────────────────┐ ║ ║ │ var.enable_fargate_spot►│ aws_ecs_cluster_capacity_providers.this │ ║ -║ │ local.enable_ec2 ──────►│ (FARGATE + FARGATE_SPOT + EC2 strategy) │ ║ +║ │ local.enable_ec2 ──────►│ (single-family default strategy) │ ║ ║ │ └──────────────────────────────────────────────┘ ║ ║ │ ║ ║ │ ┌────────────────────────────────────────────────────┐ ║ @@ -785,20 +786,33 @@ ECS supports three types of capacity providers, each with distinct trade-offs: - You need specific instance types or kernel configurations - 
You require persistent local storage -**Example: Cost-optimized mixed strategy** +**Example: Cost-optimized mixed cluster** + +AWS does not allow a single capacity provider strategy to mix Fargate and EC2 +(Auto Scaling group) providers, so the cluster's default strategy commits to +one family (`default_capacity_provider_family`). To mix families across +workloads, attach both to the cluster and pick the family per service: ```hcl -# Use EC2 for baseline, Fargate Spot for burst capacity +# EC2 is the cluster default; specific services opt into Fargate Spot module "ecs" { source = "..." enable_fargate = false # Disable standard Fargate - enable_fargate_spot = true # Use Fargate Spot for overflow - fargate_spot_weight = 1 + enable_fargate_spot = true # Attached for services that want Spot - ec2_instance_type = "m5.large" - ec2_base = 5 # Always run 5 tasks on EC2 - ec2_weight = 1 + ec2_instance_type = "m5.large" # EC2 wins the default strategy when enabled +} + +module "batch_service" { + source = ".../compute/ecs_service" + + # ... service configuration ... + + # Override the cluster default for this service only + capacity_provider_strategies = [ + { capacity_provider = "FARGATE_SPOT", weight = 1 } + ] } ``` @@ -813,8 +827,8 @@ The **base** and **weight** parameters control how ECS distributes tasks across │ │ │ 1. First, satisfy BASE requirements (guaranteed tasks per provider) │ │ │ -│ Example: fargate_base=2, ec2_base=3 │ -│ → First 5 tasks: 2 on Fargate, 3 on EC2 │ +│ Example: fargate_base=2, fargate_spot_weight=1 │ +│ → First 2 tasks on Fargate, then split with Fargate Spot │ │ │ │ 2. 
Then, distribute remaining tasks by WEIGHT ratio │ │ │ @@ -830,7 +844,11 @@ The **base** and **weight** parameters control how ECS distributes tasks across |----------|---------------|--------| | Fargate only | `enable_fargate=true` | All tasks on Fargate | | Cost savings | `fargate_weight=1, fargate_spot_weight=3` | 25% Fargate, 75% Fargate Spot | -| EC2 baseline | `ec2_base=5, ec2_weight=0, fargate_weight=1` | First 5 on EC2, rest on Fargate | +| EC2 default | `ec2_instance_type="m5.large"` | Default strategy is EC2; services may target Fargate via their own strategy | + +Note: base/weight only combine providers within the same family (Fargate + +Fargate Spot). A strategy cannot mix Fargate and EC2 providers — the cluster +default commits to one family via `default_capacity_provider_family`. ### How does EC2 managed scaling work? @@ -955,6 +973,7 @@ The module automatically creates a security group for EC2 instances that: ## Notes - The EC2 capacity provider is only created when `ec2_instance_type` is specified +- The cluster default capacity provider strategy commits to a single family (AWS forbids mixing Fargate and EC2 providers in one strategy); control it with `default_capacity_provider_family` - By default, uses the latest ECS-optimized Amazon Linux 2023 AMI - EC2 instances automatically register with the ECS cluster via user data - IMDSv2 is enforced by default for enhanced security diff --git a/compute/ecs_cluster/locals.tf b/compute/ecs_cluster/locals.tf index 4db1e88..8e36dd8 100644 --- a/compute/ecs_cluster/locals.tf +++ b/compute/ecs_cluster/locals.tf @@ -24,9 +24,22 @@ locals { # EC2 capacity provider name ec2_capacity_provider_name = local.enable_ec2 ? "${var.name}-ec2" : null - # Build capacity provider strategy based on enabled providers - capacity_provider_strategy = concat( - var.enable_fargate ? [{ + # Family used for the cluster default strategy. 
AWS rejects default + # strategies that mix Fargate and EC2 (ASG) capacity providers, so the + # default strategy must commit to a single family. + default_capacity_provider_family = coalesce( + var.default_capacity_provider_family, + local.enable_ec2 ? "ec2" : var.enable_fargate ? "fargate" : "fargate_spot" + ) + + # Build the default capacity provider strategy from the selected family. + # FARGATE and FARGATE_SPOT may share a strategy; EC2 must stand alone. + capacity_provider_strategy = local.default_capacity_provider_family == "ec2" ? [{ + capacity_provider = aws_ecs_capacity_provider.ec2[0].name + weight = var.ec2_weight + base = var.ec2_base + }] : concat( + local.default_capacity_provider_family == "fargate" && var.enable_fargate ? [{ capacity_provider = "FARGATE" weight = var.fargate_weight base = var.fargate_base @@ -35,11 +48,6 @@ locals { capacity_provider = "FARGATE_SPOT" weight = var.fargate_spot_weight base = var.fargate_spot_base - }] : [], - local.enable_ec2 ? [{ - capacity_provider = aws_ecs_capacity_provider.ec2[0].name - weight = var.ec2_weight - base = var.ec2_base }] : [] ) diff --git a/compute/ecs_cluster/tests/basic.tftest.hcl b/compute/ecs_cluster/tests/basic.tftest.hcl index 2deb274..462fa79 100644 --- a/compute/ecs_cluster/tests/basic.tftest.hcl +++ b/compute/ecs_cluster/tests/basic.tftest.hcl @@ -879,9 +879,11 @@ run "ec2_custom_weights" { ec2_base = 0 } + # AWS rejects default strategies mixing Fargate and EC2 providers, so the + # default strategy commits to a single family (EC2 wins when enabled). 
assert { - condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 2 - error_message = "Should have 2 capacity provider strategies (Fargate + EC2)" + condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 1 + error_message = "Default strategy should contain only the EC2 capacity provider when EC2 is enabled" } } diff --git a/compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl b/compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl new file mode 100644 index 0000000..c7b19a8 --- /dev/null +++ b/compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl @@ -0,0 +1,258 @@ +# Default Capacity Provider Strategy Tests +# +# AWS rejects default capacity provider strategies that mix Fargate and EC2 +# (Auto Scaling group) capacity providers. These tests verify that the default +# strategy always commits to a single family, controlled by +# default_capacity_provider_family. +# +# Run with: tofu test + +mock_provider "aws" { + override_data { + target = data.aws_caller_identity.current + values = { + account_id = "123456789012" + } + } + + override_data { + target = data.aws_region.current + values = { + id = "us-east-1" + name = "us-east-1" + } + } + + override_data { + target = data.aws_ssm_parameter.ecs_optimized_ami + values = { + value = "ami-0123456789abcdef0" + } + } + + override_data { + target = data.aws_elb_service_account.current + values = { + arn = "arn:aws:iam::127311923021:root" + } + } + + override_resource { + target = aws_iam_instance_profile.ecs_instance + values = { + arn = "arn:aws:iam::123456789012:instance-profile/test-cluster-ecs-instance" + } + } + + override_resource { + target = aws_launch_template.ecs + values = { + arn = "arn:aws:ec2:us-east-1:123456789012:launch-template/lt-0123456789abcdef" + id = "lt-0123456789abcdef" + } + } + + override_resource { + target = module.ecs_autoscaling.aws_autoscaling_group.this + values = { + arn = 
"arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:12345678-1234-1234-1234-123456789012:autoScalingGroupName/test-cluster-ecs" + } + } +} + +variables { + name = "test-cluster" + vpc_id = "vpc-12345678" + private_subnet_ids = ["subnet-private1", "subnet-private2"] +} + +################################################################################ +# Implicit family selection (default_capacity_provider_family = null) +################################################################################ + +# Fargate only (module defaults): default strategy is FARGATE only +run "defaults_to_fargate" { + command = plan + + assert { + condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 1 + error_message = "Default strategy should contain exactly one entry" + } + + assert { + condition = anytrue([for s in aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy : s.capacity_provider == "FARGATE"]) + error_message = "Default strategy should contain FARGATE" + } +} + +# Fargate + Fargate Spot: both share the default strategy (same AWS family) +run "fargate_and_spot_share_default_strategy" { + command = plan + + variables { + enable_fargate_spot = true + } + + assert { + condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 2 + error_message = "Default strategy should contain FARGATE and FARGATE_SPOT" + } + + assert { + condition = anytrue([for s in aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy : s.capacity_provider == "FARGATE_SPOT"]) + error_message = "Default strategy should contain FARGATE_SPOT" + } +} + +# Fargate disabled, Spot enabled: falls back to FARGATE_SPOT +run "defaults_to_spot_when_fargate_disabled" { + command = plan + + variables { + enable_fargate = false + enable_fargate_spot = true + } + + assert { + condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 1 + 
error_message = "Default strategy should contain exactly one entry" + } + + assert { + condition = anytrue([for s in aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy : s.capacity_provider == "FARGATE_SPOT"]) + error_message = "Default strategy should contain FARGATE_SPOT" + } +} + +# EC2 enabled alongside Fargate (the failed terratest run scenario): +# EC2 wins the default strategy; Fargate stays attached but out of the strategy +run "ec2_wins_default_strategy" { + command = plan + + variables { + ec2_instance_type = "t3.medium" + enable_fargate = true + enable_fargate_spot = true + } + + assert { + condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 1 + error_message = "Default strategy must not mix Fargate and EC2 capacity providers" + } + + assert { + condition = anytrue([for s in aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy : s.capacity_provider == "test-cluster-ec2"]) + error_message = "Default strategy should contain the EC2 capacity provider" + } + + assert { + condition = contains(aws_ecs_cluster_capacity_providers.this.capacity_providers, "FARGATE") + error_message = "FARGATE should still be attached to the cluster" + } + + assert { + condition = contains(aws_ecs_cluster_capacity_providers.this.capacity_providers, "FARGATE_SPOT") + error_message = "FARGATE_SPOT should still be attached to the cluster" + } + + assert { + condition = contains(aws_ecs_cluster_capacity_providers.this.capacity_providers, "test-cluster-ec2") + error_message = "The EC2 capacity provider should be attached to the cluster" + } +} + +################################################################################ +# Explicit family selection +################################################################################ + +# EC2 enabled but Fargate explicitly chosen as the default family +run "explicit_fargate_family_with_ec2_enabled" { + command = plan + + variables { + 
ec2_instance_type = "t3.medium" + default_capacity_provider_family = "fargate" + } + + assert { + condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 1 + error_message = "Default strategy should contain exactly one entry" + } + + assert { + condition = anytrue([for s in aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy : s.capacity_provider == "FARGATE"]) + error_message = "Default strategy should contain FARGATE when the fargate family is chosen explicitly" + } + + assert { + condition = contains(aws_ecs_cluster_capacity_providers.this.capacity_providers, "test-cluster-ec2") + error_message = "The EC2 capacity provider should still be attached to the cluster" + } +} + +# Fargate Spot explicitly chosen even though Fargate is enabled +run "explicit_spot_family" { + command = plan + + variables { + enable_fargate = true + enable_fargate_spot = true + default_capacity_provider_family = "fargate_spot" + } + + assert { + condition = length(aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy) == 1 + error_message = "Default strategy should contain exactly one entry" + } + + assert { + condition = anytrue([for s in aws_ecs_cluster_capacity_providers.this.default_capacity_provider_strategy : s.capacity_provider == "FARGATE_SPOT"]) + error_message = "Default strategy should contain only FARGATE_SPOT when the fargate_spot family is chosen explicitly" + } +} + +################################################################################ +# Validation +################################################################################ + +run "invalid_family_value" { + command = plan + + variables { + default_capacity_provider_family = "bogus" + } + + expect_failures = [var.default_capacity_provider_family] +} + +run "ec2_family_requires_ec2_enabled" { + command = plan + + variables { + default_capacity_provider_family = "ec2" + } + + expect_failures = 
[var.default_capacity_provider_family] +} + +run "fargate_family_requires_fargate_enabled" { + command = plan + + variables { + enable_fargate = false + enable_fargate_spot = true + default_capacity_provider_family = "fargate" + } + + expect_failures = [var.default_capacity_provider_family] +} + +run "spot_family_requires_spot_enabled" { + command = plan + + variables { + default_capacity_provider_family = "fargate_spot" + } + + expect_failures = [var.default_capacity_provider_family] +} diff --git a/compute/ecs_cluster/variables.tf b/compute/ecs_cluster/variables.tf index c7e37d7..3513318 100644 --- a/compute/ecs_cluster/variables.tf +++ b/compute/ecs_cluster/variables.tf @@ -89,6 +89,36 @@ variable "enable_container_insights" { default = true } +################################################################################ +# Default Capacity Provider Strategy +################################################################################ + +variable "default_capacity_provider_family" { + type = string + description = "Capacity provider family used for the cluster's default strategy. AWS rejects default strategies that mix Fargate and EC2 (Auto Scaling group) capacity providers, so the default strategy must commit to a single family; services can still target any attached capacity provider via their own strategy. Valid values: 'ec2', 'fargate' (also includes Fargate Spot when enabled), 'fargate_spot'. When null, defaults to 'ec2' if the EC2 capacity provider is enabled, then 'fargate' if enabled, and finally 'fargate_spot'." + default = null + + validation { + condition = var.default_capacity_provider_family == null || contains(["ec2", "fargate", "fargate_spot"], coalesce(var.default_capacity_provider_family, "null")) + error_message = "The default_capacity_provider_family must be 'ec2', 'fargate', or 'fargate_spot'." 
+ } + + validation { + condition = var.default_capacity_provider_family != "ec2" || var.ec2_instance_type != null + error_message = "The default_capacity_provider_family 'ec2' requires ec2_instance_type to be set." + } + + validation { + condition = var.default_capacity_provider_family != "fargate" || var.enable_fargate + error_message = "The default_capacity_provider_family 'fargate' requires enable_fargate to be true." + } + + validation { + condition = var.default_capacity_provider_family != "fargate_spot" || var.enable_fargate_spot + error_message = "The default_capacity_provider_family 'fargate_spot' requires enable_fargate_spot to be true." + } +} + ################################################################################ # Fargate Capacity Provider ################################################################################ From 250bf026ef8594282368e66923979fbbc276ff0a Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 10 Jun 2026 10:46:10 -0400 Subject: [PATCH 07/26] update template --- .../rvn-ecs-cluster-definition.yml | 26 +++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml index 999185f..aee10a5 100644 --- a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml +++ b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml @@ -3,8 +3,8 @@ definition: name: ECS Cluster description: Production-ready AWS ECS cluster with Fargate, Fargate Spot, optional EC2 capacity, and shared load balancers. 
release: - version: 0.0.5 - description: unify labels and descriptions across modules + version: 0.0.6 + description: add default capacity provider selection; the cluster default strategy now commits to a single provider family module: inputs: - id: network @@ -57,6 +57,22 @@ module: placeholder: Select an instance type to enable EC2 capacity type: string values: $values:aws/ec2/instances?awsAccountId=<>&region=<> + - collapsible: true + description: Capacity provider used by services that do not set their own capacity provider strategy. Must be one of the enabled providers; AWS does not allow mixing Fargate and EC2 in the cluster default. Leave blank to pick automatically - EC2 when enabled, then Fargate, then Fargate Spot. + id: default_capacity_provider_family + label: Default capacity provider + placeholder: Automatic (EC2 when enabled, then Fargate, then Fargate Spot) + type: string + values: + - description: Requires an EC2 instance type to be selected. + label: EC2 + value: ec2 + - description: Requires Fargate to be enabled. Also includes Fargate Spot in the default strategy when enabled. + label: Fargate + value: fargate + - description: Requires Fargate Spot to be enabled. + label: Fargate spot + value: fargate_spot - description: Configure the EC2 Auto Scaling Group and capacity provider settings used when an instance type is selected. id: section_ec2_capacity label: EC2 capacity @@ -537,6 +553,10 @@ module: Select an **EC2 instance type** to enable EC2 capacity. Leave it blank for a Fargate-only cluster. + ### Default capacity provider + + Services that do not set their own capacity provider strategy use the cluster default. AWS does not allow mixing Fargate and EC2 capacity providers in that default, so it commits to a single one. Leave **Default capacity provider** blank to pick automatically (EC2 when enabled, then Fargate, then Fargate Spot), or choose one of the enabled providers explicitly.
+ ## Load balancers ### Public application load balancer @@ -588,6 +608,7 @@ module: | Fargate | No | `true` | Allow services to use Fargate capacity | | Fargate Spot | No | `true` | Allow services to use lower-cost interruptible Fargate Spot | | EC2 instance type | No | — | Enables EC2 capacity when selected | + | Default capacity provider | No | Automatic | Provider used by services without their own strategy | | Public load balancer | No | `false` | Create an internet-facing ALB | | Private load balancer | No | `false` | Create an internal ALB | | Public NLB | No | `false` | Create an internet-facing Network Load Balancer | @@ -639,6 +660,7 @@ module: base_path: compute/ecs_cluster terraform_variables: ...overrides: << module.input.advanced_terraform_variables >> + default_capacity_provider_family: << module.input.default_capacity_provider_family >> ec2_ami_id: << module.input.ec2_ami_id >> ec2_enable_spot: << module.input.ec2_enable_spot >> ec2_instance_type: << module.input.ec2_instance_type >> From 4f8fe87c55d858f875d7974c3ef3ee76f4752ffd Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 10 Jun 2026 10:59:32 -0400 Subject: [PATCH 08/26] update placeholder --- compute/ecs_cluster/rvn-ecs-cluster-definition.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml index aee10a5..a441e9d 100644 --- a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml +++ b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml @@ -61,7 +61,7 @@ module: description: Capacity provider used by services that do not set their own capacity provider strategy. Must be one of the enabled providers; AWS does not allow mixing Fargate and EC2 in the cluster default. Leave blank to pick automatically - EC2 when enabled, then Fargate, then Fargate Spot. 
id: default_capacity_provider_family label: Default capacity provider - placeholder: Automatic (EC2 when enabled, then Fargate, then Fargate Spot) + placeholder: Default priority (EC2, then Fargate, then Fargate Spot) type: string values: - description: Requires an EC2 instance type to be selected. From 69b2f010b589a9c5b036ff6da28118a3afdbe207 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 10 Jun 2026 11:34:11 -0400 Subject: [PATCH 09/26] remove the default provider --- .../ecs_cluster/rvn-ecs-cluster-definition.yml | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml index a441e9d..f0b75e0 100644 --- a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml +++ b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml @@ -57,22 +57,6 @@ module: placeholder: Select an instance type to enable EC2 capacity type: string values: $values:aws/ec2/instances?awsAccountId=<>&region=<> - - collapsible: true - description: Capacity provider used by services that do not set their own capacity provider strategy. Must be one of the enabled providers; AWS does not allow mixing Fargate and EC2 in the cluster default. Leave blank to pick automatically - EC2 when enabled, then Fargate, then Fargate Spot. - id: default_capacity_provider_family - label: Default capacity provider - placeholder: Default priority (EC2, then Fargate, then Fargate Spot) - type: string - values: - - description: Requires an EC2 instance type to be selected. - label: EC2 - value: ec2 - - description: Requires Fargate to be enabled. Also includes Fargate Spot in the default strategy when enabled. - label: Fargate - value: fargate - - description: Requires Fargate Spot to be enabled. - label: Fargate spot - value: fargate_spot - description: Configure the EC2 Auto Scaling Group and capacity provider settings used when an instance type is selected.
id: section_ec2_capacity label: EC2 capacity From a733398cf730b6178a6cd1c62d923fad63aa0dc8 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 10 Jun 2026 14:02:27 -0400 Subject: [PATCH 10/26] fixes --- compute/ecs_cluster/README.md | 12 +++---- compute/ecs_cluster/locals.tf | 8 ++--- .../rvn-ecs-cluster-definition.yml | 2 +- .../capacity_provider_strategy.tftest.hcl | 34 +++++++++---------- compute/ecs_cluster/variables.tf | 18 +++++----- 5 files changed, 37 insertions(+), 37 deletions(-) diff --git a/compute/ecs_cluster/README.md b/compute/ecs_cluster/README.md index d069b3a..350c212 100644 --- a/compute/ecs_cluster/README.md +++ b/compute/ecs_cluster/README.md @@ -103,7 +103,7 @@ module "ecs" { # Attach all capacity providers. AWS does not allow mixing Fargate and EC2 # providers in the cluster's default strategy, so the default strategy # commits to a single family — EC2 here, since EC2 wins when enabled - # (override with default_capacity_provider_family). Services can still + # (override with capacity_provider_default). Services can still # target FARGATE/FARGATE_SPOT via their own capacity_provider_strategies. enable_fargate = true enable_fargate_spot = true @@ -210,7 +210,7 @@ module "api_service" { | Name | Description | Type | Default | Required | |------|-------------|------|---------|----------| | enable_container_insights | Enable CloudWatch Container Insights | `bool` | `true` | no | -| default_capacity_provider_family | Family for the cluster default strategy: `ec2`, `fargate` (includes Fargate Spot when enabled), or `fargate_spot`. AWS forbids mixing Fargate and EC2 providers in one strategy. Defaults to `ec2` if EC2 is enabled, then `fargate`, then `fargate_spot` | `string` | `null` | no | +| capacity_provider_default | Family for the cluster default strategy: `ec2`, `fargate` (includes Fargate Spot when enabled), or `fargate_spot`. AWS forbids mixing Fargate and EC2 providers in one strategy. 
Defaults to `ec2` if EC2 is enabled, then `fargate`, then `fargate_spot` | `string` | `null` | no | ### Fargate Capacity Provider @@ -508,7 +508,7 @@ module "api_service" { ║ │ • ec2_capacity_provider_name = enable_ec2 ? "${var.name}-ec2" : null │ ║ ║ │ │ ║ ║ │ CAPACITY PROVIDER STRATEGY: │ ║ -║ │ • capacity_provider_strategy = single family via default_capacity_provider_family │ ║ +║ │ • capacity_provider_strategy = single family via capacity_provider_default │ ║ ║ │ │ ║ ║ │ EC2 CONFIGURATION: │ ║ ║ │ • ecs_user_data = base64encode(ECS_CLUSTER config + custom user_data) │ ║ @@ -790,7 +790,7 @@ ECS supports three types of capacity providers, each with distinct trade-offs: AWS does not allow a single capacity provider strategy to mix Fargate and EC2 (Auto Scaling group) providers, so the cluster's default strategy commits to -one family (`default_capacity_provider_family`). To mix families across +one family (`capacity_provider_default`). To mix families across workloads, attach both to the cluster and pick the family per service: ```hcl @@ -848,7 +848,7 @@ The **base** and **weight** parameters control how ECS distributes tasks across Note: base/weight only combine providers within the same family (Fargate + Fargate Spot). A strategy cannot mix Fargate and EC2 providers — the cluster -default commits to one family via `default_capacity_provider_family`. +default commits to one family via `capacity_provider_default`. ### How does EC2 managed scaling work? 
@@ -973,7 +973,7 @@ The module automatically creates a security group for EC2 instances that: ## Notes - The EC2 capacity provider is only created when `ec2_instance_type` is specified -- The cluster default capacity provider strategy commits to a single family (AWS forbids mixing Fargate and EC2 providers in one strategy); control it with `default_capacity_provider_family` +- The cluster default capacity provider strategy commits to a single family (AWS forbids mixing Fargate and EC2 providers in one strategy); control it with `capacity_provider_default` - By default, uses the latest ECS-optimized Amazon Linux 2023 AMI - EC2 instances automatically register with the ECS cluster via user data - IMDSv2 is enforced by default for enhanced security diff --git a/compute/ecs_cluster/locals.tf b/compute/ecs_cluster/locals.tf index 8e36dd8..2db3d78 100644 --- a/compute/ecs_cluster/locals.tf +++ b/compute/ecs_cluster/locals.tf @@ -27,19 +27,19 @@ locals { # Family used for the cluster default strategy. AWS rejects default # strategies that mix Fargate and EC2 (ASG) capacity providers, so the # default strategy must commit to a single family. - default_capacity_provider_family = coalesce( - var.default_capacity_provider_family, + capacity_provider_default = coalesce( + var.capacity_provider_default, local.enable_ec2 ? "ec2" : var.enable_fargate ? "fargate" : "fargate_spot" ) # Build the default capacity provider strategy from the selected family. # FARGATE and FARGATE_SPOT may share a strategy; EC2 must stand alone. - capacity_provider_strategy = local.default_capacity_provider_family == "ec2" ? [{ + capacity_provider_strategy = local.capacity_provider_default == "ec2" ? [{ capacity_provider = aws_ecs_capacity_provider.ec2[0].name weight = var.ec2_weight base = var.ec2_base }] : concat( - local.default_capacity_provider_family == "fargate" && var.enable_fargate ? [{ + local.capacity_provider_default == "fargate" && var.enable_fargate ? 
[{ capacity_provider = "FARGATE" weight = var.fargate_weight base = var.fargate_base diff --git a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml index f0b75e0..4c06cd6 100644 --- a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml +++ b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml @@ -644,7 +644,7 @@ module: base_path: compute/ecs_cluster terraform_variables: ...overrides: << module.input.advanced_terraform_variables >> - default_capacity_provider_family: << module.input.default_capacity_provider_family >> + capacity_provider_default: << module.input.capacity_provider_default >> ec2_ami_id: << module.input.ec2_ami_id >> ec2_enable_spot: << module.input.ec2_enable_spot >> ec2_instance_type: << module.input.ec2_instance_type >> diff --git a/compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl b/compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl index c7b19a8..e5e0725 100644 --- a/compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl +++ b/compute/ecs_cluster/tests/capacity_provider_strategy.tftest.hcl @@ -3,7 +3,7 @@ # AWS rejects default capacity provider strategies that mix Fargate and EC2 # (Auto Scaling group) capacity providers. These tests verify that the default # strategy always commits to a single family, controlled by -# default_capacity_provider_family. +# capacity_provider_default. 
# # Run with: tofu test @@ -67,7 +67,7 @@ variables { } ################################################################################ -# Implicit family selection (default_capacity_provider_family = null) +# Implicit family selection (capacity_provider_default = null) ################################################################################ # Fargate only (module defaults): default strategy is FARGATE only @@ -170,8 +170,8 @@ run "explicit_fargate_family_with_ec2_enabled" { command = plan variables { - ec2_instance_type = "t3.medium" - default_capacity_provider_family = "fargate" + ec2_instance_type = "t3.medium" + capacity_provider_default = "fargate" } assert { @@ -195,9 +195,9 @@ run "explicit_spot_family" { command = plan variables { - enable_fargate = true - enable_fargate_spot = true - default_capacity_provider_family = "fargate_spot" + enable_fargate = true + enable_fargate_spot = true + capacity_provider_default = "fargate_spot" } assert { @@ -219,40 +219,40 @@ run "invalid_family_value" { command = plan variables { - default_capacity_provider_family = "bogus" + capacity_provider_default = "bogus" } - expect_failures = [var.default_capacity_provider_family] + expect_failures = [var.capacity_provider_default] } run "ec2_family_requires_ec2_enabled" { command = plan variables { - default_capacity_provider_family = "ec2" + capacity_provider_default = "ec2" } - expect_failures = [var.default_capacity_provider_family] + expect_failures = [var.capacity_provider_default] } run "fargate_family_requires_fargate_enabled" { command = plan variables { - enable_fargate = false - enable_fargate_spot = true - default_capacity_provider_family = "fargate" + enable_fargate = false + enable_fargate_spot = true + capacity_provider_default = "fargate" } - expect_failures = [var.default_capacity_provider_family] + expect_failures = [var.capacity_provider_default] } run "spot_family_requires_spot_enabled" { command = plan variables { - default_capacity_provider_family = 
"fargate_spot" + capacity_provider_default = "fargate_spot" } - expect_failures = [var.default_capacity_provider_family] + expect_failures = [var.capacity_provider_default] } diff --git a/compute/ecs_cluster/variables.tf b/compute/ecs_cluster/variables.tf index 3513318..56f3f64 100644 --- a/compute/ecs_cluster/variables.tf +++ b/compute/ecs_cluster/variables.tf @@ -93,29 +93,29 @@ variable "enable_container_insights" { # Default Capacity Provider Strategy ################################################################################ -variable "default_capacity_provider_family" { +variable "capacity_provider_default" { type = string description = "Capacity provider family used for the cluster's default strategy. AWS rejects default strategies that mix Fargate and EC2 (Auto Scaling group) capacity providers, so the default strategy must commit to a single family; services can still target any attached capacity provider via their own strategy. Valid values: 'ec2', 'fargate' (also includes Fargate Spot when enabled), 'fargate_spot'. When null, defaults to 'ec2' if the EC2 capacity provider is enabled, then 'fargate' if enabled, and finally 'fargate_spot'." default = null validation { - condition = var.default_capacity_provider_family == null || contains(["ec2", "fargate", "fargate_spot"], coalesce(var.default_capacity_provider_family, "null")) - error_message = "The default_capacity_provider_family must be 'ec2', 'fargate', or 'fargate_spot'." + condition = var.capacity_provider_default == null || contains(["ec2", "fargate", "fargate_spot"], coalesce(var.capacity_provider_default, "null")) + error_message = "The capacity_provider_default must be 'ec2', 'fargate', or 'fargate_spot'." } validation { - condition = var.default_capacity_provider_family != "ec2" || var.ec2_instance_type != null - error_message = "The default_capacity_provider_family 'ec2' requires ec2_instance_type to be set." 
+ condition = var.capacity_provider_default != "ec2" || var.ec2_instance_type != null + error_message = "The capacity_provider_default 'ec2' requires ec2_instance_type to be set." } validation { - condition = var.default_capacity_provider_family != "fargate" || var.enable_fargate - error_message = "The default_capacity_provider_family 'fargate' requires enable_fargate to be true." + condition = var.capacity_provider_default != "fargate" || var.enable_fargate + error_message = "The capacity_provider_default 'fargate' requires enable_fargate to be true." } validation { - condition = var.default_capacity_provider_family != "fargate_spot" || var.enable_fargate_spot - error_message = "The default_capacity_provider_family 'fargate_spot' requires enable_fargate_spot to be true." + condition = var.capacity_provider_default != "fargate_spot" || var.enable_fargate_spot + error_message = "The capacity_provider_default 'fargate_spot' requires enable_fargate_spot to be true." } } From a3953204851cbb4c48e6f3c3afaa8753b057b139 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 10 Jun 2026 14:12:02 -0400 Subject: [PATCH 11/26] fix: omit ports on all-protocol security group rules AWS rejects explicit from_port/to_port when ip_protocol is "-1", which broke updates to existing rules (the sg-module refactor changed the ECS instance ingress rule from -1/-1 to 0/0). Null the ports in the shared security-groups module whenever the protocol is "-1" or "all", and use -1 in the ecs_cluster caller for clarity. 
Co-Authored-By: Claude Fable 5 --- compute/ecs_cluster/ec2.tf | 10 ++++++---- networking/security-groups/security_group.tf | 12 ++++++++---- networking/security-groups/variables.tf | 4 ++-- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/compute/ecs_cluster/ec2.tf b/compute/ecs_cluster/ec2.tf index 51cba48..4c7b9bd 100644 --- a/compute/ecs_cluster/ec2.tf +++ b/compute/ecs_cluster/ec2.tf @@ -63,13 +63,15 @@ module "ecs_instance_security_group" { allow_all_egress = true + # For ip_protocol="-1" (all protocols), AWS requires from_port/to_port to + # be -1; setting them to 0 causes update failures. ingress_rules = concat( # Allow inbound from public ALB if enabled var.enable_public_alb ? [ { description = "Allow inbound from public ALB" - from_port = 0 - to_port = 0 + from_port = -1 + to_port = -1 ip_protocol = "-1" referenced_security_group_id = module.public_alb[0].security_group_id } @@ -78,8 +80,8 @@ module "ecs_instance_security_group" { var.enable_private_alb ? [ { description = "Allow inbound from private ALB" - from_port = 0 - to_port = 0 + from_port = -1 + to_port = -1 ip_protocol = "-1" referenced_security_group_id = module.private_alb[0].security_group_id } diff --git a/networking/security-groups/security_group.tf b/networking/security-groups/security_group.tf index d219fd1..fa8f2d5 100644 --- a/networking/security-groups/security_group.tf +++ b/networking/security-groups/security_group.tf @@ -54,8 +54,10 @@ resource "aws_vpc_security_group_ingress_rule" "this" { security_group_id = aws_security_group.this.id description = each.value.description - from_port = each.value.from_port - to_port = each.value.to_port + # For ip_protocol="-1" (all protocols), AWS rejects explicit ports, so + # omit them regardless of what the caller passed. + from_port = contains(["-1", "all"], lower(each.value.ip_protocol)) ? null : each.value.from_port + to_port = contains(["-1", "all"], lower(each.value.ip_protocol)) ? 
null : each.value.to_port ip_protocol = each.value.ip_protocol # Source types - only one will be set @@ -79,8 +81,10 @@ resource "aws_vpc_security_group_egress_rule" "this" { security_group_id = aws_security_group.this.id description = each.value.description - from_port = each.value.from_port - to_port = each.value.to_port + # For ip_protocol="-1" (all protocols), AWS rejects explicit ports, so + # omit them regardless of what the caller passed. + from_port = contains(["-1", "all"], lower(each.value.ip_protocol)) ? null : each.value.from_port + to_port = contains(["-1", "all"], lower(each.value.ip_protocol)) ? null : each.value.to_port ip_protocol = each.value.ip_protocol # Destination types - only one will be set diff --git a/networking/security-groups/variables.tf b/networking/security-groups/variables.tf index 0b35256..4cf9c1a 100644 --- a/networking/security-groups/variables.tf +++ b/networking/security-groups/variables.tf @@ -81,7 +81,7 @@ variable "ingress_rules" { - self: Set to true to allow traffic from the same security group For ip_protocol, use "tcp", "udp", "icmp", "icmpv6", or "-1" for all protocols. - When ip_protocol is "-1", from_port and to_port must be -1. + When ip_protocol is "-1" or "all", from_port and to_port are ignored (AWS rejects explicit ports with all-protocol rules). EOF default = [] @@ -184,7 +184,7 @@ variable "egress_rules" { - self: Set to true to allow traffic to the same security group For ip_protocol, use "tcp", "udp", "icmp", "icmpv6", or "-1" for all protocols. - When ip_protocol is "-1", from_port and to_port must be -1. + When ip_protocol is "-1" or "all", from_port and to_port are ignored (AWS rejects explicit ports with all-protocol rules). 
EOF default = [] From 02541ed7c96ed84959506410ec0c174b90698220 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Thu, 11 Jun 2026 15:01:04 -0400 Subject: [PATCH 12/26] test alb listener rule --- compute/ecs_service/ecs_service.tf | 3 +- compute/ecs_service/listener_rules.tf | 90 +++++++++++++++++++ compute/ecs_service/locals.tf | 10 +++ compute/ecs_service/outputs.tf | 12 +++ .../ecs_service/rvn-ecs-web-definition.yml | 13 ++- compute/ecs_service/variables.tf | 24 +++++ test/ecs_service_test.go | 17 ++++ test/fixtures/ecs_service/with_alb/main.tf | 26 ++++++ 8 files changed, 192 insertions(+), 3 deletions(-) diff --git a/compute/ecs_service/ecs_service.tf b/compute/ecs_service/ecs_service.tf index b346642..240ea45 100644 --- a/compute/ecs_service/ecs_service.tf +++ b/compute/ecs_service/ecs_service.tf @@ -104,7 +104,7 @@ resource "aws_ecs_service" "this" { ? aws_lb_listener.nlb[0].arn : aws_lb_listener_rule.alb["0"].arn ) - test_listener_rule = var.test_listener_rule_arn + test_listener_rule = local.test_listener_rule_arn role_arn = aws_iam_role.ecs_infrastructure[0].arn } } @@ -145,6 +145,7 @@ resource "aws_ecs_service" "this" { aws_iam_role_policy_attachment.execution_base, aws_iam_role_policy_attachment.ecs_infrastructure_elb, aws_lb_listener_rule.alb, + aws_lb_listener_rule.test, ] # Lifecycle: desired_count is managed by autoscaling, task_definition / diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index 9820057..e9d34ea 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -87,6 +87,96 @@ resource "aws_lb_listener_rule" "alb" { } } +################################################################################ +# ALB Test Listener Rule +# +# Optional dedicated rule that routes test traffic to the alternate (green) +# target group (tg-2) during native traffic-shift deployments. 
ECS rewrites +# its forward action through the TEST_TRAFFIC_SHIFT lifecycle stages so the +# green revision can be validated before production traffic shifts, hence +# ignore_changes on action. Created only when test_listener_rule is set; +# outside a deployment tg-2 is empty, so test traffic returns no targets +# until a deployment registers the green revision. +################################################################################ + +resource "aws_lb_listener_rule" "test" { + count = local.enable_test_listener_rule ? 1 : 0 + + listener_arn = var.load_balancer_attachment.test_listener_rule.listener_arn + priority = var.load_balancer_attachment.test_listener_rule.priority + + action { + type = "forward" + target_group_arn = aws_lb_target_group.tg_2[0].arn + } + + dynamic "condition" { + for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "path-pattern"] + content { + path_pattern { + values = condition.value.values + } + } + } + + dynamic "condition" { + for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "host-header"] + content { + host_header { + values = condition.value.values + } + } + } + + dynamic "condition" { + for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "http-header"] + content { + http_header { + http_header_name = condition.value.values[0] + values = slice(condition.value.values, 1, length(condition.value.values)) + } + } + } + + dynamic "condition" { + for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "http-request-method"] + content { + http_request_method { + values = condition.value.values + } + } + } + + dynamic "condition" { + for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "query-string"] + content { + query_string { + key = try(condition.value.values[0], null) + value = try(condition.value.values[1], 
condition.value.values[0]) + } + } + } + + dynamic "condition" { + for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "source-ip"] + content { + source_ip { + values = condition.value.values + } + } + } + + tags = merge(local.tags, { + Name = "${var.name}-test-rule" + }) + + # The ECS deployment controller rewrites the forward action during + # native traffic-shift deployments (TEST_TRAFFIC_SHIFT stages). + lifecycle { + ignore_changes = [action] + } +} + ################################################################################ # NLB Listeners # For NLB, we create the listener directly (no listener rules in NLB). diff --git a/compute/ecs_service/locals.tf b/compute/ecs_service/locals.tf index 7b9e62d..83b887a 100644 --- a/compute/ecs_service/locals.tf +++ b/compute/ecs_service/locals.tf @@ -42,6 +42,16 @@ locals { # Determine if NLB listener should be created (vs ALB listener rules) enable_nlb_listener = local.enable_load_balancer && var.load_balancer_attachment.nlb_listener != null + # Determine if a dedicated test listener rule should be created. Drives + # the advanced_configuration.test_listener_rule wiring and the + # TEST_TRAFFIC_SHIFT lifecycle stages on native traffic-shift deploys. + enable_test_listener_rule = local.enable_load_balancer && var.load_balancer_attachment.test_listener_rule != null + + # ARN passed to advanced_configuration.test_listener_rule and exported: + # the module-created rule when configured, else an externally-managed + # rule ARN supplied by the caller, else null. + test_listener_rule_arn = local.enable_test_listener_rule ? 
aws_lb_listener_rule.test[0].arn : var.test_listener_rule_arn + # Placeholder container name and port placeholder_container_name = "app" placeholder_container_port = var.container_port diff --git a/compute/ecs_service/outputs.tf b/compute/ecs_service/outputs.tf index 8764dca..dc9e716 100644 --- a/compute/ecs_service/outputs.tf +++ b/compute/ecs_service/outputs.tf @@ -159,6 +159,18 @@ output "nlb_listener_arn" { value = local.enable_nlb_listener ? aws_lb_listener.nlb[0].arn : null } +output "production_listener_rule_arn" { + description = "ARN of the production listener rule (ALB) or listener (NLB) the ECS deployment controller rewrites during native traffic-shift deployments. This is the value the deploy manager passes as advanced_configuration.production_listener_rule on UpdateService (null if load balancer disabled)." + value = local.enable_load_balancer ? ( + local.enable_nlb_listener ? aws_lb_listener.nlb[0].arn : aws_lb_listener_rule.alb["0"].arn + ) : null +} + +output "test_listener_rule_arn" { + description = "ARN of the test listener rule the ECS deployment controller rewrites during the TEST_TRAFFIC_SHIFT lifecycle stages, routing test traffic to the green revision before the production cutover. The deploy manager passes it as advanced_configuration.test_listener_rule on UpdateService. Null when no test listener rule is configured (the common case)." 
+ value = local.test_listener_rule_arn +} + ################################################################################ # Auto Scaling ################################################################################ diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index b573edb..796f6d7 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -3,8 +3,8 @@ definition: name: ECS Web Server description: Web server ECS service for running an HTTP application behind an ECS cluster load balancer. release: - version: 0.1.3 - description: unify labels and descriptions across modules + version: 0.1.4 + description: surface load-balancer advanced-config outputs (alternate target group, production + optional test listener rules, infrastructure role) for native blue/green/linear/canary deployments module: inputs: - id: section_service @@ -1275,6 +1275,15 @@ module: ecs_cluster_arn: <> ecs_service_arns: - <> + # Load-balancer advanced configuration for native traffic-shift + # strategies (blue_green / linear / canary). The deploy manager + # attaches these to UpdateService loadBalancers[].advancedConfiguration + # so ECS can shift traffic between the production and alternate target + # groups. Null/absent for non-load-balanced services (rolling only). + ecs_alternate_target_group_arn: <> + ecs_production_listener_rule_arn: <> + ecs_infrastructure_role_arn: <> + ecs_test_listener_rule_arn: <> inputs: - id: image_ref label: Image tag or digest diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index 1d9fa41..ac7256e 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -520,6 +520,30 @@ variable "load_balancer_attachment" { alpn_policy = optional(string) # For TLS: HTTP1Only, HTTP2Only, etc. }), null) + # ALB: Optional test listener rule (attach to an existing ALB listener). 
+ # + # When set, the module creates a dedicated listener rule that forwards + # test traffic to the alternate (green) target group during native + # traffic-shift deployments (blue_green/linear/canary). Its ARN is + # wired into the service's advanced_configuration.test_listener_rule, + # which drives the TEST_TRAFFIC_SHIFT lifecycle stages: ECS routes + # test traffic to the green revision so it can be validated before + # production traffic shifts. + # + # Provide distinguishing conditions (e.g. an http-header or a + # dedicated path/host) so the test rule does not collide with the + # production listener rule on the same listener. Omit for services + # that do not need a pre-cutover test phase — most services. + test_listener_rule = optional(object({ + listener_arn = string + priority = optional(number, null) # null = AWS auto-assigns next available priority + + conditions = list(object({ + type = string + values = list(string) + })) + }), null) + container_name = optional(string, null) container_port = optional(number, null) }) diff --git a/test/ecs_service_test.go b/test/ecs_service_test.go index 0c2ae98..6afee5e 100644 --- a/test/ecs_service_test.go +++ b/test/ecs_service_test.go @@ -188,6 +188,23 @@ func TestEcsServiceWithAlb(t *testing.T) { assert.NotEmpty(t, aws.ToString(advancedConfig.ProductionListenerRule), "production listener rule should be set") assert.NotEmpty(t, aws.ToString(advancedConfig.RoleArn), "infrastructure role should be set") + // The production_listener_rule_arn output is what the deploy manager + // plumbs into UpdateService advancedConfiguration for native + // traffic-shift deployments, so it must match the rule AWS actually + // persisted on the service. 
+ productionListenerRuleArn := terraform.Output(t, terraformOptions, "production_listener_rule_arn") + require.NotEmpty(t, productionListenerRuleArn, "production_listener_rule_arn should not be empty") + assert.Equal(t, productionListenerRuleArn, aws.ToString(advancedConfig.ProductionListenerRule), "production_listener_rule_arn output should match the rule on the service") + + // The fixture configures a dedicated test listener rule, so the module + // must create it, export its ARN, and wire it into the service's + // advanced_configuration.test_listener_rule — the value the deploy + // manager forwards to drive the TEST_TRAFFIC_SHIFT lifecycle stages. + testListenerRuleArn := terraform.Output(t, terraformOptions, "test_listener_rule_arn") + require.NotEmpty(t, testListenerRuleArn, "test_listener_rule_arn should not be empty when a test listener rule is configured") + assert.NotEqual(t, productionListenerRuleArn, testListenerRuleArn, "test and production listener rules must be distinct") + assert.Equal(t, testListenerRuleArn, aws.ToString(advancedConfig.TestListenerRule), "test_listener_rule_arn output should match the rule on the service") + // Wait for targets to be registered in the target group // The ECS service needs time to register tasks with the target group t.Log("Waiting for targets to be registered with the target group...") diff --git a/test/fixtures/ecs_service/with_alb/main.tf b/test/fixtures/ecs_service/with_alb/main.tf index 3faab2f..b47bac6 100644 --- a/test/fixtures/ecs_service/with_alb/main.tf +++ b/test/fixtures/ecs_service/with_alb/main.tf @@ -161,6 +161,22 @@ module "ecs_service" { ] } ] + + # Dedicated test listener rule on the same listener, distinguished by + # a header so it does not collide with the production path rule. ECS + # rewrites it to the green revision during the TEST_TRAFFIC_SHIFT + # lifecycle stages of a native traffic-shift deployment. 
+ test_listener_rule = { + listener_arn = module.ecs_cluster.public_alb_http_listener_arn + priority = 90 + + conditions = [ + { + type = "http-header" + values = ["X-FC-Test", "true"] + } + ] + } } # Give load balancer time to register targets @@ -238,6 +254,16 @@ output "ecs_infrastructure_role_arn" { value = module.ecs_service.ecs_infrastructure_role_arn } +output "production_listener_rule_arn" { + description = "The ARN of the production listener rule wired into the service's advanced_configuration." + value = module.ecs_service.production_listener_rule_arn +} + +output "test_listener_rule_arn" { + description = "The ARN of the test listener rule wired into the service's advanced_configuration." + value = module.ecs_service.test_listener_rule_arn +} + output "alb_security_group_id" { description = "The ID of the ALB security group." value = module.ecs_cluster.public_alb_security_group_id From 8791e9048ce8314d0cf518535fb966916b7b8b72 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Thu, 11 Jun 2026 19:37:39 -0400 Subject: [PATCH 13/26] add green_alb_listener_rule_enabled as tf var --- compute/ecs_service/listener_rules.tf | 2 +- compute/ecs_service/locals.tf | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index e9d34ea..0cc5640 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -100,7 +100,7 @@ resource "aws_lb_listener_rule" "alb" { ################################################################################ resource "aws_lb_listener_rule" "test" { - count = local.enable_test_listener_rule ? 1 : 0 + count = local.green_alb_listener_rule_enabled ? 
1 : 0 listener_arn = var.load_balancer_attachment.test_listener_rule.listener_arn priority = var.load_balancer_attachment.test_listener_rule.priority diff --git a/compute/ecs_service/locals.tf b/compute/ecs_service/locals.tf index 0f212d2..8e92f12 100644 --- a/compute/ecs_service/locals.tf +++ b/compute/ecs_service/locals.tf @@ -45,12 +45,12 @@ locals { # Determine if a dedicated test listener rule should be created. Drives # the advanced_configuration.test_listener_rule wiring and the # TEST_TRAFFIC_SHIFT lifecycle stages on native traffic-shift deploys. - enable_test_listener_rule = local.enable_load_balancer && var.load_balancer_attachment.test_listener_rule != null + green_alb_listener_rule_enabled = local.enable_load_balancer && var.load_balancer_attachment.test_listener_rule != null # ARN passed to advanced_configuration.test_listener_rule and exported: # the module-created rule when configured, else an externally-managed # rule ARN supplied by the caller, else null. - test_listener_rule_arn = local.enable_test_listener_rule ? aws_lb_listener_rule.test[0].arn : var.test_listener_rule_arn + test_listener_rule_arn = local.green_alb_listener_rule_enabled ? 
aws_lb_listener_rule.test[0].arn : var.test_listener_rule_arn # Placeholder container name and port placeholder_container_name = "app" From 4a8fd0ee0e1ddaa09d7ccc5dd7e8fd40ff28b248 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Thu, 11 Jun 2026 20:26:45 -0400 Subject: [PATCH 14/26] add green_alb_listener rule --- compute/ecs_service/listener_rules.tf | 54 +++++++++++++------ compute/ecs_service/locals.tf | 32 +++++++++-- .../ecs_service/rvn-ecs-web-definition.yml | 27 ++++++++++ compute/ecs_service/variables.tf | 44 +++++++-------- test/fixtures/ecs_service/with_alb/main.tf | 22 +++----- 5 files changed, 118 insertions(+), 61 deletions(-) diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index 0cc5640..0b397f2 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -13,7 +13,14 @@ resource "aws_lb_listener_rule" "alb" { } : {} listener_arn = each.value.listener_arn - priority = each.value.priority + # The mirrored rule (index 0) takes the module-managed base priority when + # the green test rule is enabled so the test rule can sit one slot ahead; + # every other rule keeps its configured priority. + priority = ( + local.green_alb_listener_rule_enabled && each.key == "0" + ? local.green_production_priority + : each.value.priority + ) action { type = "forward" @@ -88,30 +95,45 @@ resource "aws_lb_listener_rule" "alb" { } ################################################################################ -# ALB Test Listener Rule +# ALB Test (Green) Listener Rule # # Optional dedicated rule that routes test traffic to the alternate (green) -# target group (tg-2) during native traffic-shift deployments. ECS rewrites -# its forward action through the TEST_TRAFFIC_SHIFT lifecycle stages so the -# green revision can be validated before production traffic shifts, hence -# ignore_changes on action. 
Created only when test_listener_rule is set; -# outside a deployment tg-2 is empty, so test traffic returns no targets -# until a deployment registers the green revision. +# target group (tg-2) during native traffic-shift deployments. Gated by +# var.green_alb_listener_rule_enabled. It reuses the production listener and +# routing conditions (listener_rules[0]) but forwards to the green target +# group; the ECS deployment controller rewrites its forward action through +# the TEST_TRAFFIC_SHIFT lifecycle stages so the green revision can be +# validated before production traffic shifts, hence ignore_changes on +# action. Outside a deployment tg-2 is empty, so it returns no targets until +# a deployment registers the green revision. ################################################################################ resource "aws_lb_listener_rule" "test" { count = local.green_alb_listener_rule_enabled ? 1 : 0 - listener_arn = var.load_balancer_attachment.test_listener_rule.listener_arn - priority = var.load_balancer_attachment.test_listener_rule.priority + listener_arn = var.load_balancer_attachment.listener_rules[0].listener_arn + # One slot ahead of the production rule so a request carrying the test + # header matches this rule first; ALB routes by priority order, not by + # specificity, so without this it would fall through to production. + priority = local.green_test_priority action { type = "forward" target_group_arn = aws_lb_target_group.tg_2[0].arn } + # Distinguishing condition: only requests carrying the test header reach + # the green target group. Combined with the mirrored production + # conditions below, normal (header-less) traffic still matches production. 
+ condition { + http_header { + http_header_name = var.test_header_name + values = [var.test_header_value] + } + } + dynamic "condition" { - for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "path-pattern"] + for_each = [for c in var.load_balancer_attachment.listener_rules[0].conditions : c if c.type == "path-pattern"] content { path_pattern { values = condition.value.values @@ -120,7 +142,7 @@ resource "aws_lb_listener_rule" "test" { } dynamic "condition" { - for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "host-header"] + for_each = [for c in var.load_balancer_attachment.listener_rules[0].conditions : c if c.type == "host-header"] content { host_header { values = condition.value.values @@ -129,7 +151,7 @@ resource "aws_lb_listener_rule" "test" { } dynamic "condition" { - for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "http-header"] + for_each = [for c in var.load_balancer_attachment.listener_rules[0].conditions : c if c.type == "http-header"] content { http_header { http_header_name = condition.value.values[0] @@ -139,7 +161,7 @@ resource "aws_lb_listener_rule" "test" { } dynamic "condition" { - for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "http-request-method"] + for_each = [for c in var.load_balancer_attachment.listener_rules[0].conditions : c if c.type == "http-request-method"] content { http_request_method { values = condition.value.values @@ -148,7 +170,7 @@ resource "aws_lb_listener_rule" "test" { } dynamic "condition" { - for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "query-string"] + for_each = [for c in var.load_balancer_attachment.listener_rules[0].conditions : c if c.type == "query-string"] content { query_string { key = try(condition.value.values[0], null) @@ -158,7 +180,7 @@ resource "aws_lb_listener_rule" 
"test" { } dynamic "condition" { - for_each = [for c in var.load_balancer_attachment.test_listener_rule.conditions : c if c.type == "source-ip"] + for_each = [for c in var.load_balancer_attachment.listener_rules[0].conditions : c if c.type == "source-ip"] content { source_ip { values = condition.value.values diff --git a/compute/ecs_service/locals.tf b/compute/ecs_service/locals.tf index 8e92f12..d8da657 100644 --- a/compute/ecs_service/locals.tf +++ b/compute/ecs_service/locals.tf @@ -42,10 +42,34 @@ locals { # Determine if NLB listener should be created (vs ALB listener rules) enable_nlb_listener = local.enable_load_balancer && var.load_balancer_attachment.nlb_listener != null - # Determine if a dedicated test listener rule should be created. Drives - # the advanced_configuration.test_listener_rule wiring and the - # TEST_TRAFFIC_SHIFT lifecycle stages on native traffic-shift deploys. - green_alb_listener_rule_enabled = local.enable_load_balancer && var.load_balancer_attachment.test_listener_rule != null + # Determine if a dedicated test (green) ALB listener rule should be + # created. Drives the advanced_configuration.test_listener_rule wiring + # and the TEST_TRAFFIC_SHIFT lifecycle stages on native traffic-shift + # deploys. ALB-only — requires a production listener rule to mirror; a + # no-op for NLB services. + green_alb_listener_rule_enabled = ( + local.enable_load_balancer + && !local.enable_nlb_listener + && var.green_alb_listener_rule_enabled + && length(var.load_balancer_attachment.listener_rules) > 0 + ) + + # When the green rule is enabled the module owns both priorities so the + # test rule (production conditions + the X-Ravion-Test header) is always + # evaluated before the production rule — otherwise ALB, which routes by + # priority order and not specificity, would match production first and a + # header-bearing request would never reach green. 
The production rule's + # priority becomes the base (its configured priority, else the default + # below) and the test rule sits one slot ahead at base - 1. Both numbers + # must be unique across all rules on a shared listener; set an explicit + # Listener rule priority per service when several green services share a + # listener. + green_default_production_priority = 1000 + green_production_priority = local.green_alb_listener_rule_enabled ? coalesce( + var.load_balancer_attachment.listener_rules[0].priority, + local.green_default_production_priority, + ) : null + green_test_priority = local.green_alb_listener_rule_enabled ? local.green_production_priority - 1 : null # ARN passed to advanced_configuration.test_listener_rule and exported: # the module-created rule when configured, else an externally-managed diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index c85eab7..b824d76 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -315,6 +315,30 @@ module: collapsible: true max: 50000 min: 1 + - id: green_alb_listener_rule_enabled + label: Create a listener rule to test green deployments + type: boolean + description: Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during blue/green, linear, and canary deployments, so the new revision can be validated before production traffic shifts. Has no effect for rolling deployments. + collapsible: true + default: false + - id: test_header_name + label: Test traffic header name + type: string + description: HTTP header that routes a request to the green (test) target group during a deployment. Send this header with the value below to reach the new revision before production traffic shifts. Requests without it keep hitting production. 
+ collapsible: true + required: true + default: X-Ravion-Test + show_when: + green_alb_listener_rule_enabled: true + - id: test_header_value + label: Test traffic header value + type: string + description: Value the test traffic header must carry to be routed to the green (test) target group. + collapsible: true + required: true + default: "1" + show_when: + green_alb_listener_rule_enabled: true - id: section_task label: Container resources type: section @@ -1160,6 +1184,9 @@ module: execution_role_arn: << module.input.execution_role_arn || nil >> execution_role_policies: << module.input.execution_role_policies >> force_new_deployment: << module.input.force_new_deployment >> + green_alb_listener_rule_enabled: << module.input.green_alb_listener_rule_enabled >> + test_header_name: << module.input.test_header_name >> + test_header_value: << module.input.test_header_value >> health_check_grace_period_seconds: << module.input.health_check_grace_period_seconds >> launch_type: '<< module.input.capacity_provider == "ec2" ? "EC2" : "FARGATE" >>' load_balancer_attachment: diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index 6efe7f3..ada1ae8 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -326,10 +326,28 @@ variable "deployment_strategy_config" { variable "test_listener_rule_arn" { type = string - description = "Optional ARN of an ALB listener rule that routes test traffic for blue/green validation (drives the TEST_TRAFFIC_SHIFT lifecycle stages). Only used for native traffic-shift strategies." + description = "Optional ARN of an externally-managed ALB listener rule that routes test traffic for blue/green validation (drives the TEST_TRAFFIC_SHIFT lifecycle stages). Only used for native traffic-shift strategies. Used only when the module does not create its own green rule; when green_alb_listener_rule_enabled creates one, the module-created rule takes precedence."
default = null } +variable "green_alb_listener_rule_enabled" { + type = bool + description = "Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during native traffic-shift deployments (blue_green/linear/canary), so the new revision can be validated before production traffic shifts. The rule reuses the production listener and routing conditions plus a distinguishing test header (test_header_name/test_header_value) and forwards to the alternate target group; the ECS deployment controller rewrites it through the TEST_TRAFFIC_SHIFT lifecycle stages. No effect for NLB or rolling-only services." + default = false +} + +variable "test_header_name" { + type = string + description = "HTTP header name that distinguishes test traffic for the green listener rule. Requests carrying this header (with test_header_value) match the green rule and reach the alternate target group; requests without it fall through to production. Only used when green_alb_listener_rule_enabled is true." + default = "X-Ravion-Test" +} + +variable "test_header_value" { + type = string + description = "Value paired with test_header_name for routing test traffic to the green target group. Only used when green_alb_listener_rule_enabled is true." + default = "1" +} + variable "deployment_minimum_healthy_percent" { type = number description = "The minimum healthy percent during deployment (rolling deployments only)." @@ -520,30 +538,6 @@ variable "load_balancer_attachment" { alpn_policy = optional(string) # For TLS: HTTP1Only, HTTP2Only, etc. }), null) - # ALB: Optional test listener rule (attach to an existing ALB listener). - # - # When set, the module creates a dedicated listener rule that forwards - # test traffic to the alternate (green) target group during native - # traffic-shift deployments (blue_green/linear/canary). 
Its ARN is - # wired into the service's advanced_configuration.test_listener_rule, - # which drives the TEST_TRAFFIC_SHIFT lifecycle stages: ECS routes - # test traffic to the green revision so it can be validated before - # production traffic shifts. - # - # Provide distinguishing conditions (e.g. an http-header or a - # dedicated path/host) so the test rule does not collide with the - # production listener rule on the same listener. Omit for services - # that do not need a pre-cutover test phase — most services. - test_listener_rule = optional(object({ - listener_arn = string - priority = optional(number, null) # null = AWS auto-assigns next available priority - - conditions = list(object({ - type = string - values = list(string) - })) - }), null) - container_name = optional(string, null) container_port = optional(number, null) }) diff --git a/test/fixtures/ecs_service/with_alb/main.tf b/test/fixtures/ecs_service/with_alb/main.tf index 5f06573..1be8c4e 100644 --- a/test/fixtures/ecs_service/with_alb/main.tf +++ b/test/fixtures/ecs_service/with_alb/main.tf @@ -161,24 +161,14 @@ module "ecs_service" { ] } ] - - # Dedicated test listener rule on the same listener, distinguished by - # a header so it does not collide with the production path rule. ECS - # rewrites it to the green revision during the TEST_TRAFFIC_SHIFT - # lifecycle stages of a native traffic-shift deployment. - test_listener_rule = { - listener_arn = module.ecs_cluster.public_alb_http_listener_arn - priority = 90 - - conditions = [ - { - type = "http-header" - values = ["X-FC-Test", "true"] - } - ] - } } + # Create the green (test) ALB listener rule. It mirrors the production + # listener + conditions and forwards to the alternate target group; ECS + # rewrites it during the TEST_TRAFFIC_SHIFT lifecycle stages of a native + # traffic-shift deployment. 
+ green_alb_listener_rule_enabled = true + # Give load balancer time to register targets health_check_grace_period_seconds = 60 From 703a380f19dc045d88125b6552f3c3450aebc2c8 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Fri, 12 Jun 2026 13:04:40 -0400 Subject: [PATCH 15/26] enable group stickiness --- compute/ecs_service/.terraform.lock.hcl | 41 +++++++-------- compute/ecs_service/listener_rules.tf | 38 +++++++++++++- compute/ecs_service/locals.tf | 20 ++++++++ compute/ecs_service/tests/basic.tftest.hcl | 58 ++++++++++++++++++++++ 4 files changed, 135 insertions(+), 22 deletions(-) diff --git a/compute/ecs_service/.terraform.lock.hcl b/compute/ecs_service/.terraform.lock.hcl index 28ffb2f..5c17aaf 100644 --- a/compute/ecs_service/.terraform.lock.hcl +++ b/compute/ecs_service/.terraform.lock.hcl @@ -1,25 +1,26 @@ -# This file is maintained automatically by "tofu init". +# This file is maintained automatically by "terraform init". # Manual edits may be lost in future updates. -provider "registry.opentofu.org/hashicorp/aws" { - version = "6.39.0" - constraints = ">= 5.0.0" +provider "registry.terraform.io/hashicorp/aws" { + version = "6.50.0" + constraints = ">= 6.0.0, >= 6.21.0" hashes = [ - "h1:c9SG8ZdYgzqpxORpTqeLFeXW4qQQ8GMGCcUkU+FAfQM=", - "zh:00a6c0d8b5b86833087e367b632e9ab73fb8db9c43569020ebd0489dc2c919ce", - "zh:05f2b56211f4c8a0b66a093d025187cbc7be086dedef62306f5a28290598ebdc", - "zh:24d97a31d5ab814c33ed32a5b7674f1a15544b2367a95bddd00cfdd8d6b82740", - "zh:258194e24ac07ee194d580ca25a25fa7bc48fa40fed4fd58352b0a64da0da4c9", - "zh:315337e5f0ccafeadf490f117151b52c6d66244bf652f4fee975eddda662af3b", - "zh:38573dd56cca8c0ffe33396cf17cc8bd13de1d27d3c4da4177e485d174f1eaf0", - "zh:4baa806c5eb8faae95cea3f1dfafb153b5e3e96c5b30a2102072da4f032d2d9b", - "zh:4f258106baca7e00a6904b2353579d283e4400a75cd0353a25e057921e8a8d96", - "zh:62e5d4628d03883a6c2a6e3c297eb54df9b5935e9e3a655dbb1c6c5ddaf7ea33", - "zh:8af5fae01c1cef65d149fa6fe47e94cf46ffa97d29e8f2dfe41aeae01da590ea", - 
"zh:a8240b40f7be408ac24897597a85dc4fe56f390224b11ecad2c1327e686fca58", - "zh:c549eee2a0cf0e2c4a676614d990121b685beab0047b1073407ee26247c4be13", - "zh:cfed074ba8948c75445c74c69722cb17c960024b1917b4f26905aa9c9ac4e667", - "zh:d6f4f4fa01e33d0d546705e2776f38d0b4f2847827b3f07ecde87cc02ef3d23e", - "zh:e7239b349c3149e4670750481b687c5c828908fd09f2196d7af1ac1b4d83e80b", + "h1:D8uNiOpl3UkAX4zI5T47ALMiRFXTa1XfdQC+TBu3RmE=", + "zh:0072806bb262c6d86bc25b4a75750e469881144c14818afdba7b82db840e1588", + "zh:1ebc2dae335dad7a8b16a1985b69a63a14954282bb44fdba7d5103f77551ac7b", + "zh:2dab48fe8f3193b8216d578ac1e3674fa566435cc7dbce2953d55b72e31d0241", + "zh:2fc3d3029c2b7429472391ef339672e1fca8e6ff32c8a519bf3acedafa7e24fe", + "zh:38a36e64e7212f6cedac861ea4d449cce07131b3378de601bf9d49a99e000208", + "zh:3ac70758ed251ce78b7f541a5a79cc6fe56474412783ae1decef719bdd0f30bf", + "zh:4385d3903e685bddb2b8005b4eb7db89f030267d4d03c7d792d2f5e739cc874a", + "zh:4cce0760b87fbafd51f30faec2a737f4183b7c615f4a86557f7d3c893a610dc5", + "zh:4feaeed18694239b896c6415d9a1e5ef89e1da4f4ad60924aa0522adeb1f6599", + "zh:502fca2be1c95f443c3e67d0555601d1de65b4ca82d197c059e9c868360e3a0a", + "zh:57d037f6fdd045f2660909c3bdface9622d81165ce647479cba98d1f353c5eab", + "zh:5dc5a0b915c2ac5256d909458f5c8e40b35f78b3a36ea893c86624eaf6c54e37", + "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425", + "zh:b84c87c58a320adbb2c74a4cad03ae5aac7f2eae21db26f00fdde98c8c4d4523", + "zh:c895f1d5cbcbeff77850ac99efd36bde0048d4e909b296882331b9b9ebf48cfa", + "zh:ead82831683619124597a1f170dd31e9b293e9cf22f558cb166d5e734fcd11e4", ] } diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index 0b397f2..ffdc6eb 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -22,9 +22,26 @@ resource "aws_lb_listener_rule" "alb" { : each.value.priority ) + # When the target groups have target-level stickiness the forward + # action must carry group-level stickiness or 
ELBv2 rejects the + # weighted forward ECS writes during traffic-shift deployments — see + # alb_group_stickiness_enabled in locals.tf. action { type = "forward" - target_group_arn = aws_lb_target_group.tg_1[0].arn + target_group_arn = local.alb_group_stickiness_enabled ? null : aws_lb_target_group.tg_1[0].arn + + dynamic "forward" { + for_each = local.alb_group_stickiness_enabled ? [1] : [] + content { + target_group { + arn = aws_lb_target_group.tg_1[0].arn + } + stickiness { + enabled = true + duration = local.alb_group_stickiness_duration + } + } + } } dynamic "condition" { @@ -117,9 +134,26 @@ resource "aws_lb_listener_rule" "test" { # specificity, so without this it would fall through to production. priority = local.green_test_priority + # Same group-stickiness requirement as the production rule: ECS + # rewrites this rule's forward action through the TEST_TRAFFIC_SHIFT + # stages, and ELBv2 rejects the rewrite when the sticky target groups + # are referenced without group-level stickiness on the action. action { type = "forward" - target_group_arn = aws_lb_target_group.tg_2[0].arn + target_group_arn = local.alb_group_stickiness_enabled ? null : aws_lb_target_group.tg_2[0].arn + + dynamic "forward" { + for_each = local.alb_group_stickiness_enabled ? [1] : [] + content { + target_group { + arn = aws_lb_target_group.tg_2[0].arn + } + stickiness { + enabled = true + duration = local.alb_group_stickiness_duration + } + } + } } # Distinguishing condition: only requests carrying the test header reach diff --git a/compute/ecs_service/locals.tf b/compute/ecs_service/locals.tf index d8da657..0f39609 100644 --- a/compute/ecs_service/locals.tf +++ b/compute/ecs_service/locals.tf @@ -76,6 +76,26 @@ locals { # rule ARN supplied by the caller, else null. test_listener_rule_arn = local.green_alb_listener_rule_enabled ? 
aws_lb_listener_rule.test[0].arn : var.test_listener_rule_arn + # ALB rules whose forward action ECS rewrites during native + # traffic-shift deployments must carry group-level stickiness when the + # target groups have target-level stickiness: ELBv2 rejects a + # multi-target-group forward referencing a sticky target group unless + # the action itself has TargetGroupStickinessConfig enabled ("You must + # enable group stickiness on a rule if you enabled target stickiness + # on one of its target groups"), which fails the deployment's + # PRE_SCALE_UP stage. ALB-only — NLB listeners forward to one target + # group at a time. + alb_group_stickiness_enabled = ( + local.enable_load_balancer + && !local.enable_nlb_listener + && var.load_balancer_attachment.target_group.stickiness != null + && var.load_balancer_attachment.target_group.stickiness.enabled + ) + # Reuse the target-group cookie duration so a client pinned to the + # blue or green group stays pinned for the same window as its + # in-group target pinning. + alb_group_stickiness_duration = local.alb_group_stickiness_enabled ? 
var.load_balancer_attachment.target_group.stickiness.cookie_duration : null + # Placeholder container name and port placeholder_container_name = "app" placeholder_container_port = var.container_port diff --git a/compute/ecs_service/tests/basic.tftest.hcl b/compute/ecs_service/tests/basic.tftest.hcl index c4a5072..a821160 100644 --- a/compute/ecs_service/tests/basic.tftest.hcl +++ b/compute/ecs_service/tests/basic.tftest.hcl @@ -236,6 +236,64 @@ run "blue_green_deployment" { condition = length(aws_iam_role.ecs_infrastructure) == 1 error_message = "Should create the ECS infrastructure role for native traffic-shift strategies" } + + assert { + condition = length(aws_lb_listener_rule.alb["0"].action[0].forward) == 0 + error_message = "Without target-group stickiness the rule should use a plain forward (no group-stickiness block)" + } +} + +################################################################################ +# Test: Sticky target groups require group-level stickiness on the rules +# +# ECS rewrites the production/test rules into a weighted forward across +# tg-1 + tg-2 during traffic-shift deployments; ELBv2 rejects that +# rewrite at PRE_SCALE_UP when the target groups have target-level +# stickiness but the rule's forward action lacks group stickiness. 
+################################################################################ + +run "sticky_target_groups_enable_group_stickiness" { + command = plan + + variables { + deployment_type = "blue_green" + container_port = 8080 + green_alb_listener_rule_enabled = true + load_balancer_attachment = { + target_group = { + port = 8080 + protocol = "HTTP" + stickiness = { + enabled = true + type = "lb_cookie" + cookie_duration = 3600 + } + } + listener_rules = [{ + listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/1234567890123456/1234567890123456" + priority = 100 + conditions = [{ + type = "host-header" + values = ["api.example.com"] + }] + }] + } + } + + assert { + condition = aws_lb_listener_rule.alb["0"].action[0].target_group_arn == null + error_message = "Sticky target groups should switch the production rule to the expanded forward block" + } + + assert { + condition = aws_lb_listener_rule.alb["0"].action[0].forward[0].stickiness[0].enabled && aws_lb_listener_rule.alb["0"].action[0].forward[0].stickiness[0].duration == 3600 + error_message = "Production rule forward action should enable group stickiness with the target-group cookie duration" + } + + assert { + condition = aws_lb_listener_rule.test[0].action[0].forward[0].stickiness[0].enabled && aws_lb_listener_rule.test[0].action[0].forward[0].stickiness[0].duration == 3600 + error_message = "Test (green) rule forward action should enable group stickiness with the target-group cookie duration" + } } ################################################################################ From 7a6eff5ddb12192ce59d0083747cc6543e87ec42 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Fri, 12 Jun 2026 20:33:32 -0400 Subject: [PATCH 16/26] test rule by default --- compute/ecs_service/listener_rules.tf | 6 +++--- compute/ecs_service/rvn-ecs-web-definition.yml | 11 ----------- compute/ecs_service/target_groups.tf | 1 - compute/ecs_service/variables.tf | 4 ++-- 
test/fixtures/ecs_service/with_alb/main.tf | 6 ------ 5 files changed, 5 insertions(+), 23 deletions(-) diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index ffdc6eb..a5027fe 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -114,9 +114,9 @@ resource "aws_lb_listener_rule" "alb" { ################################################################################ # ALB Test (Green) Listener Rule # -# Optional dedicated rule that routes test traffic to the alternate (green) -# target group (tg-2) during native traffic-shift deployments. Gated by -# var.green_alb_listener_rule_enabled. It reuses the production listener and +# Dedicated rule, created by default for ALB services, that routes test +# traffic to the alternate (green) target group (tg-2) during native +# traffic-shift deployments. It reuses the production listener and # routing conditions (listener_rules[0]) but forwards to the green target # group; the ECS deployment controller rewrites its forward action through # the TEST_TRAFFIC_SHIFT lifecycle stages so the green revision can be diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index b824d76..852f11b 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -315,12 +315,6 @@ module: collapsible: true max: 50000 min: 1 - - id: green_alb_listener_rule_enabled - label: Create a listener rule to test green deployments - type: boolean - description: Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during blue/green, linear, and canary deployments, so the new revision can be validated before production traffic shifts. Has no effect for rolling deployments. 
- collapsible: true - default: false - id: test_header_name label: Test traffic header name type: string @@ -328,8 +322,6 @@ module: collapsible: true required: true default: X-Ravion-Test - show_when: - green_alb_listener_rule_enabled: true - id: test_header_value label: Test traffic header value type: string @@ -337,8 +329,6 @@ module: collapsible: true required: true default: "1" - show_when: - green_alb_listener_rule_enabled: true - id: section_task label: Container resources type: section @@ -1184,7 +1174,6 @@ module: execution_role_arn: << module.input.execution_role_arn || nil >> execution_role_policies: << module.input.execution_role_policies >> force_new_deployment: << module.input.force_new_deployment >> - green_alb_listener_rule_enabled: << module.input.green_alb_listener_rule_enabled >> test_header_name: << module.input.test_header_name >> test_header_value: << module.input.test_header_value >> health_check_grace_period_seconds: << module.input.health_check_grace_period_seconds >> diff --git a/compute/ecs_service/target_groups.tf b/compute/ecs_service/target_groups.tf index ff77c28..db15dfb 100644 --- a/compute/ecs_service/target_groups.tf +++ b/compute/ecs_service/target_groups.tf @@ -51,7 +51,6 @@ resource "aws_lb_target_group" "tg_1" { lifecycle { create_before_destroy = true - ignore_changes = [name] } } diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index ada1ae8..dc3b579 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -332,8 +332,8 @@ variable "test_listener_rule_arn" { variable "green_alb_listener_rule_enabled" { type = bool - description = "Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during native traffic-shift deployments (blue_green/linear/canary), so the new revision can be validated before production traffic shifts. 
The rule reuses the production listener and routing conditions plus a distinguishing test header (test_header_name/test_header_value) and forwards to the alternate target group; the ECS deployment controller rewrites it through the TEST_TRAFFIC_SHIFT lifecycle stages. No effect for NLB or rolling-only services." - default = false + description = "Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during native traffic-shift deployments (blue_green/linear/canary), so the new revision can be validated before production traffic shifts. The rule reuses the production listener and routing conditions plus a distinguishing test header (test_header_name/test_header_value) and forwards to the alternate target group; the ECS deployment controller rewrites it through the TEST_TRAFFIC_SHIFT lifecycle stages. Created by default; no effect for NLB services." + default = true } variable "test_header_name" { diff --git a/test/fixtures/ecs_service/with_alb/main.tf b/test/fixtures/ecs_service/with_alb/main.tf index 1be8c4e..d9982f5 100644 --- a/test/fixtures/ecs_service/with_alb/main.tf +++ b/test/fixtures/ecs_service/with_alb/main.tf @@ -163,12 +163,6 @@ module "ecs_service" { ] } - # Create the green (test) ALB listener rule. It mirrors the production - # listener + conditions and forwards to the alternate target group; ECS - # rewrites it during the TEST_TRAFFIC_SHIFT lifecycle stages of a native - # traffic-shift deployment. 
- green_alb_listener_rule_enabled = true - # Give load balancer time to register targets health_check_grace_period_seconds = 60 From 4e2fc4a5a041d6f21adf49e2ddc8971060ac44a3 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Mon, 15 Jun 2026 21:29:42 -0400 Subject: [PATCH 17/26] update module --- compute/ecs_service/rvn-ecs-web-definition.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index b80d5dd..1665979 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -1435,8 +1435,8 @@ module: queue_size: 1 infrastructure: ecs_cluster_arn: <> - ecs_service_arns: - - <> + ecs_service_arn: <> + ecs_target_group_arn: <> # Load-balancer advanced configuration for native traffic-shift # strategies (blue_green / linear / canary). The deploy manager # attaches these to UpdateService loadBalancers[].advancedConfiguration From aa746f50e2a1090a8ad30376aea46f4e48d44b24 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Mon, 15 Jun 2026 22:32:13 -0400 Subject: [PATCH 18/26] feat(ecs_service): configurable test-traffic selector for green listener rule Make the green (test) ALB listener rule's distinguishing condition configurable between an HTTP header and a query string via test_traffic_condition_type. Default to the query string __x-rvn-test__=1 so test traffic reaches the green target group without a custom header. ALB AND-combines conditions and ECS native blue/green wires exactly one test rule, so the selector is one type per service, not simultaneous header-OR-query matching. 
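As an illustration of the one-selector-per-service rule, a module call might look like the following. This is a minimal sketch, not part of this patch: the `source` path is hypothetical and the other required inputs are left out; only the `test_*` variable names and their defaults come from the diff below.

```hcl
# Sketch only: the source path and the omitted inputs are assumptions.
module "ecs_service" {
  source = "./compute/ecs_service" # hypothetical relative path to this module

  # ... other required inputs omitted ...

  deployment_type = "blue_green"

  # Default selector: requests carrying ?__x-rvn-test__=1 match the green rule.
  test_traffic_condition_type = "query-string"
  test_query_string_key       = "__x-rvn-test__"
  test_query_string_value     = "1"

  # Alternative: select on a header instead. Only one selector type applies
  # per service, so the header inputs emit no condition while the type is
  # "query-string".
  # test_traffic_condition_type = "header"
  # test_header_name            = "X-Ravion-Test"
  # test_header_value           = "1"
}
```

With the default selector, a request to the production host with `?__x-rvn-test__=1` appended should match the green rule, and the same request without it should fall through to production.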
--- compute/ecs_service/README.md | 8 +++- compute/ecs_service/listener_rules.tf | 31 ++++++++++--- compute/ecs_service/tests/basic.tftest.hcl | 54 ++++++++++++++++++++++ compute/ecs_service/variables.tf | 25 +++++++++- 4 files changed, 109 insertions(+), 9 deletions(-) diff --git a/compute/ecs_service/README.md b/compute/ecs_service/README.md index 44f3a10..3ac8681 100644 --- a/compute/ecs_service/README.md +++ b/compute/ecs_service/README.md @@ -319,7 +319,13 @@ module "worker_service" { | desired_count | Desired number of tasks (0 for infrastructure-first) | `number` | `0` | no | | deployment_type | Create-time seed for the deployment strategy (rolling, blue_green, linear, canary); the strategy itself is set per deployment via UpdateService | `string` | `"rolling"` | no | | deployment_strategy_config | Initial bake/canary/linear tuning for native traffic-shift strategies (seed only — the deploy manager owns it per-deploy) | `object` | `{}` | no | -| test_listener_rule_arn | Optional ALB listener rule ARN for test traffic during blue/green validation | `string` | `null` | no | +| test_listener_rule_arn | Optional ALB listener rule ARN for test traffic during blue/green validation (takes precedence over green_alb_listener_rule_enabled) | `string` | `null` | no | +| green_alb_listener_rule_enabled | Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during native traffic-shift deployments, so the new revision can be validated before production traffic shifts. ALB-only; no effect for NLB services | `bool` | `true` | no | +| test_traffic_condition_type | Which request attribute distinguishes test traffic for the green rule: `header` (test_header_name/value) or `query-string` (test_query_string_key/value). 
One type per service — ALB AND-combines conditions and ECS wires exactly one test rule, so genuine "header OR query-string" matching is not possible natively | `string` | `"query-string"` | no | +| test_header_name | HTTP header name that routes test traffic to the green target group when test_traffic_condition_type is `header` | `string` | `"X-Ravion-Test"` | no | +| test_header_value | Value paired with test_header_name when test_traffic_condition_type is `header` | `string` | `"1"` | no | +| test_query_string_key | Query-string key that routes test traffic to the green target group when test_traffic_condition_type is `query-string` (e.g. `?__x-rvn-test__=1`) | `string` | `"__x-rvn-test__"` | no | +| test_query_string_value | Value paired with test_query_string_key when test_traffic_condition_type is `query-string` | `string` | `"1"` | no | | deployment_minimum_healthy_percent | Minimum healthy percent during deployment | `number` | `100` | no | | deployment_maximum_percent | Maximum percent during deployment | `number` | `200` | no | | execute_command_enabled | Enable ECS Exec for debugging | `bool` | `false` | no | diff --git a/compute/ecs_service/listener_rules.tf b/compute/ecs_service/listener_rules.tf index a5027fe..46187a5 100644 --- a/compute/ecs_service/listener_rules.tf +++ b/compute/ecs_service/listener_rules.tf @@ -156,13 +156,30 @@ resource "aws_lb_listener_rule" "test" { } } - # Distinguishing condition: only requests carrying the test header reach - # the green target group. Combined with the mirrored production - # conditions below, normal (header-less) traffic still matches production. - condition { - http_header { - http_header_name = var.test_header_name - values = [var.test_header_value] + # Distinguishing condition: only requests carrying the configured test + # selector reach the green target group. 
The selector is a header or a + # query string (test_traffic_condition_type) — ALB AND-combines all + # conditions on a rule and ECS native blue/green drives exactly one test + # rule, so it is one type per service, not both at once. Combined with the + # mirrored production conditions below, normal traffic still matches + # production. + dynamic "condition" { + for_each = var.test_traffic_condition_type == "header" ? [1] : [] + content { + http_header { + http_header_name = var.test_header_name + values = [var.test_header_value] + } + } + } + + dynamic "condition" { + for_each = var.test_traffic_condition_type == "query-string" ? [1] : [] + content { + query_string { + key = var.test_query_string_key + value = var.test_query_string_value + } } } diff --git a/compute/ecs_service/tests/basic.tftest.hcl b/compute/ecs_service/tests/basic.tftest.hcl index a821160..808313f 100644 --- a/compute/ecs_service/tests/basic.tftest.hcl +++ b/compute/ecs_service/tests/basic.tftest.hcl @@ -241,6 +241,60 @@ run "blue_green_deployment" { condition = length(aws_lb_listener_rule.alb["0"].action[0].forward) == 0 error_message = "Without target-group stickiness the rule should use a plain forward (no group-stickiness block)" } + + assert { + condition = length([for c in aws_lb_listener_rule.test[0].condition : c if length([for q in c.query_string : q if q.key == "__x-rvn-test__" && q.value == "1"]) > 0]) == 1 + error_message = "Test (green) rule should distinguish traffic by the __x-rvn-test__ query string by default" + } + + assert { + condition = length([for c in aws_lb_listener_rule.test[0].condition : c if length(c.http_header) > 0]) == 0 + error_message = "Default (query-string) selector should not emit an http_header condition on the test rule" + } +} + +################################################################################ +# Test: Header selector for the green test rule +# +# test_traffic_condition_type = "header" swaps the distinguishing test +# condition from the 
default query string to an HTTP header so requests +# carrying the configured test header name and value reach the green target group. +################################################################################ + +run "green_rule_header_selector" { + command = plan + + variables { + deployment_type = "blue_green" + container_port = 8080 + test_traffic_condition_type = "header" + test_header_name = "X-Ravion-Test" + test_header_value = "1" + load_balancer_attachment = { + target_group = { + port = 8080 + protocol = "HTTP" + } + listener_rules = [{ + listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/1234567890123456/1234567890123456" + priority = 100 + conditions = [{ + type = "host-header" + values = ["api.example.com"] + }] + }] + } + } + + assert { + condition = length([for c in aws_lb_listener_rule.test[0].condition : c if length(c.http_header) > 0 && c.http_header[0].http_header_name == "X-Ravion-Test"]) == 1 + error_message = "Test (green) rule should distinguish traffic by the X-Ravion-Test header when selector is header" + } + + assert { + condition = length([for c in aws_lb_listener_rule.test[0].condition : c if length(c.query_string) > 0]) == 0 + error_message = "Header selector should not emit a query-string condition on the test rule" + } } ################################################################################ diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index dc3b579..83144d1 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -344,7 +344,30 @@ variable "test_header_name" { variable "test_header_value" { type = string - description = "Value paired with test_header_name for routing test traffic to the green target group. Only used when green_alb_listener_rule_enabled is true." + description = "Value paired with test_header_name for routing test traffic to the green target group.
Only used when green_alb_listener_rule_enabled is true and test_traffic_condition_type is \"header\"." + default = "1" +} + +variable "test_traffic_condition_type" { + type = string + description = "Which request attribute distinguishes test traffic for the green listener rule: \"header\" (matches test_header_name/test_header_value) or \"query-string\" (matches test_query_string_key/test_query_string_value). ALB AND-combines conditions within a single rule and ECS native blue/green wires exactly one test rule, so the selector is one type per service, not both at once. Only used when green_alb_listener_rule_enabled is true." + default = "query-string" + + validation { + condition = contains(["header", "query-string"], var.test_traffic_condition_type) + error_message = "test_traffic_condition_type must be either \"header\" or \"query-string\"." + } +} + +variable "test_query_string_key" { + type = string + description = "Query-string key that distinguishes test traffic for the green listener rule (e.g. \"__x-rvn-test__\" matches ?__x-rvn-test__=...). Requests carrying this key/value match the green rule and reach the alternate target group; requests without it fall through to production. Only used when green_alb_listener_rule_enabled is true and test_traffic_condition_type is \"query-string\"." + default = "__x-rvn-test__" +} + +variable "test_query_string_value" { + type = string + description = "Value paired with test_query_string_key for routing test traffic to the green target group. Only used when green_alb_listener_rule_enabled is true and test_traffic_condition_type is \"query-string\"." 
default = "1" } From 7178ade36a5d2b47c006df22485fe0f1a42ca65d Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 17 Jun 2026 10:38:58 -0400 Subject: [PATCH 19/26] add deployment options --- .../ecs_service/rvn-ecs-web-definition.yml | 151 ++++++++++++++++++ 1 file changed, 151 insertions(+) diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index 1665979..81d8184 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -200,6 +200,131 @@ module: show_when: build_type: nixpacks - $include: ../../partials/inputs/nixpacks-advanced.yml + - id: section_deployment + label: Deployment + type: section + - id: deployment_strategy + label: Deployment strategy + type: string + required: true + default: rolling + values: + - label: Rolling + value: rolling + - label: Blue/green + value: blue_green + - label: Linear + value: linear + - label: Canary + value: canary + - id: deployment_bake_time_in_minutes + label: Bake time in minutes + type: number + max: 1440 + min: 0 + show_when: + deployment_strategy: + - blue_green + - linear + - canary + default: 10 + - id: linear_step_percentage + label: Linear step percentage + type: number + max: 100 + min: 1 + show_when: + deployment_strategy: linear + default: 20 + - id: linear_step_bake_time_in_minutes + label: Linear step bake time in minutes + type: number + max: 1440 + min: 0 + show_when: + deployment_strategy: linear + default: 5 + - id: canary_percent + label: Canary percent + type: number + max: 100 + min: 1 + show_when: + deployment_strategy: canary + default: 5 + - id: canary_bake_time_in_minutes + label: Canary bake time in minutes + type: number + max: 1440 + min: 0 + show_when: + deployment_strategy: canary + default: 10 + - id: deployment_pause_stages + label: Manual approval gates + type: object_array + description: Optional gates that pause the deployment at chosen lifecycle stages until you approve 
it. Applies only to the blue/green, linear, and canary strategies. + collapsible: true + item_inputs: + - id: stage + label: Stage + type: string + description: Deployment lifecycle stage at which to pause and wait for manual approval. + default: POST_TEST_TRAFFIC_SHIFT + values: + - label: Reconcile service + description: Pause during service reconciliation, before ECS finalizes the deployment. + value: RECONCILE_SERVICE + - label: Pre scale up + description: Pause before the new task set scales up. + value: PRE_SCALE_UP + - label: Post scale up + description: Pause after the new task set has scaled up and become healthy. + value: POST_SCALE_UP + - label: Post test traffic shift + description: Pause after test traffic is shifted to the new (green) tasks, so you can validate them before production traffic moves. + value: POST_TEST_TRAFFIC_SHIFT + - label: Pre production traffic shift + description: Pause before production traffic begins shifting to the new tasks. + value: PRE_PRODUCTION_TRAFFIC_SHIFT + - label: Post production traffic shift + description: Pause after production traffic has fully shifted to the new tasks, before the old tasks are removed. + value: POST_PRODUCTION_TRAFFIC_SHIFT + - id: timeout_in_minutes + label: Timeout (minutes) + type: number + description: Minutes to wait for manual approval before the timeout action runs. Matches the AWS default of 1,440 minutes (24 hours). + max: 20160 + min: 1 + required: true + default: 1440 + - id: timeout_action + label: On timeout + type: string + description: Action to take if approval is not given before the timeout elapses. Defaults to rolling back. + required: true + default: ROLLBACK + values: + - label: Roll back + description: Roll the deployment back to the previous revision. + value: ROLLBACK + - label: Continue + description: Proceed with the deployment as if it were approved. 
+ value: CONTINUE + item_label: Pause stage + item_title_field: stage + item_summary: + - timeout_in_minutes + - timeout_action + show_when: + deployment_strategy: + - blue_green + - linear + - canary + default: + - stage: POST_TEST_TRAFFIC_SHIFT + timeout_in_minutes: 1440 + timeout_action: ROLLBACK - id: section_health label: Health check type: section @@ -1433,6 +1558,32 @@ module: concurrency: queue_overflow: oldest queue_size: 1 + strategy: | + << + module.input.deployment_strategy == "rolling" ? nil : + module.input.deployment_strategy == "blue_green" ? { + "type": "blue_green", + "bake_time_in_minutes": module.input.deployment_bake_time_in_minutes, + "pause_stages": module.input.deployment_pause_stages + } : + module.input.deployment_strategy == "linear" ? { + "type": "linear", + "bake_time_in_minutes": module.input.deployment_bake_time_in_minutes, + "pause_stages": module.input.deployment_pause_stages, + "linear": { + "step_percentage": module.input.linear_step_percentage, + "step_bake_time_in_minutes": module.input.linear_step_bake_time_in_minutes + } + } : { + "type": "canary", + "bake_time_in_minutes": module.input.deployment_bake_time_in_minutes, + "pause_stages": module.input.deployment_pause_stages, + "canary": { + "canary_percent": module.input.canary_percent, + "canary_bake_time_in_minutes": module.input.canary_bake_time_in_minutes + } + } + >> infrastructure: ecs_cluster_arn: <> ecs_service_arn: <> From ac45547218aceca0a624a0bbfb1dc13bb0483b9a Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Wed, 17 Jun 2026 19:20:25 -0400 Subject: [PATCH 20/26] fix(ecs_service): avoid TG replacement deadlock on moved-block migration Re-adopting a pre-existing target group via the moved block changed the name from -tg to -tg-1, forcing replacement. With ignore_changes=[action] on the listener rule, the rule never repointed to the new TG and the old TG's destroy failed (in use by listener rule), breaking apply mid-way. 
Ignore changes to name so migrated TGs keep their existing name/ARN in place; fresh services still get the -tg-1 name. --- compute/ecs_service/target_groups.tf | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/compute/ecs_service/target_groups.tf b/compute/ecs_service/target_groups.tf index db15dfb..e0185e5 100644 --- a/compute/ecs_service/target_groups.tf +++ b/compute/ecs_service/target_groups.tf @@ -51,6 +51,13 @@ resource "aws_lb_target_group" "tg_1" { lifecycle { create_before_destroy = true + # Re-adopting a pre-existing target group via the moved block (old name + # suffix `-tg`) must not force replacement just because the configured + # name is now `-tg-1`: the listener rule ignores `action`, so it would + # never repoint to the replacement and the old TG's destroy would fail + # ("currently in use by a listener rule"). Ignoring `name` keeps the + # existing TG (and its ARN) in place; fresh services still get `-tg-1`. + ignore_changes = [name] } } From a649bf12a300ea3d4f7ec85a70a8d0d82f5cf965 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Thu, 18 Jun 2026 19:46:56 -0400 Subject: [PATCH 21/26] Fix ECS cluster capacity provider definition --- compute/ecs_cluster/rvn-ecs-cluster-definition.yml | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml index 0307c60..120e38b 100644 --- a/compute/ecs_cluster/rvn-ecs-cluster-definition.yml +++ b/compute/ecs_cluster/rvn-ecs-cluster-definition.yml @@ -4,7 +4,7 @@ definition: description: Production-ready AWS ECS cluster with Fargate, Fargate Spot, optional EC2 capacity, and shared load balancers. 
release: version: 0.2.0 - description: add default capacity provider selection; the cluster default strategy now commits to a single provider family + description: Automatically derive the default capacity provider so the cluster default strategy commits to a single provider family module: inputs: - id: network @@ -512,10 +512,6 @@ module: Select an **EC2 instance type** to enable EC2 capacity. Leave it blank for a Fargate-only cluster. - ### Default capacity provider - - Services that do not set their own capacity provider strategy use the cluster default. AWS does not allow mixing Fargate and EC2 capacity providers in that default, so it commits to a single one. Leave **Default capacity provider** blank to pick automatically (EC2 when enabled, then Fargate, then Fargate Spot), or choose one of the enabled providers explicitly. - ## Load balancers ### Public application load balancer @@ -567,7 +563,6 @@ module: | Fargate | No | `true` | Allow services to use Fargate capacity | | Fargate Spot | No | `true` | Allow services to use lower-cost interruptible Fargate Spot | | EC2 instance type | No | — | Enables EC2 capacity when selected | - | Default capacity provider | No | Automatic | Provider used by services without their own strategy | | Public load balancer | No | `false` | Create an internet-facing ALB | | Private load balancer | No | `false` | Create an internal ALB | | Public NLB | No | `false` | Create an internet-facing Network Load Balancer | @@ -619,7 +614,6 @@ module: base_path: compute/ecs_cluster terraform_variables: ...overrides: << module.input.advanced_terraform_variables >> - capacity_provider_default: << module.input.capacity_provider_default >> load_balancer_deletion_protection_enabled: << module.input.load_balancer_deletion_protection_enabled >> ec2_ami_id: << module.input.ec2_ami_id >> ec2_spot_enabled: << module.input.ec2_spot_enabled >> From e8965fa033703e9cad08c46d9f0c61dc239c84e7 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Thu, 18 Jun 
2026 19:49:32 -0400 Subject: [PATCH 22/26] Default ECS web test traffic to query string --- compute/ecs_service/locals.tf | 5 ++--- compute/ecs_service/rvn-ecs-web-definition.yml | 18 +----------------- compute/ecs_service/variables.tf | 4 ++-- 3 files changed, 5 insertions(+), 22 deletions(-) diff --git a/compute/ecs_service/locals.tf b/compute/ecs_service/locals.tf index 0f39609..2afde47 100644 --- a/compute/ecs_service/locals.tf +++ b/compute/ecs_service/locals.tf @@ -55,10 +55,10 @@ locals { ) # When the green rule is enabled the module owns both priorities so the - # test rule (production conditions + the X-Ravion-Test header) is always + # test rule (production conditions + the configured test selector) is always # evaluated before the production rule — otherwise ALB, which routes by # priority order and not specificity, would match production first and a - # header-bearing request would never reach green. The production rule's + # test request would never reach green. The production rule's # priority becomes the base (its configured priority, else the default # below) and the test rule sits one slot ahead at base - 1. Both numbers # must be unique across all rules on a shared listener; set an explicit @@ -182,4 +182,3 @@ locals { # Service discovery settings enable_service_discovery = var.service_discovery != null } - diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index a7f2d11..05f3ad4 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -4,7 +4,7 @@ definition: description: Web server ECS service for running an HTTP application behind an ECS cluster load balancer. 
release: version: 0.6.0 - description: Share ECS service config and desired task count behavior, and add native ECS deployment strategies (rolling/blue_green/linear/canary) with a green test listener rule, production/alternate target group pair, and load-balancer advanced-config outputs + description: Share ECS service config and desired task count behavior, and add native ECS deployment strategies (rolling/blue_green/linear/canary) with a default query-string green test listener rule, production/alternate target group pair, and load-balancer advanced-config outputs module: inputs: - id: section_cluster @@ -167,20 +167,6 @@ module: - stage: POST_TEST_TRAFFIC_SHIFT timeout_in_minutes: 1440 timeout_action: ROLLBACK - - id: test_header_name - label: Test traffic header name - type: string - description: HTTP header that routes a request to the green (test) target group during a deployment. Send this header with the value below to reach the new revision before production traffic shifts. Requests without it keep hitting production. - collapsible: true - required: true - default: X-Ravion-Test - - id: test_header_value - label: Test traffic header value - type: string - description: Value the test traffic header must carry to be routed to the green (test) target group. 
- collapsible: true - required: true - default: "1" - id: section_health label: Health check type: section @@ -344,8 +330,6 @@ module: - ../../partials/stack/ecs-service-common-terraform-variables.yml - container_port: << module.input.container_port >> health_check_grace_period_seconds: << module.input.health_check_grace_period_seconds >> - test_header_name: << module.input.test_header_name >> - test_header_value: << module.input.test_header_value >> load_balancer_attachment: container_port: << module.input.container_port >> enabled: true diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index 0ff065b..3c52f6d 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -332,13 +332,13 @@ variable "test_listener_rule_arn" { variable "green_alb_listener_rule_enabled" { type = bool - description = "Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during native traffic-shift deployments (blue_green/linear/canary), so the new revision can be validated before production traffic shifts. The rule reuses the production listener and routing conditions plus a distinguishing test header (test_header_name/test_header_value) and forwards to the alternate target group; the ECS deployment controller rewrites it through the TEST_TRAFFIC_SHIFT lifecycle stages. Created by default; no effect for NLB services." + description = "Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during native traffic-shift deployments (blue_green/linear/canary), so the new revision can be validated before production traffic shifts. The rule reuses the production listener and routing conditions plus a distinguishing test selector (query string by default, or header when test_traffic_condition_type is \"header\") and forwards to the alternate target group; the ECS deployment controller rewrites it through the TEST_TRAFFIC_SHIFT lifecycle stages. 
Created by default; no effect for NLB services." default = true } variable "test_header_name" { type = string - description = "HTTP header name that distinguishes test traffic for the green listener rule. Requests carrying this header (with test_header_value) match the green rule and reach the alternate target group; requests without it fall through to production. Only used when green_alb_listener_rule_enabled is true." + description = "HTTP header name that distinguishes test traffic for the green listener rule. Requests carrying this header (with test_header_value) match the green rule and reach the alternate target group; requests without it fall through to production. Only used when green_alb_listener_rule_enabled is true and test_traffic_condition_type is \"header\"." default = "X-Ravion-Test" } From 9934440178f93eb5d7a3ae114acc65e1b84d4a64 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Thu, 18 Jun 2026 19:57:41 -0400 Subject: [PATCH 23/26] Add ECS deployment input descriptions --- compute/ecs_service/rvn-ecs-web-definition.yml | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index 05f3ad4..a503197 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -48,20 +48,26 @@ module: - id: deployment_strategy label: Deployment strategy type: string + description: Choose how traffic moves from the current deployment to the new deployment. required: true default: rolling values: - label: Rolling + description: Replace tasks in place while the service keeps serving from the production target group. value: rolling - label: Blue/green + description: Start the new deployment on the alternate target group, then shift production traffic after test validation. value: blue_green - label: Linear + description: Shift production traffic to the new deployment in equal percentage steps with a wait between steps. 
value: linear - label: Canary + description: Send a small percentage of production traffic to the new deployment first, then shift the rest after the canary bake time. value: canary - id: deployment_bake_time_in_minutes label: Bake time in minutes type: number + description: Minutes to keep the current and new deployments running after production traffic has fully shifted, before the old deployment is removed. max: 1440 min: 0 show_when: @@ -73,6 +79,7 @@ module: - id: linear_step_percentage label: Linear step percentage type: number + description: Percentage of production traffic to move to the new deployment at each linear step. max: 100 min: 1 show_when: @@ -81,6 +88,7 @@ module: - id: linear_step_bake_time_in_minutes label: Linear step bake time in minutes type: number + description: Minutes to wait between linear traffic steps before shifting the next percentage. max: 1440 min: 0 show_when: @@ -89,6 +97,7 @@ module: - id: canary_percent label: Canary percent type: number + description: Percentage of production traffic to send to the new deployment during the canary phase. max: 100 min: 1 show_when: @@ -97,6 +106,7 @@ module: - id: canary_bake_time_in_minutes label: Canary bake time in minutes type: number + description: Minutes to hold canary traffic before shifting the remaining production traffic to the new deployment. 
max: 1440 min: 0 show_when: From 86eb530811ff19cfae6173e9c13602e501a53def Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Fri, 19 Jun 2026 10:59:21 -0400 Subject: [PATCH 24/26] Clarify ECS traffic-shift deployment docs --- compute/ecs_service/README.md | 59 +++++++++++++++++-- compute/ecs_service/outputs.tf | 3 +- .../ecs_service/rvn-ecs-web-definition.yml | 43 +++++++++++++- compute/ecs_service/variables.tf | 17 +++--- 4 files changed, 103 insertions(+), 19 deletions(-) diff --git a/compute/ecs_service/README.md b/compute/ecs_service/README.md index b2a77ca..b9ca61c 100644 --- a/compute/ecs_service/README.md +++ b/compute/ecs_service/README.md @@ -4,7 +4,7 @@ This module creates an Amazon ECS service with a placeholder task definition, lo **Note:** This module provisions infrastructure with a placeholder container (hello-world). The Flightcontrol deploy manager deploys the actual application by registering task definitions and calling UpdateService with the authoritative `deploymentConfiguration` (strategy, bake times, pause lifecycle hooks) on every deploy. -When a load balancer is attached, the module always provisions the production + alternate target-group pair, the ECS infrastructure role, and the service's `load_balancer.advanced_configuration` — so the deployment strategy is a **per-deployment decision**: any service can switch between rolling / blue_green / linear / canary on a single deploy with no Terraform changes. `deployment_type` only seeds the strategy at create time. +When a load balancer is attached, the module always provisions the production + alternate target-group pair, the ECS infrastructure role, and the service's `load_balancer.advanced_configuration` — so the deployment strategy is a **per-deployment decision**: eligible services can switch between rolling / blue_green / linear / canary on a single deploy with no Terraform changes. ALB traffic-shift deployments require a single production listener rule. 
`deployment_type` only seeds the strategy at create time.
 
 ## Features
 
@@ -256,7 +256,7 @@ module "worker_service" {
 | Name | Version |
 |------|---------|
 | opentofu/terraform | >= 1.10.0 |
-| aws | >= 5.0 |
+| aws | >= 6.21 |
 
 ## Inputs
 
@@ -317,9 +317,9 @@ module "worker_service" {
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|----------|
 | desired_count | Desired number of tasks (0 for infrastructure-first) | `number` | `0` | no |
-| deployment_type | Create-time seed for the deployment strategy (rolling, blue_green, linear, canary); the strategy itself is set per deployment via UpdateService | `string` | `"rolling"` | no |
-| deployment_strategy_config | Initial bake/canary/linear tuning for native traffic-shift strategies (seed only — the deploy manager owns it per-deploy) | `object` | `{}` | no |
-| test_listener_rule_arn | Optional ALB listener rule ARN for test traffic during blue/green validation (takes precedence over green_alb_listener_rule_enabled) | `string` | `null` | no |
+| deployment_type | Initial deployment strategy for direct Terraform use; Ravion stacks use rolling and set blue_green/linear/canary per deploy via UpdateService | `string` | `"rolling"` | no |
+| deployment_strategy_config | Initial bake/canary/linear tuning for direct Terraform use; Ravion stacks set this per deploy through the deploy manager | `object` | `{}` | no |
+| test_listener_rule_arn | Optional ALB listener rule ARN for test traffic during blue/green validation when the module-created green listener rule is not enabled | `string` | `null` | no |
 | green_alb_listener_rule_enabled | Create a dedicated ALB listener rule that routes test traffic to the green (alternate) target group during native traffic-shift deployments, so the new revision can be validated before production traffic shifts.
ALB-only; no effect for NLB services | `bool` | `true` | no |
 | test_traffic_condition_type | Which request attribute distinguishes test traffic for the green rule: `header` (test_header_name/value) or `query-string` (test_query_string_key/value). One type per service — ALB AND-combines conditions and ECS wires exactly one test rule, so genuine "header OR query-string" matching is not possible natively | `string` | `"query-string"` | no |
 | test_header_name | HTTP header name that routes test traffic to the green target group when test_traffic_condition_type is `header` | `string` | `"X-Ravion-Test"` | no |
@@ -836,7 +836,7 @@ The placeholder container prints a message and exits, so load balancer health ch
 
 All four strategies run on the native ECS deployment controller — no CodeDeploy and no external controller.
 
-The same infrastructure (2 target groups + infrastructure role) backs every load-balanced service, so any service can switch strategy on its next deployment.
+The same infrastructure (2 target groups + infrastructure role) backs every load-balanced service, so eligible services can switch strategy on their next deployment. ALB traffic-shift deployments require a single production listener rule.
 
 | Feature | Rolling | Blue/Green | Linear | Canary |
 |---------|---------|------------|--------|--------|
@@ -851,6 +851,53 @@ The same infrastructure (2 target groups + infrastructure role) backs every load
 
 **Use linear/canary when:** you want production traffic to shift gradually with monitoring between steps.
 
+### How do I access the standby service during a traffic-shift deployment?
+
+For ALB-backed blue_green, linear, and canary deployments, the module creates a test listener rule that routes matching requests to the standby, or green, task set on the alternate target group. The request must match the same host/path conditions as the production listener rule and include the test selector.
+
+By default, the selector is the query parameter `__x-rvn-test__=1`:
+
+```bash
+curl "https://api.example.com/health?__x-rvn-test__=1"
+```
+
+The alternate target group only has registered targets while ECS is running a traffic-shift deployment. Outside that window, the standby route may have no healthy targets.
+
+To override the query parameter in Terraform:
+
+```hcl
+test_query_string_key = "preview"
+test_query_string_value = "green"
+```
+
+Then request `?preview=green`.
+
+To use an HTTP header instead of a query parameter:
+
+```hcl
+test_traffic_condition_type = "header"
+test_header_name = "X-Ravion-Test"
+test_header_value = "1"
+```
+
+Then send the header with the request:
+
+```bash
+curl -H "X-Ravion-Test: 1" "https://api.example.com/health"
+```
+
+When using the Ravion ECS Web Server module definition, set the same lower-level variables through Advanced Terraform variables. For example, to use a header selector:
+
+```json
+{
+  "test_traffic_condition_type": "header",
+  "test_header_name": "X-Ravion-Test",
+  "test_header_value": "1"
+}
+```
+
+The ALB rule can use one selector type per service: either `query-string` or `header`.
+
 ### How do I use this module with an NLB instead of an ALB?
 
 For NLB, configure the `nlb_listener` instead of `listener_rules`:
diff --git a/compute/ecs_service/outputs.tf b/compute/ecs_service/outputs.tf
index 13b47ae..3c397ce 100644
--- a/compute/ecs_service/outputs.tf
+++ b/compute/ecs_service/outputs.tf
@@ -167,7 +167,7 @@ output "production_listener_rule_arn" {
 }
 
 output "test_listener_rule_arn" {
-  description = "ARN of the test listener rule the ECS deployment controller rewrites during the TEST_TRAFFIC_SHIFT lifecycle stages, routing test traffic to the green revision before the production cutover. The deploy manager passes it as advanced_configuration.test_listener_rule on UpdateService. Null when no test listener rule is configured (the common case)."
+ description = "ARN of the test listener rule the ECS deployment controller rewrites during the TEST_TRAFFIC_SHIFT lifecycle stages, routing test traffic to the green revision before the production cutover. The deploy manager passes it as advanced_configuration.test_listener_rule on UpdateService. Null when no module-created or externally-managed test listener rule is configured." value = local.test_listener_rule_arn } @@ -266,4 +266,3 @@ output "region" { description = "The AWS region where the resources are deployed." value = local.region } - diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index a503197..f305a7f 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -4,7 +4,7 @@ definition: description: Web server ECS service for running an HTTP application behind an ECS cluster load balancer. release: version: 0.6.0 - description: Share ECS service config and desired task count behavior, and add native ECS deployment strategies (rolling/blue_green/linear/canary) with a default query-string green test listener rule, production/alternate target group pair, and load-balancer advanced-config outputs + description: Add native ECS blue/green, linear, and canary deployments with per-deploy strategy controls, manual approval gates, standby validation traffic, production/alternate target groups, ECS infrastructure role, and load-balancer advanced configuration outputs. module: inputs: - id: section_cluster @@ -673,7 +673,7 @@ module: ## Deployment - Deployment type defaults to Rolling. Blue/green is available when you need separate target groups for CodeDeploy-style traffic shifting. + Deployment strategy defaults to Rolling. Blue/green, Linear, and Canary use the native ECS traffic-shift controller with production and alternate target groups. The deployment strategy can change between deploys without changing Terraform infrastructure. 
Deployments update the ECS service task definition with the selected image, generated container definition, awslogs configuration, runtime platform, environment variables, secrets, and capacity-provider-compatible task settings. @@ -691,6 +691,43 @@ module: Hook-specific environment variables are appended to the app container override for the one-off task. Optional hook CPU, memory, ephemeral storage, and timeout settings let release tasks use different resources from the web service without changing the steady-state app task. + ## Standby validation traffic + + For Blue/green, Linear, and Canary deployments, Ravion creates a test listener rule on the same Application Load Balancer listener as the production rule. During the test traffic stage, requests that match the service's normal Domain host rules and Path rules and include the standby selector route to the standby, or green, task set on the alternate target group. + + By default, the standby selector is the query parameter `__x-rvn-test__=1`. For example, if production traffic uses `https://app.example.com/health`, validate the standby service with: + + ```text + https://app.example.com/health?__x-rvn-test__=1 + ``` + + The alternate target group only has registered targets while ECS is running a traffic-shift deployment. Outside that window, the standby route may have no healthy targets. + + Use Advanced Terraform variables to override the standby selector. Values in Advanced Terraform variables override the generated Terraform variables for the service. + + To use a different query parameter: + + ```json + { + "test_query_string_key": "preview", + "test_query_string_value": "green" + } + ``` + + Then validate standby traffic with `?preview=green`. + + To use an HTTP header instead of a query parameter: + + ```json + { + "test_traffic_condition_type": "header", + "test_header_name": "X-Ravion-Test", + "test_header_value": "1" + } + ``` + + Then send requests with `X-Ravion-Test: 1`. 
A service can use either the query-string selector or the header selector, not both at once. + ## Builder settings Builder settings apply to Nixpacks and Dockerfile builds. @@ -744,7 +781,7 @@ module: | Post-deploy timeout (secs) | No | 1800 | Maximum post-deploy task wait time | | Autoscaling | No | true | Enable CPU and optional memory target tracking | | Desired tasks | Yes* | 1 | Number of web tasks when autoscaling is disabled | - | Deployment type | No | rolling | Rolling or blue/green deployment infrastructure | + | Deployment strategy | Yes | rolling | Rolling, Blue/green, Linear, or Canary | | Tags | No | Standard Ravion tags | Additional tags applied to resources | | Advanced Terraform variables | No | {} | Raw lower-level overrides for exceptional cases | | OpenTofu version override | No | Ravion default | Override the OpenTofu version for the stack | diff --git a/compute/ecs_service/variables.tf b/compute/ecs_service/variables.tf index 3c52f6d..015f848 100644 --- a/compute/ecs_service/variables.tf +++ b/compute/ecs_service/variables.tf @@ -286,7 +286,7 @@ variable "desired_count" { variable "deployment_type" { type = string - description = "The deployment strategy ('rolling', 'blue_green', 'linear', 'canary') used to seed the service's deployment_configuration at create time. The strategy is a per-deployment setting on the native ECS controller — the Flightcontrol deploy manager passes the authoritative strategy on every UpdateService call, so it can change between deployments without Terraform changes." + description = "Initial deployment strategy for direct Terraform use ('rolling', 'blue_green', 'linear', 'canary'). Ravion ECS Web stack provisioning passes 'rolling' and the Flightcontrol deploy manager passes the authoritative blue_green/linear/canary strategy on each UpdateService call, so strategy changes in Ravion do not require Terraform changes." 
default = "rolling" validation { @@ -314,19 +314,20 @@ variable "deployment_strategy_config" { }), {}) }) description = <<-EOT - Initial tuning for the native traffic-shift strategies (blue_green / - linear / canary). This only seeds the service at create time — the - Flightcontrol deploy manager passes the authoritative - deploymentConfiguration (including pause lifecycle hooks) on every - UpdateService call, so post-create changes to these values are - ignored by Terraform (see ignore_changes on aws_ecs_service.this). + Initial tuning for direct Terraform use with native traffic-shift + strategies (blue_green / linear / canary). Ravion ECS Web stack + provisioning uses rolling and the Flightcontrol deploy manager passes + the authoritative deploymentConfiguration (including pause lifecycle + hooks) on every UpdateService call, so post-create changes to these + values are ignored by Terraform (see ignore_changes on + aws_ecs_service.this). EOT default = {} } variable "test_listener_rule_arn" { type = string - description = "Optional ARN of an externally-managed ALB listener rule that routes test traffic for blue/green validation (drives the TEST_TRAFFIC_SHIFT lifecycle stages). Only used for native traffic-shift strategies. Takes precedence over green_alb_listener_rule_enabled." + description = "Optional ARN of an externally-managed ALB listener rule that routes test traffic for blue/green validation (drives the TEST_TRAFFIC_SHIFT lifecycle stages). Only used for native traffic-shift strategies when the module-created green listener rule is not enabled." 
default = null } From 24a5e1dbd0ddd1f47ab7505bfccc6b9ac8c34f1e Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Fri, 19 Jun 2026 11:43:01 -0400 Subject: [PATCH 25/26] Document ALB group stickiness for ECS web service --- compute/ecs_service/rvn-ecs-web-definition.yml | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index f305a7f..deb94be 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -4,7 +4,7 @@ definition: description: Web server ECS service for running an HTTP application behind an ECS cluster load balancer. release: version: 0.6.0 - description: Add native ECS blue/green, linear, and canary deployments with per-deploy strategy controls, manual approval gates, standby validation traffic, production/alternate target groups, ECS infrastructure role, and load-balancer advanced configuration outputs. + description: Add native ECS blue/green, linear, and canary deployments with per-deploy strategy controls, manual approval gates, standby validation traffic, production/alternate target groups, ALB group stickiness guidance, ECS infrastructure role, and load-balancer advanced configuration outputs. module: inputs: - id: section_cluster @@ -253,7 +253,7 @@ module: - id: target_group_stickiness_enabled label: Sticky sessions type: boolean - description: Enable load balancer cookie stickiness so repeat requests are routed to the same task when possible. + description: Enable load balancer cookie stickiness so repeat requests are routed to the same task when possible. When enabled, traffic-shift deployments also keep clients on the first production or alternate target group they reach. collapsible: true default: true - id: target_group_stickiness_type @@ -728,6 +728,16 @@ module: Then send requests with `X-Ravion-Test: 1`. 
A service can use either the query-string selector or the header selector, not both at once. + ## Sticky sessions and traffic shifts + + The module definition sets Sticky sessions to true by default. When that setting is enabled, Ravion configures target-group stickiness for the service's production and alternate target groups, so the load balancer keeps repeat requests on the same task when possible. With the default Load balancer cookie stickiness type, the target-level cookie is `AWSALB`; AWS may also set `AWSALBCORS` for CORS support. With Application cookie stickiness, the target-level cookie is the configured Application cookie name, and AWS may also set `AWSALBAPP-*` cookies. + + During Blue/green, Linear, and Canary deployments, Ravion also enables ALB group stickiness on the production and standby listener rule forward actions when Sticky sessions is enabled. A client that first reaches the production target group or the alternate target group keeps using that same group for the stickiness cookie duration, even while the deployment's weighted traffic shift changes for new clients. + + The ALB group stickiness cookie is `AWSALBTG`. For CORS requests, AWS may also set `AWSALBTGCORS`. Application cookie name applies only to `app_cookie` stickiness inside the selected target group; the ALB group stickiness cookie name is managed by AWS and cannot be changed. + + To clear stickiness for a browser, open the browser developer tools, go to Application or Storage > Cookies for the service domain, and delete the relevant cookies: `AWSALBTG` and `AWSALBTGCORS` for ALB group stickiness; `AWSALB` and `AWSALBCORS` for the default target-level stickiness; and the configured Application cookie name plus any `AWSALBAPP-*` cookies when Application cookie stickiness is selected. For API clients, remove those cookie names from the cookie jar or stop sending them in the `Cookie` header. The next request can then enter the current traffic split like a new client. 
If Sticky sessions is turned off, Ravion does not enable target-group stickiness or ALB group stickiness for this behavior. + ## Builder settings Builder settings apply to Nixpacks and Dockerfile builds. @@ -757,6 +767,7 @@ module: | Start command | No | [] | Command arguments that override a prebuilt image default CMD | | Container port | Yes | 80 | Port exposed by the app container | | Health check path | Yes | / | HTTP path used by the target group health check | + | Sticky sessions | No | true | Keep clients on the same task, and on the same traffic-shift target group when enabled | | Domain host rules | No | - | Hostnames such as app.example.com or *.example.com | | Path rules | No | - | Path patterns such as /*, /api/*, or /app/* | | Capacity provider | Yes | fargate | Primary service capacity provider | From 3b7f51d8624381bbd32011bcb9916bfbf7d6eba6 Mon Sep 17 00:00:00 2001 From: Mina Abadir Date: Fri, 19 Jun 2026 11:52:58 -0400 Subject: [PATCH 26/26] update to use item_description --- compute/ecs_service/rvn-ecs-web-definition.yml | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/compute/ecs_service/rvn-ecs-web-definition.yml b/compute/ecs_service/rvn-ecs-web-definition.yml index deb94be..483dc09 100644 --- a/compute/ecs_service/rvn-ecs-web-definition.yml +++ b/compute/ecs_service/rvn-ecs-web-definition.yml @@ -165,9 +165,7 @@ module: value: CONTINUE item_label: Pause stage item_title_field: stage - item_summary: - - timeout_in_minutes - - timeout_action + item_description: "Timeout: {timeout_in_minutes} minutes / {timeout_action}" show_when: deployment_strategy: - blue_green