feat(ecs_service): native ECS deployment strategies (rolling/blue_green/linear/canary) by mabadir · Pull Request #5 · flightcontrolhq/modules

mabadir · 2026-06-04T16:16:48Z

Summary

Replaces the dual-service / CodeDeploy-era blue/green design with the native ECS deployment controller strategies: rolling, blue_green, linear, and canary — and makes the strategy a per-deployment decision that requires no Terraform changes to switch.

Native strategies (`b4ee71a`)

deployment_controller.type = "ECS" always — the CodeDeploy/EXTERNAL mapping is removed
New deployment_type values linear / canary alongside rolling / blue_green
New deployment_strategy_config variable (bake time, canary %, linear step %) that seeds deployment_configuration at create time
New test_listener_rule_arn variable for blue/green test-traffic validation (TEST_TRAFFIC_SHIFT stages)
ECS infrastructure IAM role with AmazonECSInfrastructureRolePolicyForLoadBalancers — ECS assumes it to rewrite listener rules and (de)register targets during traffic shifts
load_balancer.advanced_configuration wiring (production/alternate target group, production + optional test listener rule, role)
deployment_configuration added to ignore_changes: the Flightcontrol deploy manager passes the authoritative configuration (strategy, bake times, pause lifecycle hooks) on every UpdateService call, so Terraform must not fight it
Requires AWS provider >= 6.21 (linear_configuration / canary_configuration)

Strategy is per-deployment (`3c72a84`)

Whenever a load balancer is attached, the module now always provisions:

the production (tg-1) + alternate (tg-2) target-group pair (the rolling-only aws_lb_target_group.this is removed)
the ECS infrastructure role
the service's advanced_configuration

Rolling deployments serve from the production target group only and never touch the alternate, so any service can switch between all four strategies on a single UpdateService call. deployment_type is purely the create-time seed.

Trade-off: every load-balanced rolling service carries one idle target group and one unused IAM role.

Outputs

production_target_group_arn / alternate_target_group_arn (+ names), ecs_infrastructure_role_arn
target_group_arn retained as an alias of the production target group
Removed: blue/green dual-service outputs

Tests

Fixed the module test suite so it actually runs: basic_service was failing on main (mock provider emits invalid JSON policies / ARNs), which skipped every other run. Added mock_data / mock_resource defaults and fixed two stale assertions.

tofu validate ✅ — tofu test ✅ 11/11 passed

Breaking changes

deployment_controller is always ECS; services created with the old CodeDeploy blue/green mapping are not migrated (pre-GA, clean removal)
aws_lb_target_group.this removed — existing load-balanced services will see target-group recreation on upgrade
Blue/green dual-service variables/outputs removed

🤖 Generated with Claude Code

Greptile Summary

This PR replaces the CodeDeploy-era blue/green dual-service design with native ECS deployment controller strategies (rolling, blue_green, linear, canary), and restructures the ECS cluster's capacity-provider default strategy to comply with the AWS restriction on mixing Fargate and EC2 families.

ecs_service module: Always uses deployment_controller.type = \"ECS\"; always creates a production + alternate target-group pair and an ECS infrastructure IAM role when a load balancer is attached, so any service can switch between all four strategies on a single UpdateService call with no Terraform changes; seeds deployment_configuration at create time and places it in ignore_changes so the Flightcontrol deploy manager owns it per-deployment.
ecs_cluster module: New capacity_provider_default variable commits the cluster's default strategy to a single provider family (EC2, Fargate, or Fargate Spot), preventing the AWS API rejection that previously occurred when both Fargate and EC2 providers were added to one strategy.
Tests: Unit tests fixed (mock provider now emits valid JSON/ARNs), substantially expanded with new cases for each strategy, multi-rule rejection, stickiness handling, and header vs. query-string test selectors; a new Go integration test validates AWS API acceptance of advanced_configuration on a rolling service.

Confidence Score: 4/5

Safe to merge after fixing two logic bugs: the test_listener_rule_arn variable is silently overridden by the module-created test rule when both are provided, and the YAML definition wires header-name/value UI inputs that can never take effect because test_traffic_condition_type is not wired.

The core redesign is coherent and well-tested. Two defects stand out: (1) in locals.tf, local.green_alb_listener_rule_enabled takes precedence over var.test_listener_rule_arn, opposite to what the variable description promises, so callers supplying an externally-managed test rule ARN while leaving green_alb_listener_rule_enabled = true (the default) will have their ARN silently replaced by a newly-created rule; (2) in rvn-ecs-web-definition.yml, test_traffic_condition_type is never passed to Terraform, so the Test traffic header name and Test traffic header value fields in the UI are dead — the test rule always matches the x-rvn-test query string regardless of what the operator enters.

compute/ecs_service/locals.tf (test_listener_rule_arn precedence) and compute/ecs_service/rvn-ecs-web-definition.yml (missing test_traffic_condition_type wiring)

Important Files Changed

Filename	Overview
compute/ecs_service/ecs_service.tf	Core ECS service rewrite: replaces CodeDeploy/rolling dual-path with always-ECS controller; adds deployment_configuration seeding, advanced_configuration wiring, preconditions for listener rule count — well-structured with key edge cases guarded
compute/ecs_service/locals.tf	New locals for native strategy, test-rule priority, stickiness detection; test_listener_rule_arn precedence logic is inverted relative to the documented behavior
compute/ecs_service/listener_rules.tf	Adds the test (green) listener rule with mirrored production conditions plus the test-traffic selector; handles group stickiness correctly
compute/ecs_service/iam_infrastructure.tf	New ECS infrastructure role with AmazonECSInfrastructureRolePolicyForLoadBalancers; scoped trust policy to ecs.amazonaws.com; created for any LB-attached service
compute/ecs_service/target_groups.tf	Removes rolling-only aws_lb_target_group.this; tg_1/tg_2 are always created when LB is attached; ignore_changes=[name] guards the moved-block upgrade path
compute/ecs_service/variables.tf	Adds deployment_strategy_config, test_listener_rule_arn, green_alb_listener_rule_enabled, test traffic selector variables; documented precedence of test_listener_rule_arn contradicts the implementation in locals.tf
compute/ecs_service/rvn-ecs-web-definition.yml	Adds full deployment strategy UI section; wires test_header_name/test_header_value but omits test_traffic_condition_type, leaving the module at its query-string default regardless of the UI header inputs
compute/ecs_service/moved.tf	State migration from aws_lb_target_group.this[0] to tg_1[0]; combined with ignore_changes=[name] on tg_1 to avoid forced replacement and listener-rule deadlock on upgrade
compute/ecs_service/outputs.tf	Renames blue/green outputs to production/alternate; retains target_group_arn and target_group_name as aliases; adds ecs_infrastructure_role_arn, production_listener_rule_arn, test_listener_rule_arn
compute/ecs_service/tests/basic.tftest.hcl	Substantial expansion — adds mock_resource defaults for ARN validation, new test cases for canary/linear/blue_green, header vs query-string selector, stickiness, and multi-rule rejection
compute/ecs_cluster/locals.tf	Refactors capacity_provider_strategy to commit to a single family (EC2, fargate, or fargate_spot) to satisfy AWS restriction; controlled by new capacity_provider_default variable
test/ecs_service_test.go	Adds integration assertions for advanced_configuration on a real rolling service (alternate TG, production and test listener rule ARNs), directly validating the AWS API acceptance assumption

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    LB{Load balancer attached?}
    LB -- No --> NoTG[No target groups / No infra role]
    LB -- Yes --> TG1[aws_lb_target_group.tg_1 production]
    LB -- Yes --> TG2[aws_lb_target_group.tg_2 alternate]
    LB -- Yes --> ROLE[aws_iam_role.ecs_infrastructure]
    TG1 & TG2 & ROLE --> ADV[load_balancer.advanced_configuration]
    ADV --> SVC[aws_ecs_service deployment_controller=ECS]
    DTYPE{deployment_type at create time}
    DTYPE -- rolling --> DCFG_SKIP[No deployment_configuration block]
    DTYPE -- blue_green/linear/canary --> DCFG[deployment_configuration strategy+bake+canary/linear]
    DCFG_SKIP & DCFG --> SVC
    SVC --> IGN[ignore_changes: desired_count task_definition load_balancer deployment_configuration]
    TESTQ{green_alb_listener_rule_enabled AND test_listener_rule_arn==null?}
    TESTQ -- Yes --> TESTRULE[aws_lb_listener_rule.test]
    TESTQ -- No --> EXTARN[var.test_listener_rule_arn or null]
    TESTRULE & EXTARN --> ADV

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    LB{Load balancer attached?}
    LB -- No --> NoTG[No target groups / No infra role]
    LB -- Yes --> TG1[aws_lb_target_group.tg_1 production]
    LB -- Yes --> TG2[aws_lb_target_group.tg_2 alternate]
    LB -- Yes --> ROLE[aws_iam_role.ecs_infrastructure]
    TG1 & TG2 & ROLE --> ADV[load_balancer.advanced_configuration]
    ADV --> SVC[aws_ecs_service deployment_controller=ECS]
    DTYPE{deployment_type at create time}
    DTYPE -- rolling --> DCFG_SKIP[No deployment_configuration block]
    DTYPE -- blue_green/linear/canary --> DCFG[deployment_configuration strategy+bake+canary/linear]
    DCFG_SKIP & DCFG --> SVC
    SVC --> IGN[ignore_changes: desired_count task_definition load_balancer deployment_configuration]
    TESTQ{green_alb_listener_rule_enabled AND test_listener_rule_arn==null?}
    TESTQ -- Yes --> TESTRULE[aws_lb_listener_rule.test]
    TESTQ -- No --> EXTARN[var.test_listener_rule_arn or null]
    TESTRULE & EXTARN --> ADV

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
compute/ecs_service/locals.tf:50-55
**`test_listener_rule_arn` silently ignored when `green_alb_listener_rule_enabled = true`**

The variable description (and the README) state that `test_listener_rule_arn` "takes precedence over `green_alb_listener_rule_enabled`", but the code is the other way around. When a caller provides an externally-managed test rule ARN while leaving the default `green_alb_listener_rule_enabled = true`, `local.green_alb_listener_rule_enabled` evaluates to `true`, the module creates a redundant second test listener rule, and the caller-supplied ARN is silently discarded — the wrong rule ends up wired into `advanced_configuration.test_listener_rule`.

```suggestion
  green_alb_listener_rule_enabled = (
    local.enable_load_balancer
    && !local.enable_nlb_listener
    && var.green_alb_listener_rule_enabled
    && length(var.load_balancer_attachment.listener_rules) > 0
    && var.test_listener_rule_arn == null
  )
```

### Issue 2 of 2
compute/ecs_service/rvn-ecs-web-definition.yml:344-349
**UI "Test traffic header name/value" inputs are silently ignored — query-string routing is used instead**

The YAML wires `test_header_name` and `test_header_value` as Terraform variables, but `test_traffic_condition_type` is never set. The Terraform module defaults to `"query-string"`, so the test listener rule always uses `__x-rvn-test__=1` as the selector regardless of what the operator enters in the header fields. The header inputs in the UI are dead configuration. Either add `test_traffic_condition_type: "header"` alongside the header variable mappings, or remove the header name/value UI inputs and document that test traffic is identified by the `__x-rvn-test__` query string.

_{Reviews (4): Last reviewed commit: "Merge branch 'main' into native-ecs-depl..." | Re-trigger Greptile}

…en/linear/canary) Replace the CodeDeploy/external-controller blue/green wiring with the ECS deployment controller's built-in strategies: - deployment_type now accepts rolling | blue_green | linear | canary; the deployment controller is always ECS (CODE_DEPLOY mapping removed) - new deployment_strategy_config seeds bake time + canary/linear tuning at create time; deployment_configuration is in ignore_changes because the Flightcontrol deploy manager passes the authoritative config (including pause lifecycle hooks) on every UpdateService call - native strategies wire load_balancer.advanced_configuration: alternate target group (tg-2), production listener rule (first ALB rule or the NLB listener), optional test_listener_rule_arn, and a new ECS infrastructure role (AmazonECSInfrastructureRolePolicyForLoadBalancers) - target groups tg-1/tg-2 are now production/alternate and gate on any native traffic-shift strategy, not just blue_green; outputs renamed accordingly (production_/alternate_target_group_*) and ecs_infrastructure_role_arn added - provider floor bumped to aws >= 6.21 (linear/canary configuration) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Always provision the production/alternate target-group pair, the ECS infrastructure role, and the service's load_balancer advanced_configuration whenever a load balancer is attached — not just for native traffic-shift deployment_types. Rolling deployments serve from the production target group only, so any service can now switch between rolling / blue_green / linear / canary on a single UpdateService call with zero Terraform changes; deployment_type is purely the create-time seed. Also fixes the module test suite: mock the iam_policy_document / partition / region / vpc data sources and computed ARNs so all 11 runs execute (basic_service was failing on main and skipping the rest). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-06-04T16:23:02Z

+        production_listener_rule = (
+          local.enable_nlb_listener
+          ? aws_lb_listener.nlb[0].arn
+          : aws_lb_listener_rule.alb["0"].arn
+        )


Only the first ALB listener rule participates in traffic shifting

production_listener_rule is hardcoded to alb["0"].arn. The AWS ECS advanced_configuration API only accepts a single production listener rule ARN, so during blue_green/linear/canary deployments ECS rewrites only this one rule. Any additional listener rules (alb["1"], alb["2"], …) continue forwarding to tg_1 for the entire deployment — traffic on those rules never shifts to the new revision. Consider adding a validation block that requires length(var.load_balancer_attachment.listener_rules) <= 1 when using a native traffic-shift strategy, or document this constraint prominently so callers with multi-rule services aren't caught off guard.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/ecs_service.tf Line: 102-106 Comment: **Only the first ALB listener rule participates in traffic shifting** `production_listener_rule` is hardcoded to `alb["0"].arn`. The AWS ECS `advanced_configuration` API only accepts a single production listener rule ARN, so during blue_green/linear/canary deployments ECS rewrites only this one rule. Any additional listener rules (`alb["1"]`, `alb["2"]`, …) continue forwarding to tg_1 for the entire deployment — traffic on those rules never shifts to the new revision. Consider adding a validation block that requires `length(var.load_balancer_attachment.listener_rules) <= 1` when using a native traffic-shift strategy, or document this constraint prominently so callers with multi-rule services aren't caught off guard. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-04T16:23:02Z

  dynamic "load_balancer" {
-    for_each = local.enable_load_balancer && var.deployment_type == "blue_green" ? [1] : []
+    for_each = local.enable_load_balancer ? [1] : []
    content {
      target_group_arn = aws_lb_target_group.tg_1[0].arn
      container_name   = local.lb_container_name
      container_port   = local.lb_container_port
+
+      advanced_configuration {
+        alternate_target_group_arn = aws_lb_target_group.tg_2[0].arn
+        production_listener_rule = (
+          local.enable_nlb_listener
+          ? aws_lb_listener.nlb[0].arn
+          : aws_lb_listener_rule.alb["0"].arn
+        )
+        test_listener_rule = var.test_listener_rule_arn
+        role_arn           = aws_iam_role.ecs_infrastructure[0].arn
+      }
    }
  }


advanced_configuration wired unconditionally for rolling deployments — not validated against real AWS API

The whole design bet is that AWS silently accepts advanced_configuration (alternate TG, production listener rule, infrastructure role) when deployment_configuration is absent (the rolling default). All tests run with command = plan against mocked providers, so the AWS API's acceptance of this combination has not been exercised. If AWS rejects advanced_configuration on a CreateService call that carries no deployment_configuration, every rolling service with a load balancer (the module default) would fail to provision. A real tofu apply against AWS — even in a scratch account — should be added to the test suite to confirm this assumption before merging.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/ecs_service.tf Line: 93-111 Comment: **`advanced_configuration` wired unconditionally for rolling deployments — not validated against real AWS API** The whole design bet is that AWS silently accepts `advanced_configuration` (alternate TG, production listener rule, infrastructure role) when `deployment_configuration` is absent (the rolling default). All tests run with `command = plan` against mocked providers, so the AWS API's acceptance of this combination has not been exercised. If AWS rejects `advanced_configuration` on a `CreateService` call that carries no `deployment_configuration`, every rolling service with a load balancer (the module default) would fail to provision. A real `tofu apply` against AWS — even in a scratch account — should be added to the test suite to confirm this assumption before merging. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-04T16:23:04Z

 output "target_group_arn" {
-  description = "The ARN of the target group (null if load balancer disabled or blue/green deployment)."
-  value       = local.enable_load_balancer && var.deployment_type == "rolling" ? aws_lb_target_group.this[0].arn : null
+  description = "The ARN of the production target group the service serves from (alias of production_target_group_arn; null if load balancer disabled)."
+  value       = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn : null
 }

 output "target_group_arn_suffix" {
-  description = "The ARN suffix of the target group for CloudWatch metrics."
-  value       = local.enable_load_balancer && var.deployment_type == "rolling" ? aws_lb_target_group.this[0].arn_suffix : null
+  description = "The ARN suffix of the production target group for CloudWatch metrics."
+  value       = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn_suffix : null
 }


target_group_name dropped without a backward-compatible alias

target_group_arn was deliberately kept as an alias for production_target_group_arn, but target_group_name — the natural companion — was removed entirely. Callers that reference module.xxx.target_group_name (e.g., in CloudWatch dashboards or external scripts) will get a Terraform error on upgrade. This is not listed as a named breaking change in the PR description. Adding output "target_group_name" { value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].name : null } would match the alias pattern used for the ARN.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/outputs.tf Line: 96-104 Comment: **`target_group_name` dropped without a backward-compatible alias** `target_group_arn` was deliberately kept as an alias for `production_target_group_arn`, but `target_group_name` — the natural companion — was removed entirely. Callers that reference `module.xxx.target_group_name` (e.g., in CloudWatch dashboards or external scripts) will get a Terraform error on upgrade. This is not listed as a named breaking change in the PR description. Adding `output "target_group_name" { value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].name : null }` would match the alias pattern used for the ARN. How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

- Reject >1 ALB listener rule for native traffic-shift strategies: advanced_configuration only rewrites a single production listener rule, so extra rules would keep serving the old revision for the whole deployment. Documented the constraint on listener_rules since the strategy is a per-deployment decision the deploy manager can change without Terraform. - Restore the target_group_name output as an alias of production_target_group_name, matching the target_group_arn alias. - Make TestEcsServiceWithAlb assert the deployed rolling service carries advanced_configuration (alternate TG, production listener rule, infrastructure role), validating that the real AWS API accepts it on CreateService without a deployment_configuration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-10T14:20:53Z

Ravion Module Publish Plan

Dry run only. No Ravion API mutations were made.

Module	Current Version	New Version	Description
`rvn-ecs-cluster`	`0.1.3`	`0.2.0`	Automatically derive the default capacity provider so the cluster default strategy commits to a single provider family
`rvn-ecs-web`	`0.5.1`	`0.6.0`	Add native ECS blue/green, linear, and canary deployments with per-deploy strategy controls, manual approval gates, standby validation traffic, production/alternate target groups, ALB group stickiness guidance, ECS infrastructure role, and load-balancer advanced configuration outputs.

Diffs

rvn-ecs-cluster 0.1.3 -> 0.2.0

--- remote
+++ compiled
   - **Public and private Network Load Balancers** for TCP/UDP and static IP use cases
   - **CloudWatch Container Insights** dashboard metrics for production visibility
 
-  Terraform source: [flightcontrolhq/modules/compute/ecs_cluster](https://github.com/flightcontrolhq/modules/tree/rvn-ecs-cluster@0.1.3/compute/ecs_cluster)
+  Terraform source: [flightcontrolhq/modules/compute/ecs_cluster](https://github.com/flightcontrolhq/modules/tree/rvn-ecs-cluster@0.2.0/compute/ecs_cluster)
 
   ## Use cases
 
@@
         base_path: compute/ecs_cluster
         branch: main
         execution_environment_id: << module.input.execution_environment_id >>
-        ref: rvn-ecs-cluster@0.1.3
+        ref: rvn-ecs-cluster@0.2.0
         repo: https://github.com/flightcontrolhq/modules
         stack_id: <<stack.id>>
         terraform_variables:

rvn-ecs-web 0.5.1 -> 0.6.0

--- remote
+++ compiled
     queue_overflow: oldest
     queue_size: 1
   infrastructure:
+    ecs_alternate_target_group_arn: <<stack.output.alternate_target_group_arn>>
     ecs_cluster_arn: <<stack.output.service_cluster>>
+    ecs_infrastructure_role_arn: <<stack.output.ecs_infrastructure_role_arn>>
+    ecs_production_listener_rule_arn: <<stack.output.production_listener_rule_arn>>
     ecs_service_arns:
       - <<stack.output.service_arn>>
+    ecs_target_group_arn: <<stack.output.target_group_arn>>
+    ecs_test_listener_rule_arn: <<stack.output.test_listener_rule_arn>>
   inputs:
     - description: Pass the image tag or digest to deploy. For Nixpacks or Dockerfile builds, this is resolved in the Ravion-created ECR repository. For Prebuilt image from registry mode, this is resolved in the repository configured on the module. Do not pass a full image URI.
       id: image_ref
@@
       type: string
   post_deploy: '<< module.input.post_deploy_enabled && len(module.input.post_deploy_command) > 0 ? {"container_overrides": [{"name": stack.output.container_name, "command": module.input.post_deploy_command, "environment": module.input.post_deploy_environment_variables, "cpu": module.input.post_deploy_cpu || nil, "memory": module.input.post_deploy_memory || nil}], "cpu": string(module.input.post_deploy_cpu || (module.input.capacity_provider == "ec2" ? int(float(module.input.task_cpu) * 1024) : int(float(module.input.fargate_size.vcpu) * 1024))), "memory": string(module.input.post_deploy_memory || (module.input.capacity_provider == "ec2" ? int(float(module.input.task_memory) * 1024) : int(float(module.input.fargate_size.memory_gb) * 1024))), "ephemeral_storage": (module.input.post_deploy_ephemeral_storage_size_gib ? {size_in_gib: module.input.post_deploy_ephemeral_storage_size_gib} : nil), "task_role_arn": stack.output.task_role_arn, "execution_role_arn": stack.output.execution_role_arn, "capacity_provider_strategy": ((module.input.capacity_provider == "fargate" || module.input.additional_fargate_capacity_enabled ? [{capacity_provider: module.input.fargate_capacity_provider_name, weight: 1, base: 0}] : []) | concat(module.input.capacity_provider == "fargate_spot" || module.input.additional_fargate_spot_capacity_enabled ? [{capacity_provider: module.input.fargate_spot_capacity_provider_name, weight: 1, base: 0}] : []) | concat(module.input.capacity_provider == "ec2" || module.input.additional_ec2_capacity_enabled ? [{capacity_provider: module.input.ec2_capacity_provider_name, weight: 1, base: 0}] : [])), "network_configuration": {"awsvpc_configuration": {"subnets": (module.input.private_subnet_placement_enabled ? module.input.private_subnet_ids : module.input.public_subnet_ids), "security_groups": ([stack.output.security_group_id] | concat(module.input.security_group_ids != nil ? module.input.security_group_ids : [])), "assign_public_ip": (module.input.private_subnet_placement_enabled ? "DISABLED" : "ENABLED")}}, "enable_execute_command": module.input.execute_command_enabled, "timeout": module.input.post_deploy_timeout} : nil >>'
   pre_deploy: '<< module.input.pre_deploy_enabled && len(module.input.pre_deploy_command) > 0 ? {"container_overrides": [{"name": stack.output.container_name, "command": module.input.pre_deploy_command, "environment": module.input.pre_deploy_environment_variables, "cpu": module.input.pre_deploy_cpu || nil, "memory": module.input.pre_deploy_memory || nil}], "cpu": string(module.input.pre_deploy_cpu || (module.input.capacity_provider == "ec2" ? int(float(module.input.task_cpu) * 1024) : int(float(module.input.fargate_size.vcpu) * 1024))), "memory": string(module.input.pre_deploy_memory || (module.input.capacity_provider == "ec2" ? int(float(module.input.task_memory) * 1024) : int(float(module.input.fargate_size.memory_gb) * 1024))), "ephemeral_storage": (module.input.pre_deploy_ephemeral_storage_size_gib ? {size_in_gib: module.input.pre_deploy_ephemeral_storage_size_gib} : nil), "task_role_arn": stack.output.task_role_arn, "execution_role_arn": stack.output.execution_role_arn, "capacity_provider_strategy": ((module.input.capacity_provider == "fargate" || module.input.additional_fargate_capacity_enabled ? [{capacity_provider: module.input.fargate_capacity_provider_name, weight: 1, base: 0}] : []) | concat(module.input.capacity_provider == "fargate_spot" || module.input.additional_fargate_spot_capacity_enabled ? [{capacity_provider: module.input.fargate_spot_capacity_provider_name, weight: 1, base: 0}] : []) | concat(module.input.capacity_provider == "ec2" || module.input.additional_ec2_capacity_enabled ? [{capacity_provider: module.input.ec2_capacity_provider_name, weight: 1, base: 0}] : [])), "network_configuration": {"awsvpc_configuration": {"subnets": (module.input.private_subnet_placement_enabled ? module.input.private_subnet_ids : module.input.public_subnet_ids), "security_groups": ([stack.output.security_group_id] | concat(module.input.security_group_ids != nil ? module.input.security_group_ids : [])), "assign_public_ip": (module.input.private_subnet_placement_enabled ? "DISABLED" : "ENABLED")}}, "enable_execute_command": module.input.execute_command_enabled, "timeout": module.input.pre_deploy_timeout} : nil >>'
+  strategy: |
+    <<
+    module.input.deployment_strategy == "rolling" ? nil :
+    module.input.deployment_strategy == "blue_green" ? {
+      "type": "blue_green",
+      "bake_time_in_minutes": module.input.deployment_bake_time_in_minutes,
+      "pause_stages": module.input.deployment_pause_stages
+    } :
+    module.input.deployment_strategy == "linear" ? {
+      "type": "linear",
+      "bake_time_in_minutes": module.input.deployment_bake_time_in_minutes,
+      "pause_stages": module.input.deployment_pause_stages,
+      "linear": {
+        "step_percentage": module.input.linear_step_percentage,
+        "step_bake_time_in_minutes": module.input.linear_step_bake_time_in_minutes
+      }
+    } : {
+      "type": "canary",
+      "bake_time_in_minutes": module.input.deployment_bake_time_in_minutes,
+      "pause_stages": module.input.deployment_pause_stages,
+      "canary": {
+        "canary_percent": module.input.canary_percent,
+        "canary_bake_time_in_minutes": module.input.canary_bake_time_in_minutes
+      }
+    }
+    >>
   task_definition:
     container_definitions: "<< (module.input.firelens_enabled ? [{\"command\": (len(module.input.prebuilt_image_start_command) > 0 ? module.input.prebuilt_image_start_command : nil), \"cpu\": (module.input.capacity_provider == \"ec2\" ? int(float(module.input.task_cpu) * 1024) : int(float(module.input.fargate_size.vcpu) * 1024)), \"depends_on\": [{\"container_name\": \"log_router\", \"condition\": \"START\"}], \"environment\": ([{name: \"PORT\", value: string(module.input.container_port)}] | concat(module.input.environment_variables != nil ? module.input.environment_variables : [])), \"essential\": true, \"image\": (module.input.build_type == \"prebuilt_image\" ? (deploy.input.image_ref contains \"sha256:\" ? module.input.image_repository + \"@\" + deploy.input.image_ref : module.input.image_repository + \":\" + deploy.input.image_ref) : (deploy.input.image_ref contains \"sha256:\" ? stack.output.ecr_repository_url + \"@\" + deploy.input.image_ref : stack.output.ecr_repository_url + \":\" + deploy.input.image_ref)), \"linux_parameters\": {\"init_process_enabled\": true}, \"log_configuration\": {\"log_driver\": \"awsfirelens\"}, \"memory\": (module.input.capacity_provider == \"ec2\" ? int(float(module.input.task_memory) * 1024) : int(float(module.input.fargate_size.memory_gb) * 1024)), \"name\": (stack.output.container_name), \"port_mappings\": [{\"app_protocol\": \"http\", \"container_port\": (module.input.container_port), \"protocol\": \"tcp\"}], \"readonly_root_filesystem\": false, \"repository_credentials\": (module.input.image_registry_credentials_secret_arn ? {credentials_parameter: module.input.image_registry_credentials_secret_arn} : nil), \"secrets\": (module.input.secrets), \"stop_timeout\": 30}, {\"name\": \"log_router\", \"image\": module.input.firelens_image, \"cpu\": 0, \"memory_reservation\": 51, \"essential\": true, \"environment\": ([{name: \"FIRELENS_CONFIG_CONTENT\", value: \"[SERVICE]\\n    Flush 1\\n    Grace 30\\n\\n\" + (module.input.firelens_config ? module.input.firelens_config + \"\\n\" : \"\") + (module.input.firelens_cloudwatch_output_enabled ? \"\\n[OUTPUT]\\n    Name cloudwatch\\n    Match *\\n    region \" + stack.output.region + \"\\n    log_group_name \" + stack.output.log_group_name + \"\\n    auto_create_group true\\n    log_stream_prefix \" + stack.output.log_stream_prefix + \"/\\n    retry_limit 2\\n    log_key log\\n    log_format json/emf\\n\" : \"\")}] | concat(module.input.firelens_environment_variables != nil ? module.input.firelens_environment_variables : [])), \"secrets\": module.input.firelens_secrets != nil ? module.input.firelens_secrets : [], \"command\": [\"/bin/sh\", \"-c\", \"printf '%s' \\\"$FIRELENS_CONFIG_CONTENT\\\" > /flightcontrol-firelens.conf && exec /entrypoint.sh\"], \"user\": \"0\", \"log_configuration\": {\"log_driver\": \"awslogs\", \"options\": {\"awslogs-group\": stack.output.log_group_name, \"awslogs-region\": stack.output.region, \"awslogs-stream-prefix\": stack.output.log_stream_prefix + \"/firelens\"}}, \"firelens_configuration\": {\"type\": \"fluentbit\", \"options\": {\"config-file-type\": \"file\", \"config-file-value\": \"/flightcontrol-firelens.conf\", \"enable-ecs-log-metadata\": string(module.input.firelens_ecs_metadata_enabled)}}}] : [{\"command\": (len(module.input.prebuilt_image_start_command) > 0 ? module.input.prebuilt_image_start_command : nil), \"cpu\": (module.input.capacity_provider == \"ec2\" ? int(float(module.input.task_cpu) * 1024) : int(float(module.input.fargate_size.vcpu) * 1024)), \"environment\": ([{name: \"PORT\", value: string(module.input.container_port)}] | concat(module.input.environment_variables != nil ? module.input.environment_variables : [])), \"essential\": true, \"image\": (module.input.build_type == \"prebuilt_image\" ? (deploy.input.image_ref contains \"sha256:\" ? module.input.image_repository + \"@\" + deploy.input.image_ref : module.input.image_repository + \":\" + deploy.input.image_ref) : (deploy.input.image_ref contains \"sha256:\" ? stack.output.ecr_repository_url + \"@\" + deploy.input.image_ref : stack.output.ecr_repository_url + \":\" + deploy.input.image_ref)), \"linux_parameters\": {\"init_process_enabled\": true}, \"log_configuration\": {\"log_driver\": \"awslogs\", \"options\": {\"awslogs-group\": stack.output.log_group_name, \"awslogs-region\": stack.output.region, \"awslogs-stream-prefix\": stack.output.log_stream_prefix}}, \"memory\": (module.input.capacity_provider == \"ec2\" ? int(float(module.input.task_memory) * 1024) : int(float(module.input.fargate_size.memory_gb) * 1024)), \"name\": (stack.output.container_name), \"port_mappings\": [{\"app_protocol\": \"http\", \"container_port\": (module.input.container_port), \"protocol\": \"tcp\"}], \"readonly_root_filesystem\": false, \"repository_credentials\": (module.input.image_registry_credentials_secret_arn ? {credentials_parameter: module.input.image_registry_credentials_secret_arn} : nil), \"secrets\": (module.input.secrets), \"stop_timeout\": 30}]) | concat(module.input.sidecars ? module.input.sidecars : []) >>"
     cpu: '<< string(module.input.capacity_provider == "ec2" ? int(float(module.input.task_cpu) * 1024) : int(float(module.input.fargate_size.vcpu) * 1024)) >>'
@@
     show_when:
       build_type: nixpacks
     type: string_array
+  - id: section_deployment
+    label: Deployment
+    type: section
+  - default: rolling
+    description: Choose how traffic moves from the current deployment to the new deployment.
+    id: deployment_strategy
+    label: Deployment strategy
+    required: true
+    type: string
+    values:
+      - description: Replace tasks in place while the service 
... diff truncated ...

AWS rejects explicit from_port/to_port when ip_protocol is "-1", which broke updates to existing rules (the sg-module refactor changed the ECS instance ingress rule from -1/-1 to 0/0). Null the ports in the shared security-groups module whenever the protocol is "-1" or "all", and use -1 in the ecs_cluster caller for clarity. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

# Conflicts: # compute/ecs_cluster/README.md # compute/ecs_cluster/locals.tf # compute/ecs_cluster/rvn-ecs-cluster-definition.yml # compute/ecs_service/README.md # compute/ecs_service/rvn-ecs-web-definition.yml

…ner rule Make the green (test) ALB listener rule's distinguishing condition configurable between an HTTP header and a query string via test_traffic_condition_type. Default to the query string __x-rvn-test__=1 so test traffic reaches the green target group without a custom header. ALB AND-combines conditions and ECS native blue/green wires exactly one test rule, so the selector is one type per service, not simultaneous header-OR-query matching.

mabadir · 2026-06-16T19:22:09Z

@greptile

greptile-apps · 2026-06-16T19:26:21Z

+    var.load_balancer_attachment.listener_rules[0].priority,
+    local.green_default_production_priority,
+  ) : null
+  green_test_priority = local.green_alb_listener_rule_enabled ? local.green_production_priority - 1 : null


Test-rule priority becomes 0 when production priority is 1

green_test_priority = local.green_production_priority - 1 produces 0 when a caller explicitly sets priority = 1. AWS ALB rejects listener rule priorities outside the range 1–50000, so the apply would fail with an API error (ValidationError: The number must be between 1 and 50000). There is no variable validation that prevents listener_rules[0].priority = 1, making this a real failure path for services that want the highest-priority rule. A guard is needed — either validate that listener_rules[0].priority >= 2 when green_alb_listener_rule_enabled is true, or clamp green_test_priority to a minimum of 1.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/locals.tf Line: 72 Comment: **Test-rule priority becomes 0 when production priority is 1** `green_test_priority = local.green_production_priority - 1` produces `0` when a caller explicitly sets `priority = 1`. AWS ALB rejects listener rule priorities outside the range 1–50000, so the apply would fail with an API error (`ValidationError: The number must be between 1 and 50000`). There is no variable validation that prevents `listener_rules[0].priority = 1`, making this a real failure path for services that want the highest-priority rule. A guard is needed — either validate that `listener_rules[0].priority >= 2` when `green_alb_listener_rule_enabled` is true, or clamp `green_test_priority` to a minimum of 1. How can I resolve this? If you propose a fix, please make it concise.

# Conflicts: # compute/ecs_service/rvn-ecs-web-definition.yml

mabadir · 2026-06-17T23:03:52Z

@greptile

greptile-apps · 2026-06-17T23:16:50Z

+  from = aws_lb_target_group.this[0]
+  to   = aws_lb_target_group.tg_1[0]
+}


Rolling-service upgrade will fail: moved + name change + ignore_changes = [action] deadlock

For any existing rolling service with a load balancer, upgrading to this module version triggers a tofu apply failure:

The moved block transfers the state address of aws_lb_target_group.this[0] (name suffix -tg, max-prefix 28 chars) to aws_lb_target_group.tg_1[0] (suffix -tg-1, max-prefix 24 chars). Every service gets a different name, so Terraform plans a replacement.

With create_before_destroy = true, the new TG is created first (new ARN). However, ignore_changes = [action] on aws_lb_listener_rule.alb suppresses the ARN reference update — the listener rule keeps pointing to the old TG ARN.

Terraform then attempts to destroy the old TG. AWS rejects this: "Target group is currently in use by a listener rule", and the apply fails.

The apply exits mid-way: the new TG exists but is unreachable, and the old TG is stuck — referenced by the listener rule but orphaned from Terraform state. Manual cleanup is required. This affects every load-balanced rolling service; the PR description mentions "target-group recreation on upgrade" but not that the apply itself fails.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/moved.tf Line: 6-8 Comment: **Rolling-service upgrade will fail: `moved` + name change + `ignore_changes = [action]` deadlock** For any existing rolling service with a load balancer, upgrading to this module version triggers a `tofu apply` failure: 1. The `moved` block transfers the state address of `aws_lb_target_group.this[0]` (name suffix `-tg`, max-prefix 28 chars) to `aws_lb_target_group.tg_1[0]` (suffix `-tg-1`, max-prefix 24 chars). Every service gets a different `name`, so Terraform plans a replacement. 2. With `create_before_destroy = true`, the new TG is created first (new ARN). However, `ignore_changes = [action]` on `aws_lb_listener_rule.alb` suppresses the ARN reference update — the listener rule keeps pointing to the old TG ARN. 3. Terraform then attempts to destroy the old TG. AWS rejects this: "Target group is currently in use by a listener rule", and the `apply` fails. The apply exits mid-way: the new TG exists but is unreachable, and the old TG is stuck — referenced by the listener rule but orphaned from Terraform state. Manual cleanup is required. This affects every load-balanced rolling service; the PR description mentions "target-group recreation on upgrade" but not that the apply itself fails. How can I resolve this? If you propose a fix, please make it concise.

Re-adopting a pre-existing target group via the moved block changed the name from -tg to -tg-1, forcing replacement. With ignore_changes=[action] on the listener rule, the rule never repointed to the new TG and the old TG's destroy failed (in use by listener rule), breaking apply mid-way. Ignore changes to name so migrated TGs keep their existing name/ARN in place; fresh services still get the -tg-1 name.

# Conflicts: # compute/ecs_cluster/ec2.tf # compute/ecs_cluster/rvn-ecs-cluster-definition.yml

mabadir · 2026-06-18T20:07:17Z

@greptile

flybayer · 2026-06-18T20:42:53Z

      base_path: compute/ecs_cluster
      terraform_variables:
        ...overrides: << module.input.advanced_terraform_variables >>
+        capacity_provider_default: << module.input.capacity_provider_default >>


i don't see input added for this, but instead of that just derive from the enabled provider flags? defaulting to something automatically?

flybayer · 2026-06-18T20:42:54Z

+      values:
+        - label: Rolling
+          value: rolling
+        - label: Blue/green
+          value: blue_green
+        - label: Linear
+          value: linear
+        - label: Canary
+          value: canary
+    - id: deployment_bake_time_in_minutes


let's add descriptions

flybayer · 2026-06-18T20:42:54Z

+    - id: test_header_name
+      label: Test traffic header name
+      type: string
+      description: HTTP header that routes a request to the green (test) target group during a deployment. Send this header with the value below to reach the new revision before production traffic shifts. Requests without it keep hitting production.
+      collapsible: true
+      required: true
+      default: X-Ravion-Test
+    - id: test_header_value
+      label: Test traffic header value
+      type: string
+      description: Value the test traffic header must carry to be routed to the green (test) target group.
+      collapsible: true
+      required: true
+      default: "1"


and can we add query string also?

actually query string is added as default, I am going to remove this input altogether.

flybayer · 2026-06-19T13:36:42Z

update readme with instructions on how to access the standby environment

this should be in the web def

yeah I got this

flybayer · 2026-06-19T15:03:19Z

+    ## Standby validation traffic
+
+    For Blue/green, Linear, and Canary deployments, Ravion creates a test listener rule on the same Application Load Balancer listener as the production rule. During the test traffic stage, requests that match the service's normal Domain host rules and Path rules and include the standby selector route to the standby, or green, task set on the alternate target group.
+
+    By default, the standby selector is the query parameter `__x-rvn-test__=1`. For example, if production traffic uses `https://app.example.com/health`, validate the standby service with:
+
+    ```text
+    https://app.example.com/health?__x-rvn-test__=1
+    ```
+
+    The alternate target group only has registered targets while ECS is running a traffic-shift deployment. Outside that window, the standby route may have no healthy targets.
+
+    Use Advanced Terraform variables to override the standby selector. Values in Advanced Terraform variables override the generated Terraform variables for the service.


need to add that it is sticky with cookie, so once first request goes to standy you are stuck on that uless you clear a cookie

lets document the cookie

and how can you tell if a response is from prod or standby?

mabadir and others added 2 commits June 4, 2026 00:25

greptile-apps Bot reviewed Jun 4, 2026

View reviewed changes

mabadir added the run-terratest Run Terratest integration tests (real AWS apply) on this PR label Jun 5, 2026

mabadir added 3 commits June 8, 2026 23:22

Fix ECS infrastructure policy ARN

56b6f4a

Ignore target group name changes for tg_1

a849f7f

Merge branch 'main' into native-ecs-deployment-strategies

73e7e4a

mabadir and others added 8 commits June 10, 2026 10:39

default capacity provider

61909c3

update template

250bf02

update placeholder

4f8fe87

remove the default provider

69b2f01

fixes

a733398

test alb listener rule

02541ed

Merge branch 'main' into native-ecs-deployment-strategies

bd9ac38

# Conflicts: # compute/ecs_cluster/README.md # compute/ecs_cluster/locals.tf # compute/ecs_cluster/rvn-ecs-cluster-definition.yml # compute/ecs_service/README.md # compute/ecs_service/rvn-ecs-web-definition.yml

mabadir force-pushed the native-ecs-deployment-strategies branch from 777e8bd to bd9ac38 Compare June 11, 2026 20:29

mabadir added 7 commits June 11, 2026 19:37

add green_alb_listener_rule_enabled as tf var

8791e90

add green_alb_listener rule

4a8fd0e

enable group stickiness

703a380

test rule by default

7a6eff5

Merge branch 'main' into native-ecs-deployment-strategies

b8cb146

update module

4e2fc4a

greptile-apps Bot reviewed Jun 16, 2026

View reviewed changes

mabadir added 2 commits June 17, 2026 10:38

add deployment options

7178ade

Merge branch 'main' into native-ecs-deployment-strategies

ecc72ee

# Conflicts: # compute/ecs_service/rvn-ecs-web-definition.yml

mabadir requested a review from flybayer June 17, 2026 23:03

greptile-apps Bot reviewed Jun 17, 2026

View reviewed changes

mabadir removed the run-terratest Run Terratest integration tests (real AWS apply) on this PR label Jun 18, 2026

Merge branch 'main' into native-ecs-deployment-strategies

3bcf91b

# Conflicts: # compute/ecs_cluster/ec2.tf # compute/ecs_cluster/rvn-ecs-cluster-definition.yml

flybayer reviewed Jun 18, 2026

View reviewed changes

mabadir added 3 commits June 18, 2026 19:46

Fix ECS cluster capacity provider definition

a649bf1

Default ECS web test traffic to query string

e8965fa

Add ECS deployment input descriptions

9934440

mabadir requested a review from flybayer June 18, 2026 23:58

flybayer reviewed Jun 19, 2026

View reviewed changes

Clarify ECS traffic-shift deployment docs

86eb530

flybayer reviewed Jun 19, 2026

View reviewed changes

mabadir added 2 commits June 19, 2026 11:43

Document ALB group stickiness for ECS web service

24a5e1d

update to use item_description

3b7f51d

mabadir requested a review from flybayer June 19, 2026 15:56

flybayer approved these changes Jun 19, 2026

View reviewed changes

mabadir merged commit 90b7da8 into main Jun 19, 2026
6 checks passed

mabadir deleted the native-ecs-deployment-strategies branch June 20, 2026 14:36

Uh oh!

Conversation

mabadir commented Jun 4, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Native strategies (b4ee71a)

Strategy is per-deployment (3c72a84)

Outputs

Tests

Breaking changes

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Ravion Module Publish Plan

Diffs

rvn-ecs-cluster 0.1.3 -> 0.2.0

rvn-ecs-web 0.5.1 -> 0.6.0

Uh oh!

mabadir commented Jun 16, 2026

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

mabadir commented Jun 17, 2026

Uh oh!

greptile-apps Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

mabadir commented Jun 18, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mabadir commented Jun 4, 2026 •

edited by greptile-apps Bot

Loading

Native strategies (`b4ee71a`)

Strategy is per-deployment (`3c72a84`)

github-actions Bot commented Jun 10, 2026 •

edited

Loading