fix: align handlers with MAP 2.0 eligible services list #5

Open
hyunsies wants to merge 12 commits into main from fix/map-eligible-services-cleanup

@hyunsies
Contributor

Summary

Cross-referenced all service handlers against the official MAP 2.0 Included Services List (6 April 2026).

Removals — not MAP-eligible:

  • AppConfig (appconfig.amazonaws.com) — not in MAP 2.0 list; removed 3 handlers (CreateApplication, CreateEnvironment, CreateConfigurationProfile) + IAM permission
  • OpenSearch Serverless (aoss.amazonaws.com) — MAP list explicitly states "Excludes OpenSearch Serverless and OpenSearch Ingestion"; removed CreateCollection handler + IAM permission

Additions — MAP-eligible, missing handlers:

  • EKS (eks.amazonaws.com) — CreateCluster → cluster.arn
  • OpenSearch managed (es.amazonaws.com) — CreateDomain → domainStatus.ARN (note: Serverless uses aoss.amazonaws.com, so this only matches the managed service)
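
As a rough illustration of the two new mappings, the sketch below resolves a dotted path such as `cluster.arn` against the `responseElements` of a CloudTrail event. The `ARN_PATHS` table and the `extract_arn` helper are hypothetical names, and the event shape (an EventBridge `detail` dict carrying `eventSource`, `eventName`, and `responseElements`) is an assumption, not the handler code in this PR.

```python
from typing import Any, Optional

# Hypothetical lookup table: (eventSource, eventName) -> dotted path
# into responseElements. Only the two mappings named in this PR are listed.
ARN_PATHS = {
    ("eks.amazonaws.com", "CreateCluster"): "cluster.arn",
    ("es.amazonaws.com", "CreateDomain"): "domainStatus.ARN",
}


def extract_arn(detail: dict) -> Optional[str]:
    """Return the ARN for a supported create event, or None."""
    key = (detail.get("eventSource"), detail.get("eventName"))
    path = ARN_PATHS.get(key)
    if path is None:
        return None  # unsupported service or API call
    # responseElements can be null in CloudTrail (for example on failed calls).
    node: Any = detail.get("responseElements") or {}
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, str) else None
```

Keying the table on the event source as well as the event name means an `aoss.amazonaws.com` event never matches the managed OpenSearch entry.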

Bug fixes — event_source guards:

  • SQS CreateQueue: Deadline Cloud also fires CreateQueue via deadline.amazonaws.com; added event_source == 'sqs.amazonaws.com' guard to prevent wrong ARN extraction
  • AppSync CreateGraphqlApi: added event_source guard
  • Cognito CreateUserPool: added event_source guard
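
A minimal sketch of what such a guard might look like for the SQS case. The function name `sqs_queue_arn` is hypothetical, and two details are assumptions rather than facts from this PR: that the queue URL sits at `responseElements.queueUrl`, and that it has the form `https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>`.

```python
from typing import Optional
from urllib.parse import urlparse


def sqs_queue_arn(detail: dict) -> Optional[str]:
    """Build an SQS queue ARN from a CreateQueue event, ignoring other services."""
    if detail.get("eventName") != "CreateQueue":
        return None
    # Guard: Deadline Cloud (deadline.amazonaws.com) also emits CreateQueue,
    # and its response does not describe an SQS queue.
    if detail.get("eventSource") != "sqs.amazonaws.com":
        return None
    # Assumption: queue URL location and format, as described in the lead-in.
    url = (detail.get("responseElements") or {}).get("queueUrl")
    region = detail.get("awsRegion")
    if not url or not region:
        return None
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 2:
        return None
    account_id, queue_name = parts
    # Assumption: the standard "aws" partition.
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name}"
```

Without the `eventSource` check, a Deadline Cloud event would fall through to the URL parsing and either fail or yield a wrong ARN.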

Services confirmed as MAP-eligible (kept)

DataBrew (under Glue), DAX (standalone since Mar 2026), Deadline Cloud, EMR Serverless, IoT Core, IoT SiteWise, Kinesis Video Streams, Bedrock — all verified against MAP 2.0 wiki.

Test plan

  • CI Layer 1 (cfn-lint, Python syntax, handler regression) passes
  • Layer 2 E2E: no regressions on existing services
  • Verify EKS CreateCluster tags correctly in E2E

🤖 Generated with Claude Code

hyunsies and others added 12 commits April 16, 2026 23:29
Removals (not MAP-eligible):
- AppConfig (appconfig.amazonaws.com) — 3 handlers + IAM permission removed
- OpenSearch Serverless (aoss.amazonaws.com) — MAP list explicitly excludes Serverless

Additions (MAP-eligible, missing handlers):
- EKS (eks.amazonaws.com) — CreateCluster → cluster.arn
- OpenSearch managed (es.amazonaws.com) — CreateDomain → domainStatus.ARN

Bug fixes (event_source guards):
- SQS CreateQueue: added event_source guard (Deadline Cloud also fires CreateQueue)
- AppSync CreateGraphqlApi: added event_source guard
- Cognito CreateUserPool: added event_source guard

All other services confirmed MAP-eligible via internal MAP 2.0 wiki.
Cross-referenced against official AWS Migrations Included Services List (6 April 2026).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Keeps configurator.html in sync with map2-auto-tagger-optimized.yaml
after removing AppConfig and OpenSearch Serverless IAM permissions.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Adds minimal EKS cluster creation to core.py to cover the new
eks.amazonaws.com CreateCluster handler. No node groups — cluster
creation alone fires the CloudTrail event needed for tagging verification.

Also adds EKS teardown handler in teardown.py.

Note: EKS cluster creation takes ~12 min. The create step does not wait
for the ACTIVE state; tagging is checked at verify time, once the cluster
is ACTIVE.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Root cause: CloudFormation creates IAM roles named map-auto-tagger-role-<mpe>-<region>
inside StackSet stack instances. If stack deletion fails or times out, the role is
left orphaned. These roles are not tagged, so teardown.py's tag-based sweep misses them.
The orphaned role then blocks subsequent E2E runs that try to deploy a new CFN stack
with the same role name (IAM role names must be unique per account).

Fix: adds sweep_iam_roles.py which deletes any role matching the map-auto-tagger-role-*
prefix in the current account. Called in cleanup.yml after teardown for each linked
account in the nightly cleanup run.
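
A prefix sweep of this kind could be sketched as follows. This is not the contents of sweep_iam_roles.py; `sweep_roles` is a hypothetical name, and the `iam` argument is assumed to behave like `boto3.client("iam")`. For brevity the sketch reads only the first page of attached and inline policies and does not handle instance profiles.

```python
ROLE_PREFIX = "map-auto-tagger-role-"


def sweep_roles(iam, prefix: str = ROLE_PREFIX) -> list:
    """Delete every IAM role whose name starts with `prefix`; return the deleted names."""
    # list_roles filters by path only, so the name prefix is matched client-side.
    # Names are collected first so deletions do not interfere with pagination.
    names = [
        role["RoleName"]
        for page in iam.get_paginator("list_roles").paginate()
        for role in page["Roles"]
        if role["RoleName"].startswith(prefix)
    ]
    for name in names:
        # IAM refuses to delete a role that still has policies attached.
        attached = iam.list_attached_role_policies(RoleName=name)
        for policy in attached["AttachedPolicies"]:
            iam.detach_role_policy(RoleName=name, PolicyArn=policy["PolicyArn"])
        inline = iam.list_role_policies(RoleName=name)
        for policy_name in inline["PolicyNames"]:
            iam.delete_role_policy(RoleName=name, PolicyName=policy_name)
        iam.delete_role(RoleName=name)
    return names
```

Passing the client in, rather than creating it inside the function, lets the same code run once per linked account with whatever credentials the caller has assumed.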

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The pre-flight check used --no-include-shadow-trails which caused false
failures for accounts covered by an AWS Organizations trail or a
multi-region trail created in another region. Shadow trails are how
these valid setups appear in member/regional accounts.

Removing the flag makes describe-trails return all trails visible in the
region, so org-level and multi-region trail setups pass correctly.
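
The pre-flight check itself is a CLI call, but the same idea can be sketched with boto3, where `includeShadowTrails=True` corresponds to dropping `--no-include-shadow-trails`. The helper name `visible_trail_names` is hypothetical, and `cloudtrail` is assumed to behave like `boto3.client("cloudtrail")`.

```python
def visible_trail_names(cloudtrail) -> list:
    """Return the names of all trails visible in the client's region.

    Shadow trails (replicas of organization trails or of multi-region
    trails homed in another region) are included on purpose.
    """
    response = cloudtrail.describe_trails(includeShadowTrails=True)
    return [trail["Name"] for trail in response.get("trailList", [])]


def preflight_passes(cloudtrail) -> bool:
    """Pass when at least one trail, shadow or not, covers this region."""
    return len(visible_trail_names(cloudtrail)) > 0
```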

Reported by SmileShark partner testing multi-region deployments.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Prior PR runs can leave orphaned StackSet instances in linked accounts.
These cause CAPABILITY_NAMED_IAM failures when a new PR tries to deploy
because the IAM role and Lambda log group names are account-scoped and
collide with the existing stale stacks.

Add a "Clean up stale StackSets from prior PR runs" step to the
deploy-stackset job that iterates over all active map-auto-tagger-e2e-pr*
StackSets (excluding the current PR's) and deletes them before deploying.
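
The selection logic for that step might look like the sketch below. `stale_stackset_names` is a hypothetical helper, and the input is assumed to be the `Summaries` entries returned by CloudFormation's `list_stack_sets`, each with `StackSetName` and `Status`. The exact naming scheme after the `map-auto-tagger-e2e-pr` prefix is not shown here, so the current PR's StackSet name is passed in explicitly.

```python
STACKSET_PREFIX = "map-auto-tagger-e2e-pr"


def stale_stackset_names(summaries, current_name: str, prefix: str = STACKSET_PREFIX) -> list:
    """Pick ACTIVE StackSets left behind by other PR runs."""
    return [
        s["StackSetName"]
        for s in summaries
        if s.get("Status") == "ACTIVE"
        and s["StackSetName"].startswith(prefix)
        and s["StackSetName"] != current_name  # never delete the current PR's StackSet
    ]
```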

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
SERVICE_MANAGED StackSets require OrganizationalUnitIds (not Accounts)
in DeploymentTargets when deleting instances. The prior code passed
Accounts directly, causing a ValidationError and leaving orphaned stacks.

Changes:
- delete_stackset.py: detect permission model; use OrganizationalUnitIds
  (with INTERSECTION filter when OU ID provided, else empty list) for
  SERVICE_MANAGED StackSets; add --org-unit-ids argument
- e2e.yml: pass --org-unit-ids to both the pre-deploy cleanup step and
  the teardown Delete multi-account StackSet step
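
One way to read the delete_stackset.py change is sketched below. `build_deployment_targets` is a hypothetical helper, not the function in the script. It returns a `DeploymentTargets` dict for `delete_stack_instances`, and the permission model is assumed to come from `describe_stack_set` (`StackSet.PermissionModel`). How the real script combines account IDs with the OU filter is an assumption.

```python
from typing import Optional, Sequence


def build_deployment_targets(
    permission_model: str,
    accounts: Sequence[str],
    org_unit_ids: Optional[Sequence[str]] = None,
) -> dict:
    """Build DeploymentTargets for CloudFormation delete_stack_instances."""
    if permission_model != "SERVICE_MANAGED":
        # SELF_MANAGED StackSets are targeted by account ID directly.
        return {"Accounts": list(accounts)}
    if not org_unit_ids:
        # No OU supplied: fall back to an empty OU list, as the commit describes.
        return {"OrganizationalUnitIds": []}
    targets = {"OrganizationalUnitIds": list(org_unit_ids)}
    if accounts:
        # Assumption: restrict the OU to the listed accounts only.
        targets["Accounts"] = list(accounts)
        targets["AccountFilterType"] = "INTERSECTION"
    return targets
```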

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Problem: EventBridge → Lambda direct invocation has a 24h retry limit.
If Lambda is throttled (account near concurrency quota), events older
than 24h are silently dropped. ReservedConcurrentExecutions: 10 also
caused deployment failures on accounts near their Lambda quota.

Solution:
- EventBridge now routes to an SQS queue (14-day retention) instead of
  Lambda directly. Lambda polls the queue and retries failed messages up
  to 3x before they move to the DLQ (also 14-day retention).
- Removed ReservedConcurrentExecutions — SQS buffering handles burst
  naturally without needing reserved slots.
- Added DLQ CloudWatch alarm → SNS email when events can't be processed.
- Lambda now publishes specific failed resource ARN to SNS on tagging
  failure, so the alert email identifies exactly which resource failed.
- Lambda handler unwraps SQS envelope transparently.
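
Unwrapping the SQS envelope could be done along these lines. `unwrap_events` is a hypothetical name; the sketch assumes each SQS record's `body` is the JSON-encoded EventBridge event and that a payload without `Records` is a direct EventBridge invocation.

```python
import json


def unwrap_events(event: dict) -> list:
    """Return the EventBridge events carried by a Lambda invocation payload."""
    records = event.get("Records")
    if not isinstance(records, list):
        # Direct EventBridge -> Lambda invocation: the payload is the event itself.
        return [event]
    # SQS batch: each record body holds one JSON-encoded EventBridge event.
    return [
        json.loads(record["body"])
        for record in records
        if record.get("eventSource") == "aws:sqs"
    ]
```

Returning a list in both cases lets the rest of the handler loop over events without caring which path delivered them.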

Reported by SmileShark partner: customers near Lambda concurrency quota
were seeing deployment failures and missed tagging events.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…g/ServiceSpecificTagging

New SqsEventSource and AlertPublish Sids added for SQS polling and SNS
alerting were not being picked up by the regex. Broadened to match all Sids.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Lambda's DeadLetterConfig requires sqs:SendMessage to forward failed
invocations to the DLQ. Missing permission caused CFN deploy failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>