feat(observability): solution attribution via native AWS_SDK_UA_APP_ID (#319, alt to #338)#345
feat(observability): solution attribution via native AWS_SDK_UA_APP_ID (#319, alt to #338)#345scottschreckengaust wants to merge 8 commits into
Conversation
…319) Alternative to PR #338: emit the app/ segment via the SDK-native AWS_SDK_UA_APP_ID env var (botocore + JS v3 read it automatically) using '#' instead of '/' as the separator, so no raw-user-agent-path machinery is needed. Each ua module owns only the static md/#{component} segment; there is no per-request {TRACE} handle, no before-send/middleware, and no module trace state — request correlation stays with X-Ray / structured logs (#245), and connection pools are never re-pinned. CloudFormation stack names are [A-Za-z0-9-], a subset of both the UA-token and app-id charsets, so {STACKNAME} needs no sanitization; the only sanitize() left is defensive cover on the component label. This commit adds the three mirrored helpers + tests (agent/src/ua.py, cdk/src/handlers/shared/ua.ts, cli/src/ua.ts). Each has a wire-capture test asserting both app/ and md/ segments land on a real outbound User-Agent header (and that app/ is omitted when AWS_SDK_UA_APP_ID is unset/empty — the customer opt-out). 12 agent + 12 cdk + 10 cli tests, 100% coverage on the new modules. Wiring the client sites + CDK env threading follow in subsequent commits. Part of #319 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…sites (#319) Session-level user_agent_extra on both the scoped (refreshable) and the plain ambient session covers every tenant_client/tenant_resource caller. New platform_client() carries the static md/ segment (merged into any caller Config) for the 8 direct boto3.client sites that bypass the session by design — logs x5 (shell, server x2, telemetry x2), secrets manager x2 (config), bedrock-agentcore x1 (memory) — plus the ambient STS client used for role chaining. No per-request trace handle and no before-send appender: the md/ segment is fully static, so cached clients and the singleton session pool are never re-pinned. The app/ segment is contributed separately by the SDK from AWS_SDK_UA_APP_ID (threaded by CDK, next commit). 4 new aws_session tests assert the md/ segment rides platform_client, the unscoped tenant_client, a merged caller Config, and the scoped session. Full agent suite green (1070 tests). Part of #319 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Spread ...abcaUserAgent() into all 60 SDK v3 client constructors across 43 handler files (DynamoDB/Secrets Manager/Lambda/Bedrock/ECS/ BedrockAgentCore). DocumentClient sites instrument the inner DynamoDBClient (shared middleware stack). No withAbcaTrace/middleware — the md/ segment is fully static, so module-level cached clients keep their connection pools; the app/ segment rides native AWS_SDK_UA_APP_ID (threaded next commit). No behavior change beyond the UA header: all 2051 existing CDK tests pass unmodified (the new spread arg merges into the constructor config the tests already assert on / mock). Part of #319 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
) applyDefaultAppId() at startup defaults AWS_SDK_UA_APP_ID to the solution id (only when unset — an explicit '' opts out) so the CLI's own SDK calls carry the app/ segment with no per-site code. Spread ...abcaUserAgent() into all 18 AWS SDK v3 client sites (Cognito x3, Secrets Manager, CloudFormation, DynamoDB) across auth/admin/github/ slack/linear; the bgagent REST ApiClient is not an AWS SDK client and is untouched. auth.test.ts asserts the Cognito client constructor receives the md/ customUserAgent pair. Full CLI suite green (365 + new tests). Part of #319 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…319) The `app/` segment is now SDK-native: a stack-level SolutionUaAspect sets AWS_SDK_UA_APP_ID=uksb-wt64nei4u6#{stackName} on every Lambda (current and future, structurally), and the AgentCore runtime + ECS container set the same value explicitly (the Lambda-only aspect can't reach them). botocore and JS v3 both read AWS_SDK_UA_APP_ID natively, so no client code builds the app/ segment. `-c sdkUaAppId=''` opts the whole stack out; any other `-c sdkUaAppId=` value overrides. The `md/#{component}` label is per-surface ABCA_COMPONENT: 'api' (task-api commonEnv), 'orchestr' (orchestrator/reconcilers/cleanup/fanout), 'webhook' (slack/linear/github-screenshot integrations, via a per-construct ComponentUaAspect so every function in the integration — including future ones — is labeled without editing each env block). buildAppId() centralizes the value: defaults to uksb-wt64nei4u6#{stack}, sanitizes a non-CFN override, clips to the documented 50-char cap, and returns undefined for the empty-string opt-out. CloudFormation stack names are [A-Za-z0-9-] (already app-id-safe), so no stack-name sanitization is needed in the default path. New tests: solution-ua-aspect.test.ts (buildAppId vectors + both aspects); task-api/orchestrator template assertions for the component label. Full CDK suite green (2061 tests). Local synth fails only on the pre-existing ec2:DescribeAvailabilityZones cred gap (CI runs the real synth). Part of #319 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New "Common mistakes" bullet directing agent/handler/CLI code to the per-surface ua helpers and explaining the app/ (SDK-native via AWS_SDK_UA_APP_ID) vs md/ (explicit per-surface label) split, plus the customer opt-out. Root-level file — no Starlight sync needed. Part of #319 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
✅ Tested & verified — outbound User-Agent on the wire (both runtime tiers)Validated that this PR's solution-attribution segments land on the actual outbound What was checkedTwo standalone wire-capture scripts import the real helpers and print the assembled
Result — every call, both tiers (
|
❓ Follow-up: is there an intentional asymmetry in the opt-out? (
|
💡 Follow-up (design): make
|
| Surface | Where the id comes from | Honors an override? |
|---|---|---|
app/{id}#{stackName} (wire) |
buildAppId() → AWS_SDK_UA_APP_ID env set by SolutionUaAspect |
✅ -c sdkUaAppId=... (and '' opts out) |
md/{id}#{component} (wire) |
hardcoded SOLUTION_ID in each helper |
❌ literal |
({id}) (stack description) |
hardcoded literal in main.ts |
❌ literal |
Proposed ideal behavior
A customer provides the solution id once at synth time and that value is used consistently across all three surfaces. Either input path is acceptable and they can share one resolver — the synth-time AWS_SDK_UA_APP_ID env var or the existing -c sdkUaAppId=... context flag (with the same precedence buildAppId already uses):
- unset → default
uksb-wt64nei4u6everywhere (today's behavior, unchanged); - set to
myco-xyz→app/myco-xyz#{stackName},md/myco-xyz#{component}, and description(myco-xyz); -c sdkUaAppId=''(orAWS_SDK_UA_APP_ID='') → noapp/, nomd/, and a plain description (full opt-out).
This collapses four hardcoded literals into one synth-time input and makes the opt-out coherent (one switch governs all three, instead of -c sdkUaAppId='' silencing only app/ while md/ and the description still carry uksb-wt64nei4u6).
Feasibility / notes (no change requested — just scoping the conversation)
- Synth-time only.
main.tsis plain Node, soprocess.env.AWS_SDK_UA_APP_IDandtryGetContextare available there; the description is assembled before the template is written. CloudFormation'sDescriptionfield itself is a static string (noRef/Fn::Sub), so this must be resolved at synth — which is exactly where the value already lives. - Today
AWS_SDK_UA_APP_IDcarries the full app-id (uksb#stack), not just the solution-id prefix. This proposal would treat the synth-time env/context value as the solution id and letbuildAppId()continue composing#{stackName}. Worth deciding explicitly: is the customer input the solution id (prefix) or the entire app-id? They differ for themd/and description surfaces. - Ordering: the description is built on
main.ts:42, but the override is read on:50. Gating the description means hoisting that read above stack construction — small, but a real reorder. - Parity: the three
md/helpers must consume the override identically (the cross-surface parity the docstrings already require), and the wire-capture tests would gain an override + opt-out assertion.
Does treating a single synth-time AWS_SDK_UA_APP_ID (solution-id) as the source of truth for app/ + md/ + description fit the intended design for #319/#292?
Closes #319 (one of two candidate implementations — see #338 for the other)
The two options, side by side
app/uksb-wt64nei4u6#{stack}customUserAgent/user_agent_extrabecause/isn't a legal app-id charAWS_SDK_UA_APP_IDenv —#separator is legal, so zero client codemd/uksb-wt64nei4u6#{component}#{TRACE}per-requestwithAbcaTrace()wraps{STACKNAME}sanitize-then-clip-34[A-Za-z0-9-], already a subset of the app-id charset-c sdkUaAppId=''(deploy) orAWS_SDK_UA_APP_ID=''(CLI) — built-inThe key unlock (confirmed against installed botocore 1.43.9 + JS v3): using
#instead of/as theapp/separator keeps the segment on the SDK's native app-id field, which both SDKs read fromAWS_SDK_UA_APP_IDautomatically. That removes the only reason #338 needed the raw user-agent path and the entire trace plane.What's implemented (all complete)
agent/src/ua.py,cdk/src/handlers/shared/ua.ts,cli/src/ua.ts: each owns only the staticmd/#{component}segment + a defensivesanitizeUaValue(). Wire-capture tests on all three assert both segments land on a real outboundUser-Agentheader (andapp/is omitted whenAWS_SDK_UA_APP_IDis unset/empty).user_agent_extra(scoped + plain sessions) covers everytenant_client/tenant_resource; newplatform_client()carries themd/segment for the 8 directboto3.clientsites. No before-send appender....abcaUserAgent()spread into all 60 SDK client sites across 43 files. No middleware.applyDefaultAppId()at startup +...abcaUserAgent()on all 18 AWS SDK client sites;auth.test.tsasserts the Cognito client receives themd/pair.SolutionUaAspectsetsAWS_SDK_UA_APP_IDon every Lambda (current + future, structurally); AgentCore runtime + ECS container set it explicitly; per-surfaceABCA_COMPONENT(api/orchestr/webhook, integrations via a per-constructComponentUaAspect). Template assertions added.AGENTS.md"Common mistakes" bullet for future SDK clients.Verification
//agent:qualitygreen: 1074 tests, 78.98% coverage, ruff + ty clean.//cdk:testgreen: 2061 tests (114 suites), compile clean, eslint 0 errors.//cli:buildgreen: 365+ tests, eslint clean.//docs:buildgreen; no Starlight mutation.cdk synthfails only on the pre-existingec2:DescribeAvailabilityZonescred gap on this workstation (same asmainand feat(observability): inject solution into outbound AWS SDK User-Agent (#319) #338) — CI's build job runs the real synth.🤖 Generated with Claude Code