Mariano/mcp infrastructure#19

Open
mariano-macri wants to merge 21 commits into main from mariano/mcp_infrastructure

@mariano-macri

No description provided.

…reds

Pre-migration prep that landed during the AgentCore deployment phase:

- Mount MCP HTTP routes on /mcp instead of / (matches the path
  AgentCore Runtime, the AWS Lambda Function URL gateway, and the
  Stainless-hosted gateway all expect).
- Allow launchStreamableHTTPServer / streamableHTTPApp to accept a
  clientOptions object so the Grid client can be configured with
  credentials sourced from environment variables in HTTP transport
  mode (mirrors the stdio mode behavior). Per-request x-grid-*
  headers still take precedence.
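The precedence rule above can be sketched as a field-by-field merge: env-sourced clientOptions act as defaults, and any per-request x-grid-* header overrides them. This is a minimal sketch only; the option names (clientID, clientSecret, signature) and env var names (GRID_CLIENT_ID, GRID_CLIENT_SECRET, GRID_SIGNATURE) are hypothetical, not the Grid client's actual API.

```typescript
// Hypothetical shape of the Grid client options; the real SDK type may differ.
interface GridClientOptions {
  clientID?: string;
  clientSecret?: string;
  signature?: string;
}

type HeaderMap = Record<string, string | string[] | undefined>;

// Node may deliver repeated headers as arrays; take the first value.
function headerValue(headers: HeaderMap, name: string): string | undefined {
  const value = headers[name.toLowerCase()];
  return Array.isArray(value) ? value[0] : value;
}

// Process-wide defaults sourced from the environment (hypothetical variable names).
function clientOptionsFromEnv(env: Record<string, string | undefined>): GridClientOptions {
  return {
    clientID: env["GRID_CLIENT_ID"],
    clientSecret: env["GRID_CLIENT_SECRET"],
    signature: env["GRID_SIGNATURE"],
  };
}

// Per-request x-grid-* headers win; env-sourced defaults fill in any gaps.
function resolveClientOptions(defaults: GridClientOptions, headers: HeaderMap): GridClientOptions {
  return {
    clientID: headerValue(headers, "x-grid-client-id") ?? defaults.clientID,
    clientSecret: headerValue(headers, "x-grid-client-secret") ?? defaults.clientSecret,
    signature: headerValue(headers, "x-grid-signature") ?? defaults.signature,
  };
}

const defaults = clientOptionsFromEnv({ GRID_CLIENT_ID: "env-id", GRID_CLIENT_SECRET: "env-secret" });
const resolved = resolveClientOptions(defaults, { "x-grid-client-id": "hdr-id" });
console.log(resolved.clientID, resolved.clientSecret); // hdr-id env-secret
```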
Switches the container to run on AWS Lambda via the Lambda Web Adapter
(LWA) extension. The Express server is unmodified; LWA bridges Lambda
invocations to HTTP requests on localhost:8000. Keeps the Stainless-hosted
code/docs modes, because running them on local Deno would expose a
public, unauthenticated code-execution path on a no-auth Function URL.

AgentCore stripped non-allowlisted headers at the edge, so this logging
gap was latent. Lambda/LWA forwards all headers to the container, so
without redaction the customer Grid credentials in x-grid-client-id,
x-grid-client-secret, and x-grid-signature would land in CloudWatch
logs in plaintext.
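One way to keep these credentials out of CloudWatch is to redact the headers before anything is logged. A minimal sketch, assuming the logger is handed a plain header object; redactHeaders and the "[REDACTED]" placeholder are hypothetical names, not necessarily what this commit ships.

```typescript
// Headers that carry customer Grid credentials and must never reach logs.
const SENSITIVE_HEADERS = new Set([
  "x-grid-client-id",
  "x-grid-client-secret",
  "x-grid-signature",
]);

type HeaderMap = Record<string, string | string[] | undefined>;

// Returns a log-safe copy of the headers; the input object is not mutated.
function redactHeaders(headers: HeaderMap): HeaderMap {
  const safe: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    // Header names are case-insensitive, so match on the lower-cased name.
    safe[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : value;
  }
  return safe;
}

const logged = redactHeaders({
  "content-type": "application/json",
  "X-Grid-Client-Secret": "s3cr3t",
});
console.log(JSON.stringify(logged));
// {"content-type":"application/json","X-Grid-Client-Secret":"[REDACTED]"}
```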
Prep for removing Cognito M2M auth (the Lambda Function URL replaces it).
DeleteUserPool requires the user pool's DeletionProtection to be INACTIVE.

Captures main.tf, agentcore.tf, agentcore_iam.tf, and the provider lock
file as the baseline state before the AgentCore -> Lambda migration.
These were previously untracked in the working tree; tracking them means
subsequent migration commits (delete agentcore + cognito, add lambda)
appear as proper diffs and the rollback path can 'git revert' through
them cleanly.

Adds an aws_lambda_function (container image, arm64) plus a Function URL
with response streaming, KMS-encrypted environment variables, reserved
concurrency capped at 5, an explicit ECR repository pull policy, and
CloudWatch Errors/Throttles alarms. CORS is intentionally NOT configured
(browsers are out of scope for v1). AgentCore and Cognito stay in place
until the new Lambda is smoke-tested; Task 10 destroys them.
lambda_image_tag and stainless_api_key are persisted in the gitignored
terraform.auto.tfvars.

The Lambda Function URL has been smoke-tested end-to-end (all 6 Task 7
checks are green, including execute via the Stainless sandbox against
the real Grid API). AgentCore and Cognito M2M are no longer needed;
Task 10 destroys the underlying AWS resources.

CloudFront injects X-Origin-Secret on every origin request; Express
middleware rejects /mcp requests without a matching value. The health
endpoint stays open for LWA readiness checks, and the secret comparison
is timing-safe.
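The origin gate reduces to a path check plus a constant-time secret comparison. A minimal sketch, assuming Node's built-in crypto module; isOriginAllowed and secretsMatch are hypothetical names, and in Express the check would run as middleware that answers 403 when it returns false. Both values are hashed first because crypto.timingSafeEqual throws when its buffers differ in length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash both values to fixed-length digests: timingSafeEqual throws on a length
// mismatch, and hashing also avoids leaking the expected secret's length.
function secretsMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Hypothetical gate: /health stays open for LWA readiness checks; every other
// path needs the X-Origin-Secret header that CloudFront injects.
function isOriginAllowed(
  path: string,
  headers: Record<string, string | string[] | undefined>,
  originSecret: string,
): boolean {
  if (path === "/health") return true;
  // Node's HTTP parser lower-cases incoming header names.
  const provided = headers["x-origin-secret"];
  if (typeof provided !== "string") return false;
  return secretsMatch(provided, originSecret);
}
```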
Throws at boot from launchStreamableHTTPServer when AWS_LAMBDA_FUNCTION_NAME
is set but ORIGIN_SECRET is unset. Without this guard, a deploy bug that
drops the Lambda env var would leave the origin-secret middleware as a
no-op and /mcp wide open to direct Function URL requests, bypassing
CloudFront's WAF and origin gate. Surfacing the misconfiguration as an
InitError is strictly safer than silent exposure.
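The boot-time guard can be sketched as a standalone check that runs before the server starts listening. A minimal sketch; assertOriginSecretConfigured is a hypothetical name, the env object stands in for process.env, and unlike a strict "unset" test it also treats an empty ORIGIN_SECRET as missing.

```typescript
// Hypothetical stand-in for the boot-time check in launchStreamableHTTPServer.
// The Lambda runtime sets AWS_LAMBDA_FUNCTION_NAME, so its presence signals
// that the server sits behind a publicly reachable Function URL.
function assertOriginSecretConfigured(env: Record<string, string | undefined>): void {
  const onLambda = Boolean(env["AWS_LAMBDA_FUNCTION_NAME"]);
  const secret = env["ORIGIN_SECRET"];
  if (onLambda && !secret) {
    // Throwing here aborts startup, so the misconfiguration surfaces as an init
    // failure instead of serving /mcp through a no-op origin check.
    throw new Error("ORIGIN_SECRET must be set when running on AWS Lambda");
  }
}
```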
CloudFront Origin Access Control (OAC) requires clients to compute the
SHA-256 of POST bodies and send it in x-amz-content-sha256 (per AWS
docs); standard MCP clients don't. Replaces OAC with a CloudFront-injected
X-Origin-Secret header validated server-side. Drops the OAC, the custom
origin request policy, and aws_lambda_permission; flips the Function URL
to auth_type=NONE; adds the ORIGIN_SECRET env var.
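For context, this is the per-request work OAC would have pushed onto every MCP client: hashing the exact body bytes and sending the lowercase hex SHA-256 digest in x-amz-content-sha256. A sketch using Node's crypto module; contentSha256 and the example JSON-RPC body are illustrative only.

```typescript
import { createHash } from "node:crypto";

// With OAC in front of a Lambda Function URL, POST callers must send the
// SHA-256 of the request body, as lowercase hex, in x-amz-content-sha256.
function contentSha256(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Illustrative MCP JSON-RPC request body.
const body = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" });
const headers = {
  "content-type": "application/json",
  "x-amz-content-sha256": contentSha256(body),
};
console.log(headers["x-amz-content-sha256"].length); // 64
```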
tsc with noPropertyAccessFromIndexSignature requires bracket access on
process.env lookups, since process.env is typed with an index signature.
ts-jest is more lenient, so the tests passed, but the production tsc
invocation in 'pnpm build' (inside Docker) fails on the dotted-access form.
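The difference between the two access forms, in a minimal sketch. It assumes a tsconfig with "noPropertyAccessFromIndexSignature": true; the Env type approximates how @types/node declares process.env, so the example needs no Node typings, and readOriginSecret is a hypothetical helper.

```typescript
// process.env is declared with a string index signature; model the same shape.
type Env = { [key: string]: string | undefined };

function readOriginSecret(env: Env): string | undefined {
  // return env.ORIGIN_SECRET;   // rejected by tsc under noPropertyAccessFromIndexSignature
  return env["ORIGIN_SECRET"];   // bracket access compiles with the flag on
}
```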
…th_type=NONE

AWS requires an explicit resource-based policy granting principal='*'
even when authorization_type=NONE on the Function URL. Without it, all
requests return 403 AccessDeniedException regardless of auth_type. The
gate against unsolicited direct requests is the X-Origin-Secret check in
src/http.ts; the bare URL is intentionally publicly invokable at the
IAM layer.

Empirically required: CloudFront-routed requests to a Function URL with
invoke_mode=RESPONSE_STREAM intermittently return 403 with only the
lambda:InvokeFunctionUrl permission. Adding lambda:InvokeFunction
stabilizes the path.
… of tfvars

Replaces the variable + tfvars value with a data 'aws_ssm_parameter' lookup
on /grid-mcp/cloudfront-origin-secret (SecureString). The secret value
no longer lives on the operator's local disk; SSM is the source of truth.

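Rotation procedure: aws ssm put-parameter --overwrite ... then terraform
apply. The new value flows through to both the Lambda ORIGIN_SECRET env
var and CloudFront's custom_header value in the same apply. The two
resources update independently, so requests may briefly be rejected
with 403 until both sides carry the new value.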

Note: tfvars retains a commented marker so future operators know the
secret is intentionally absent (not forgotten).

Routes move from /mcp to /. Per-route middleware (rather than global
app.use) makes the /health bypass explicit and immune to future route
ordering changes. outputs.tf is updated to reflect the new customer-facing
URL https://mcp.grid.lightspark.com (no /mcp suffix, since the subdomain
names the service).
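The per-route pattern can be illustrated without Express: each route carries its own middleware list, so /health is open because its list is empty, not because of where it was registered. A dependency-free sketch; the route table, dispatch, and requireOriginSecret are hypothetical stand-ins for Express's app.METHOD(path, middleware, handler) form, and the secret check is simplified to a plain comparison (the real one should stay timing-safe).

```typescript
type Req = { method: string; path: string; headers: Record<string, string | undefined> };
type Res = { status: number; body: string };
// A middleware returns a response to short-circuit, or undefined to continue.
type Middleware = (req: Req) => Res | undefined;
type Handler = (req: Req) => Res;
type Route = { method: string; path: string; middleware: Middleware[]; handler: Handler };

// Simplified origin gate; a production check should use a timing-safe comparison.
function requireOriginSecret(secret: string): Middleware {
  return (req) =>
    req.headers["x-origin-secret"] === secret ? undefined : { status: 403, body: "Forbidden" };
}

function buildRoutes(secret: string): Route[] {
  const gate = requireOriginSecret(secret);
  return [
    // No middleware: the readiness probe is open by construction.
    { method: "GET", path: "/health", middleware: [], handler: () => ({ status: 200, body: "ok" }) },
    // The MCP endpoint declares its gate right next to the route.
    { method: "POST", path: "/", middleware: [gate], handler: () => ({ status: 200, body: "mcp" }) },
  ];
}

// Runs only the matched route's own middleware, in order, then its handler.
function dispatch(routes: Route[], req: Req): Res {
  const route = routes.find((r) => r.method === req.method && r.path === req.path);
  if (!route) return { status: 404, body: "Not Found" };
  for (const mw of route.middleware) {
    const early = mw(req);
    if (early) return early;
  }
  return route.handler(req);
}
```

Reordering the route table does not change which routes are gated, which is the property the commit relies on.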
Brings 42 reported vulnerabilities (1 critical, 17 high, 19 moderate,
5 low) down to 1 low. Notable: hono 4.12.5 → 4.12.18 patches 4 moderate
CVEs (bodyLimit bypass, JSX HTML injection, CSS injection, cache
cross-user leakage). The remaining low-severity finding is in
@anthropic-ai/mcpb's build-time tooling tree (tmp <=0.2.3 symlink) and
is not reachable at runtime.

All 13 mcp-server tests pass; the build succeeds.