edge3: bind team_name into worker JWT (defense-in-depth for experimental multi-team) #67397

Open
omkhar wants to merge 2 commits into
apache:mainfrom
omkhar:omkhar/edge3-bind-team-name-in-jwt

omkhar (Contributor) commented May 24, 2026

Supplements #66718 (which clarified WorkerQueuesBase.team_name is an experimental hint).

Defense-in-depth for the experimental edge3 multi-team feature. Worker team_name is currently sent in request bodies only and trusted by the server. This PR binds team_name into the JWT at issue time and rejects requests where the body's team_name disagrees with the JWT's. Legacy pre-team-claim workers (no team_name claim in the JWT) keep the current body-only path for backwards compatibility.

What changes

  • worker_api/auth.py — include team_name in the issued JWT claims at registration; jwt_token_authorization returns the validated payload (with the JWT-bound team_name) for downstream comparison.
  • worker_api/routes/jobs.py, worker_api/routes/worker.py — compare body.team_name to the JWT-bound team_name; reject with 403 on mismatch; fall back to body-only when no JWT claim (legacy backcompat).
  • cli/api_client.py — small alignment so the worker-side path produces a team_name consistent with what the server now binds.
  • Tests: four-case validation in test_jobs.py and test_worker.py:
    1. Cross-team rejected (403).
    2. JWT team used when body omits team_name.
    3. Legitimate match succeeds.
    4. Legacy backcompat path (no JWT claim) still works.
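The comparison rule and the four test cases listed above can be sketched as a single helper. This is an illustrative reduction, not the route code in this PR: `resolve_team_name` and `TeamMismatchError` are hypothetical names, the validated JWT payload is assumed to be a plain dict, and the exception stands in for whatever mechanism the routes use to return a 403.

```python
from typing import Optional


class TeamMismatchError(Exception):
    """Hypothetical stand-in for an HTTP 403 response."""

    status_code = 403


def resolve_team_name(jwt_payload: dict, body_team_name: Optional[str]) -> Optional[str]:
    """Return the team to act on, or raise if body and JWT disagree."""
    jwt_team = jwt_payload.get("team_name")
    if jwt_team is None:
        # Legacy token issued before the team_name claim existed:
        # keep trusting the request body, as the server does today.
        return body_team_name
    if body_team_name is not None and body_team_name != jwt_team:
        # The body claims a different team than the signed token.
        raise TeamMismatchError(
            f"body team_name {body_team_name!r} does not match JWT team_name {jwt_team!r}"
        )
    # Body omitted team_name or matched it; the JWT value is authoritative.
    return jwt_team
```

In this sketch a token with a team_name claim always wins when the body is silent, and the body is consulted on its own only for tokens without the claim, which mirrors the backwards-compatibility path described in the list.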

What does NOT change

The Execution API's team-isolation contract is unchanged. It remains documented as experimental and is not enforced cross-team — see airflow-core/docs/security/workload.rst section "No team-level isolation in Execution API (experimental multi-team feature)". This PR closes a specific JWT-vs-body-mismatch gap ahead of the future team-isolation work referenced in that document.

No new APIs, no behavior change for single-team / no-team setups, no schema migrations.

Notes

@boring-cyborg boring-cyborg Bot added area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3 labels May 24, 2026
omkhar and others added 2 commits May 23, 2026 23:14
…tal multi-team)

Defense-in-depth for the experimental edge3 multi-team feature. Worker team_name is currently sent in request bodies only and trusted by the server. This change binds team_name into the JWT at issue time and rejects requests where the body's team_name disagrees with the JWT's. Legacy pre-team-claim workers (no team_name claim in the JWT) keep the current body-only path for backwards compatibility.

The Execution API's team-isolation contract is unchanged. It is still documented as experimental and not enforced cross-team (see airflow-core/docs/security/workload.rst section 'No team-level isolation in Execution API'). This patch closes a specific JWT-vs-body-mismatch gap ahead of the future team-isolation work referenced in that document.

Tests: 4-case validation (cross-team rejected with 403, JWT-team used when body omits team_name, legitimate match succeeds, legacy backcompat path).

Signed-off-by: Omkhar Arasaratnam <omkhar@gmail.com>
Apply ruff-format to providers/edge3/tests/unit/edge3/worker_api/
routes/test_jobs.py and test_worker.py. Whitespace-only change
to satisfy CI's static-checks job (Run 'ruff format' hook):
multi-arg fetch(...) and PUT/DELETE call sites get
one-arg-per-line formatting. No semantic change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@omkhar omkhar force-pushed the omkhar/edge3-bind-team-name-in-jwt branch from 4c23d36 to a14693f Compare May 24, 2026 03:15
omkhar (Contributor, Author) commented May 24, 2026

The one failing job (Integration core otel / test_export_legacy_metric_names) looks like a pre-existing flake on main — the identical failure hit apache/main itself on Tests (AMD) run 26342500539 at 2026-05-23 20:16 for SHA 16ad4794 (the same HEAD this PR was rebased onto), and that same SHA then passed Tests (ARM) at 21:43 and CodeQL at 02:14. Nothing in this PR touches OTEL paths.

Could a maintainer kick off a rerun of just the failed job? Happy to push an empty commit instead if that's preferred. Thanks.
