test(cli-e2e): capability probes + capability-based live gating#5749

Draft
avallete wants to merge 6 commits into develop from test/cli-e2e-capability-probes

@avallete avallete commented Jul 1, 2026

Member

Adds a declarative way to mark apps/cli-e2e live tests by the runtime capabilities they need, plus one minimal capability probe per category. Groundwork for running the live e2e suite against the CLI-in-supabox model (supabox#106): the probes systematically reveal which capabilities supabox can/can't provide.

What's here

  • src/tests/env.ts — a Capability type (docker / internet / external-tool) and PROVIDED_CAPABILITIES, read from CLI_E2E_CAPABILITIES (comma list) with per-target defaults: staging provides everything (the oracle), supabox starts empty and is opened up as it gains support.
  • src/tests/live/live-context.ts: testLiveRequires(caps) → testLive or testLive.skip, so a test skips (not fails) when the target can't provide a required capability.
  • src/tests/live/capabilities.live.e2e.test.ts — one probe per category, each forcing its capability so a skip/red is a precise gap signal:
    • C1 mgmt-api only — projects list includes the project
    • C2 docker, offline — db push + db pull (pull's shadow DB requires DockerStart; pre-built images)
    • C3 external tool — db dump remote schema via native pg_dump (SUPABASE_DB_USE_LOCAL_TOOLS)
    • C4 internet — deploy a jsr.io-importing function (server-side bundler) + invoke
    • C5 docker + internet — --use-docker deploy of the same jsr function + invoke

Reuses the existing live harness (live-setup.ts provisioning, deploy-e2e-jsr / deploy-e2e-mode-docker fixtures, invoke.ts); no new fixtures.
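
A minimal sketch of how the capability set described above could be resolved. This is not the actual env.ts: the function names, the rule that an explicit CLI_E2E_CAPABILITIES value replaces the per-target default, and the empty default for unknown targets are all assumptions. Unknown tokens are rejected, matching the follow-up commit in this PR.

```typescript
type Capability = "docker" | "internet" | "external-tool";

const ALL_CAPABILITIES: readonly Capability[] = ["docker", "internet", "external-tool"];

// Per-target defaults as described in the PR: staging provides everything
// (the oracle), supabox starts empty and is opened up over time.
const TARGET_DEFAULTS = new Map<string, readonly Capability[]>([
  ["staging", ALL_CAPABILITIES],
  ["supabox", []],
]);

function isCapability(value: string): value is Capability {
  return (ALL_CAPABILITIES as readonly string[]).includes(value);
}

// Hypothetical helper: `raw` stands in for process.env.CLI_E2E_CAPABILITIES.
// Assumption: an explicit list replaces the target default rather than extending it.
function resolveCapabilities(targetEnv: string, raw: string | undefined): Set<Capability> {
  if (raw === undefined) {
    return new Set(TARGET_DEFAULTS.get(targetEnv) ?? []);
  }
  const provided = new Set<Capability>();
  const tokens = raw.split(",").map((token) => token.trim()).filter((token) => token !== "");
  for (const token of tokens) {
    if (!isCapability(token)) {
      // A typo must fail loudly, otherwise tests skip silently and the run stays green.
      throw new Error(`Unknown capability "${token}" in CLI_E2E_CAPABILITIES`);
    }
    provided.add(token);
  }
  return provided;
}
```

With this shape, `resolveCapabilities("supabox", "docker,internet")` yields a set without `external-tool`, while a typo such as `external_tools` throws instead of quietly skipping tests.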

How it's used

  • Staging (oracle): CLI_E2E_MODE=live CLI_E2E_TARGET_ENV=staging → all five run and must pass, proving the probes are sound.
  • supabox / Antithesis: the same file runs whatever the target declares via CLI_E2E_CAPABILITIES; anything green-on-staging but skipped/red on supabox is a concrete gap for the supabox-side work to close (delegated).

Inert on normal PR/replay runs — testLive* skips unless CLI_E2E_MODE=live.
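
The skip-versus-run decision can be illustrated with a self-contained stand-in. In the sketch below, `testLive` is a toy recorder rather than the real harness function, and the provided set is passed as a parameter for illustration (the PR's helper reads the module-level PROVIDED_CAPABILITIES instead); both are assumptions made so the example runs on its own.

```typescript
type Capability = "docker" | "internet" | "external-tool";
type TestFn = (name: string, body: () => void) => void;
interface LiveTest extends TestFn {
  skip: TestFn;
}

// Records what happened to each test, in declaration order.
const outcomes: string[] = [];

// Toy stand-in for the harness's testLive: runs the body, or records a skip.
const testLive: LiveTest = Object.assign(
  (name: string, body: () => void): void => {
    body();
    outcomes.push(`ran: ${name}`);
  },
  {
    skip: (name: string, _body: () => void): void => {
      outcomes.push(`skipped: ${name}`);
    },
  },
);

// Same decision as the PR's helper: skip when any required capability is missing.
function testLiveRequires(
  provided: ReadonlySet<Capability>,
  required: readonly Capability[],
): TestFn {
  const missing = required.filter((capability) => !provided.has(capability));
  return missing.length === 0 ? testLive : testLive.skip;
}

// A target that declares only docker: the docker probe runs,
// the docker + internet probe is skipped rather than failed.
const dockerOnly = new Set<Capability>(["docker"]);
testLiveRequires(dockerOnly, ["docker"])("C2 docker, offline", () => {});
testLiveRequires(dockerOnly, ["docker", "internet"])("C5 docker + internet", () => {});
```

Running the same two declarations against a set containing all three capabilities would record both as run, which is the staging-as-oracle case.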

Adds a declarative way to mark live e2e tests by the runtime capabilities they
need (docker / internet / external-tool) and skip them when the target env can't
provide it — so one suite runs against staging (all capabilities, the oracle),
supabox (whatever it currently supports), and Antithesis (offline subset), each
skipping only what it genuinely can't do.

- env.ts: `Capability` type + `PROVIDED_CAPABILITIES`, from `CLI_E2E_CAPABILITIES`
  with per-target defaults (staging = all; supabox = none until opened up).
- live-context.ts: `testLiveRequires(caps)` → `testLive` or `testLive.skip`.
- capabilities.live.e2e.test.ts: one minimal probe per category, each forcing its
  capability so a failure/skip is a precise supabox-gap signal —
  C1 mgmt-api (projects list), C2 docker/offline (db push+pull shadow),
  C3 external-tool (db dump via native pg_dump), C4 internet (deploy jsr fn),
  C5 docker+internet (--use-docker deploy of a jsr fn). Reuses existing fixtures.

Staging is the oracle: probes green there prove soundness, so any supabox
skip/red is a real gap for the CLI-in-supabox work (supabox#106) to close.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avallete avallete requested a review from a team as a code owner July 1, 2026 09:21
@github-actions

github-actions Bot commented Jul 1, 2026


Supabase CLI preview

npx --yes https://pkg.pr.new/supabase/cli/supabase@26e70380d24ba3ac920b987842da439f25f25c6a

Preview package for commit 26e7038.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26e70380d2


Comment on lines +138 to +140
export function testLiveRequires(required: readonly Capability[]): typeof testLive {
  const missing = required.filter((capability) => !PROVIDED_CAPABILITIES.has(capability));
  return missing.length === 0 ? testLive : testLive.skip;


P2: Gate the rest of the live suite by capabilities

In a supabox/Antithesis run with missing Docker or internet, this helper only skips tests that opt into testLiveRequires; vitest.live.config.ts:10 still includes every *.live.e2e.test.ts, and existing live files still call testLive directly (for example functions-deploy.live.e2e.test.ts:20 runs the matrix containing --use-docker, and db-sync.live.e2e.test.ts:13 runs db push/db pull). Those tests will execute and fail instead of being skipped, so CLI_E2E_CAPABILITIES does not actually make the live suite run only supported cases. Please either convert the existing live tests to capability requirements or limit the supabox probe run to the capability file.


Member Author


Partially addressed (8bcc653) — tagged the clearly docker/internet existing tests: db-sync requires docker, functions deploy --use-docker requires docker, and deploy-all requires internet. Full suite-wide gating of the env-dependent db dump and the data-plane-provisioned tests (storage, pooler) is coupled to the global-setup change in the sibling comment and lands with the supabox-enablement follow-up; until then capability-limited runs target the probe file + the tagged subset. Leaving open to track that.

🤖 Addressed by Claude Code

Comment thread apps/cli-e2e/src/tests/live/capabilities.live.e2e.test.ts Outdated
Comment thread apps/cli-e2e/src/tests/live/capabilities.live.e2e.test.ts Outdated
Comment thread apps/cli-e2e/src/tests/env.ts Outdated
Comment thread apps/cli-e2e/src/tests/live/capabilities.live.e2e.test.ts Outdated
Comment thread apps/cli-e2e/src/tests/live/capabilities.live.e2e.test.ts
@avallete avallete marked this pull request as draft July 1, 2026 09:30
avallete and others added 5 commits July 1, 2026 11:51
- env.ts: reject unknown CLI_E2E_CAPABILITIES tokens (a typo like
  `external_tools` silently skipped tests and left the run green).
- C3 probe: pass SUPABASE_DB_USE_LOCAL_TOOLS=1 so `db dump` exercises the
  native pg_dump path (external-tool), not the default container path.
- C2 probe: unique migration timestamp so it can't collide with db-sync's on
  the shared per-run project.
- C5 probe: deploy a distinct slug (deploy-e2e-npm) so the invoke proves the
  --use-docker deploy produced the function, not C4's earlier server-bundled one.
- Tag the existing docker/internet live tests so CLI_E2E_CAPABILITIES gates them
  too: db-sync (docker), functions deploy --use-docker (docker), deploy-all
  (internet).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extends the capability set with the data-plane axes and tags every live test
with its full requirements, so CLI_E2E_CAPABILITIES gates the entire suite (not
just the probes):

- Add `database` (project Postgres reachable via the pooler) and `storage` (the
  Storage API) to the capability set; staging provides all, supabox opts in.
- Tag data-plane tests: database db-stats/migration-list → [database], db dump →
  [database, docker]; db-sync → [database, docker]; gen-types → [database, docker];
  storage → [database, storage]; capability probes C2/C3 gain [database].
- Management-API-only tests (projects, link, secrets, branches,
  functions-lifecycle) need no extra capability and stay on the `testLive`
  baseline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eline

The project data plane (Postgres via the pooler, the Storage API) is always
present on a live target, so it isn't a capability a test opts into. Revert to
the three runtime capabilities (docker / internet / external-tool) and re-tag:

- storage, database db-stats/migration-list → back to the bare `testLive`
  baseline (pooler connection needs no runtime capability).
- database db dump, gen-types, db-sync, probe C2 → `docker` only (db pull's
  shadow DB / gen-types' postgres-meta / db dump's pg_dump container); probe
  C3 → `external-tool` only.

db push+pull keeps `docker` because `db pull` starts a shadow postgres container
(PrepareRawShadow → DockerStart), not because of the --db-url connection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the C2 (docker) probe before C3 (external-tool) so the probes read in
numeric order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nitize

Validated the capability probes against real staging (all 5 pass); two harness
flakes surfaced and are fixed:

- createStorageBucket retries on 400 TenantNotFound: the storage tenant is
  registered asynchronously, after the project reports ACTIVE_HEALTHY, so the
  first bucket call races it and aborted global setup (upstreams the supabox
  cli-storage-bucket-retry patch into staging-project.ts).
- Sanitize the workspace temp-dir name: a test title with ':' or spaces leaks
  into the temp path, which the cli mounts as a Docker volume for docker-backed
  commands (functions bundling, db diff shadow), breaking the src:dst:mode spec
  ("too many colons"). Collapse non-alphanumerics so any test name is volume-safe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
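
The two harness fixes in the commit above can be sketched as follows. Both helpers are illustrative and not the code in staging-project.ts or the workspace setup: the names `retryWhile` and `sanitizeDirName`, the attempt count, the delay, and the "-" replacement character are assumptions.

```typescript
// Retry an async operation while a caller-supplied predicate says the error is
// transient (for example a 400 TenantNotFound from a storage tenant that is not
// registered yet). Any other error, or exhausting the attempts, rethrows.
async function retryWhile<T>(
  attempt: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 5,
  delayMs = 1000,
): Promise<T> {
  for (let tries = 1; ; tries++) {
    try {
      return await attempt();
    } catch (error) {
      if (tries >= maxAttempts || !isRetryable(error)) {
        throw error;
      }
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Collapse every run of non-alphanumeric characters into a single "-" so the
// resulting directory name contains no ':' or spaces and cannot break a Docker
// src:dst:mode volume spec. Falls back to "test" if nothing survives.
function sanitizeDirName(title: string): string {
  const collapsed = title.replace(/[^A-Za-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return collapsed.length > 0 ? collapsed : "test";
}
```

For example, `sanitizeDirName("C5 docker + internet: --use-docker deploy")` returns `"C5-docker-internet-use-docker-deploy"`, which is safe to embed in a volume mount path.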