From 730872930004475ccef0fe0d99208d57c9e5527b Mon Sep 17 00:00:00 2001 From: Preetam Dwivedi Date: Thu, 4 Jun 2026 19:10:25 -0700 Subject: [PATCH] ci: run privileged workflows from main; pin actions; harden CI MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit > ⚠️ **P0 — required before merge.** `STACK_REBASE_TOKEN` is currently a plain > repository secret, so any same-repo PR can exfiltrate it today by referencing > `${{ secrets.STACK_REBASE_TOKEN }}` in a modified/added workflow — it runs on > the PR, *before any review*, and log-masking is trivially bypassed. The > `pull_request_target` change here does **not** close that (an attacker uses a > different workflow to read the ambient secret). The actual fix is moving the > PAT into a `main`-only Environment so PR-triggered jobs cannot obtain it — see > item **#1** in "Required manual follow-up". Treat that as blocking. ## Summary ### Why? A review of `.github/workflows/` found one privilege-escalation hole and several supply-chain / least-privilege gaps. GitHub runs a `pull_request` workflow from the PR's HEAD ref, so a PR can modify the workflow that runs on it. `rebase-stack.yml` — which holds a repo-write PAT (STACK_REBASE_TOKEN) and triggers on `pull_request: closed` — was therefore exploitable: a contributor could rewrite the file in their own PR (dropping the `merged` guard), close it, and run arbitrary code with the PAT. Same-repo branch PRs (the stacked-PR model) DO receive secrets, so this was reachable by any contributor with push access or a compromised account. Privileged automation must run its definition from `main`, never from PR-controlled content. CI test jobs must still execute PR code — that is the purpose of a test pipeline — so the goal is to lock down *privileged* execution and *supply-chain* references, not to stop PR testing. The defense rests on an invariant: the untrusted-code path (`ci.yml` + its composite actions) carries no secrets and a read-only token, so there is nothing for a malicious PR to steal. ### What? - rebase-stack.yml: trigger `pull_request` -> `pull_request_target`, so GitHub always runs main's version of the file regardless of PR content. Safe here because the job never builds/runs PR code — it only replays commits via `git rebase` and validates the diff is byte-identical before force-pushing. Adds `environment: stack-rebase` to scope the PAT to a protected, main-only environment. - ci.yml: `persist-credentials: false` on every checkout (jobs execute untrusted PR code); PR-only `concurrency` cancellation (never cancels push/merge_group); new `workflow-security` job (actionlint + zizmor) wired into the required-checks gate; the gate's debug echo now passes `toJSON(needs)` via env to avoid in-script template expansion. - workflow-security: added a guard step that fails CI if `ci.yml` or its composite actions reference a repository secret (GITHUB_TOKEN allowlisted). This keeps the untrusted-code path secret-free, catching accidental reintroduction of a reachable secret. (It catches honest mistakes; the malicious case is covered by environment-scoped secrets + CODEOWNERS review.) - run-bazel-test composite action: pass `inputs.target` via env instead of interpolating `${{ }}` into the shell, closing a template-injection. - All third-party actions pinned to immutable commit SHAs (checkout, cache, actionlint, zizmor) so a moved tag cannot inject unapproved code. - .github/zizmor.yml: documents the intentional, reviewed exceptions (dangerous-triggers on the deliberate pull_request_target; artipacked on the two workflows that must persist credentials to push). - CODEOWNERS: require admin review for any change under `.github/`. ## Test Plan - ✅ `actionlint .github/workflows/*.yml` — no issues. - ✅ `zizmor --persona=regular .github/workflows/` (offline + online) — "No findings to report" (3 intentional ignores, documented in zizmor.yml). - ✅ Secrets guard: passes on the current secret-free path, and fires correctly when a `${{ secrets.* }}` reference is introduced. - ✅ All workflow/action YAML parses; no unpinned action refs remain. - Post-merge verification: open a PR that edits rebase-stack.yml (add an `echo PWNED` step, drop the `merged` guard) and CLOSE it without merging — confirm GitHub runs main's version (no PWNED, job skipped). Then merge a 2-PR stack and confirm children are still rebased/retargeted as before. ## Required manual follow-up (repo / org settings) These cannot be expressed in files and must be applied by a maintainer. - [ ] **🔴 P0 (blocking): Create the `stack-rebase` Environment**, restrict its deployment branches to `main`, and move `STACK_REBASE_TOKEN` into it so the PAT is unreachable from any PR-triggered job. Until this is done the PAT is exfiltratable today (see the callout above), and `rebase-stack.yml` (which now declares `environment: stack-rebase`) will not run. - [ ] **Settings → Actions → General → Fork pull request workflows from outside collaborators**: "Require approval for all external contributors" so PR code never auto-executes for new contributors. - [ ] **Settings → Actions → General → Workflow permissions**: default `GITHUB_TOKEN` to read-only; uncheck "Allow GitHub Actions to create and approve pull requests" unless required. - [ ] **Branch protection / ruleset on `main`**: require PR review, require review from Code Owners (so `/.github/` changes need admin approval), require the `Required Checks` status, block force-pushes. - [ ] **(Optional)** Restrict allowed actions to selected/verified creators with pinned SHAs. --- .github/CODEOWNERS | 7 +- .github/actions/run-bazel-test/action.yml | 7 +- .github/actions/setup/action.yml | 2 +- .github/workflows/ci.yml | 129 +++++++++++++++++++-- .github/workflows/rebase-stack.yml | 25 +++- .github/workflows/tag-go-pseudoversion.yml | 2 +- .github/zizmor.yml | 32 +++++ 7 files changed, 187 insertions(+), 17 deletions(-) create mode 100644 .github/zizmor.yml diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 4c66f352..2b95b103 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -1 +1,6 @@ -* @uber/submitqueue-admins @uber/submitqueue-maintainers @behinddwalls @sbalabanov +* @uber/submitqueue-admins @uber/submitqueue-maintainers @behinddwalls @sbalabanov + +# CI workflows, composite actions, and other automation are privileged +# (secrets, write tokens). Require admin review for any change under .github/. +# CODEOWNERS is last-match-wins, so this overrides the `*` rule for these paths. +/.github/ @uber/submitqueue-admins diff --git a/.github/actions/run-bazel-test/action.yml b/.github/actions/run-bazel-test/action.yml index 2a3cb6f7..a8c48b93 100644 --- a/.github/actions/run-bazel-test/action.yml +++ b/.github/actions/run-bazel-test/action.yml @@ -11,4 +11,9 @@ runs: steps: - name: Run Bazel test shell: bash - run: ./tool/bazel test ${{ inputs.target }} --test_output=streamed + # Pass the target through env rather than interpolating ${{ inputs.target }} + # directly into the script, so a composite-action input can never inject + # shell commands. + env: + TARGET: ${{ inputs.target }} + run: ./tool/bazel test "$TARGET" --test_output=streamed diff --git a/.github/actions/setup/action.yml b/.github/actions/setup/action.yml index 9cfe01db..0b72f08f 100644 --- a/.github/actions/setup/action.yml +++ b/.github/actions/setup/action.yml @@ -17,7 +17,7 @@ runs: using: composite steps: - name: Setup Bazel cache - uses: actions/cache@v4 + uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4 with: path: | ~/.cache/bazel-disk-cache diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 39533206..73d1e9d9 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -20,6 +20,13 @@ on: permissions: contents: read +# Cancel superseded runs for the same PR to save runner minutes. Never cancel +# in-progress runs for `push` (main) or `merge_group` — those must complete so +# the default branch and merge queue always have a definitive result. +concurrency: + group: ci-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} + cancel-in-progress: ${{ github.event_name == 'pull_request' }} + jobs: # --------------------------------------------------------------------------- # LINT (gofmt via Go SDK, yamlfmt via go run) @@ -29,7 +36,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - name: Run linters @@ -43,7 +54,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - name: Check module files are tidy @@ -60,7 +75,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - name: Build project @@ -77,7 +96,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - name: Run E2E tests @@ -88,7 +111,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - name: Run Gateway integration tests @@ -99,7 +126,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - name: Run Orchestrator integration tests @@ -113,7 +144,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - uses: ./.github/actions/run-bazel-test with: @@ -124,7 +159,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - uses: ./.github/actions/run-bazel-test with: @@ -135,7 +174,11 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - uses: ./.github/actions/run-bazel-test with: @@ -149,12 +192,71 @@ jobs: if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + # This job executes untrusted PR code (make build/test/lint). Don't + # leave the GITHUB_TOKEN in the workspace git config while it runs. + persist-credentials: false - uses: ./.github/actions/setup - uses: ./.github/actions/run-bazel-test with: target: //test/integration/submitqueue/core/consumer/... + # --------------------------------------------------------------------------- + # WORKFLOW SECURITY LINT + # + # Guards against regressions in the workflows themselves: actionlint checks + # general validity; zizmor audits for GitHub Actions security smells + # (dangerous triggers, unpinned `uses:`, credential persistence, template + # injection). Keeps the pull_request_target / SHA-pinning hardening from + # silently eroding in future edits. + # --------------------------------------------------------------------------- + workflow-security: + name: Workflow Security Lint + if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }} + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 + with: + persist-credentials: false + - name: actionlint + uses: raven-actions/actionlint@205b530c5d9fa8f44ae9ed59f341a0db994aa6f8 # v2.1.2 + - name: zizmor + uses: zizmorcore/zizmor-action@5f14fd08f7cf1cb1609c1e344975f152c7ee938d # v0.5.6 + with: + # Pin the zizmor TOOL version (distinct from the action tag above) for + # reproducible audits matching local validation. + version: "1.25.2" + # Fail the job on findings without requiring GitHub Advanced Security + # / SARIF upload (which needs security-events: write and degrades on + # fork PRs). Keeps the gate self-contained. + advanced-security: false + + # ci.yml and the composite actions it calls run untrusted PR code under + # the `pull_request` trigger. A repository secret referenced anywhere on + # that path is reachable by a malicious PR and can be exfiltrated on the + # PR run (before any review), so this path must stay secret-free — route + # any secret-bearing step through a SEPARATE trusted workflow + # (workflow_run / pull_request_target) that does not execute PR code. + # + # This guard catches ACCIDENTAL reintroduction by honest contributors; a + # malicious actor controlling ci.yml could delete the guard itself, so the + # real defenses remain (a) secrets scoped to a main-only Environment so a + # PR-triggered job cannot obtain them, and (b) CODEOWNERS review on + # .github/. GITHUB_TOKEN (least-privilege, read-only here) is allowlisted. + - name: Guard — no repository secrets on the untrusted-code path + run: | + hits="$(grep -rnE '\$\{\{[^}]*secrets\.' \ + .github/workflows/ci.yml .github/actions \ + | grep -vE 'secrets\.GITHUB_TOKEN' || true)" + if [ -n "$hits" ]; then + echo "::error::Repository secret referenced on the untrusted-code CI path (ci.yml / composite actions):" >&2 + echo "$hits" >&2 + echo "Move secret-bearing steps to a separate trusted workflow that does not run PR code." >&2 + exit 1 + fi + echo "OK: no repository secrets referenced in ci.yml or its composite actions." + # --------------------------------------------------------------------------- # REQUIRED CHECKS GATE # @@ -179,12 +281,17 @@ jobs: - queue-integration-test - storage-integration-test - consumer-integration-test + - workflow-security steps: - name: Fail if any required check did not succeed if: ${{ contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') || contains(needs.*.result, 'skipped') }} + # Pass the job-results context via env (not inline ${{ }} in the script) + # so there is no template expansion inside the run block. + env: + NEEDS_JSON: ${{ toJSON(needs) }} run: | echo "One or more required checks did not succeed:" >&2 - echo '${{ toJSON(needs) }}' >&2 + echo "$NEEDS_JSON" >&2 exit 1 - name: All required checks passed diff --git a/.github/workflows/rebase-stack.yml b/.github/workflows/rebase-stack.yml index 6a8ab80e..b4ec5c74 100644 --- a/.github/workflows/rebase-stack.yml +++ b/.github/workflows/rebase-stack.yml @@ -37,8 +37,26 @@ name: Rebase Stacked PRs +# SECURITY: this workflow is triggered by `pull_request_target`, NOT +# `pull_request`. The distinction is critical because this job holds a +# privileged, repo-write PAT (STACK_REBASE_TOKEN). +# +# - `pull_request` runs the workflow file from the PR's HEAD ref. +# A contributor could edit THIS file in their own PR +# (e.g. drop the `merged` guard), close the PR, and +# run arbitrary code with the PAT. Same-repo branch +# PRs (the stacked-PR model) DO receive secrets, so +# this is exploitable — a secret-exfiltration hole. +# - `pull_request_target` always runs the workflow file from the DEFAULT +# branch (main). PR edits to this file are ignored, +# so the privileged logic is fixed at main's version. +# +# `pull_request_target` is safe HERE specifically because the job never builds +# or executes PR code: it only replays commits with `git rebase` (which runs +# nothing from the tree) and validates the resulting diff is byte-identical +# before force-pushing. It does not run `make`, tests, or any PR script. on: - pull_request: + pull_request_target: types: - closed @@ -52,8 +70,11 @@ jobs: name: Rebase Stack if: github.event.pull_request.merged == true runs-on: ubuntu-latest + # Scope the privileged PAT to a protected Environment restricted to `main` + # so the secret cannot be used from any other ref/context. + environment: stack-rebase steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 with: # Fetch full history so rebase --onto works correctly. fetch-depth: 0 diff --git a/.github/workflows/tag-go-pseudoversion.yml b/.github/workflows/tag-go-pseudoversion.yml index 229567e5..4859e4fc 100644 --- a/.github/workflows/tag-go-pseudoversion.yml +++ b/.github/workflows/tag-go-pseudoversion.yml @@ -24,7 +24,7 @@ jobs: name: Create and push pseudo-version tag runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 with: fetch-depth: 0 diff --git a/.github/zizmor.yml b/.github/zizmor.yml new file mode 100644 index 00000000..af883a72 --- /dev/null +++ b/.github/zizmor.yml @@ -0,0 +1,32 @@ +# zizmor (GitHub Actions security auditor) configuration. +# +# Every ignore below is an INTENTIONAL, reviewed exception with a concrete +# justification — not a blanket mute. New findings outside these are expected +# to fail the `workflow-security` CI job. +rules: + # rebase-stack.yml deliberately uses `pull_request_target` so the privileged + # workflow definition (which holds the STACK_REBASE_TOKEN PAT) is always taken + # from `main`, never from PR-controlled content. It never builds or runs PR + # code — only `git rebase` replay + byte-identical diff validation before + # force-push — so the usual pull_request_target code-execution risk does not + # apply. See the SECURITY comment block at the top of that workflow. + dangerous-triggers: + ignore: + - rebase-stack.yml + # These workflows MUST persist the checkout credentials because they push: + # - rebase-stack.yml force-pushes rebased branches using STACK_REBASE_TOKEN + # - tag-go-pseudoversion.yml pushes pseudo-version tags using the default token + # Neither uploads build artifacts, so persisted credentials cannot leak via + # the artipacked vector. CI jobs that DO run untrusted PR code set + # persist-credentials: false in ci.yml. + artipacked: + ignore: + - rebase-stack.yml + - tag-go-pseudoversion.yml + # dependabot.yml intentionally disables scheduled version-update PRs + # (open-pull-requests-limit: 0) and relies only on security updates, which + # should be applied promptly. A version-update cooldown would add nothing + # (no version PRs are opened) and we do not want to delay security fixes. + dependabot-cooldown: + ignore: + - dependabot.yml