feat: @actions/cache optional path validation during restore#2414
feat: @actions/cache optional path validation during restore#2414jasongin wants to merge 11 commits into
@actions/cache optional path validation during restore#2414Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an opt-in integrity gate to @actions/cache restore that validates tar archive entry paths (and link targets) against the user-declared cache paths prior to extraction, surfacing violations as warnings or as a hard CacheIntegrityError depending on mode.
Changes:
- Introduces
DownloadOptions.pathValidation('off' | 'warn' | 'error') and plumbs it throughrestoreCache*()toextractTar(). - Adds an in-process archive listing/validation pass (
tarParser + custom path/link validation) and a newCacheIntegrityErrorfor parse/violation failures. - Updates package metadata (version bump,
tardependency, Node engine) and adds comprehensive unit/integration tests + test plan doc.
Show a summary per file
| File | Description |
|---|---|
| packages/cache/src/options.ts | Adds pathValidation option with defaults and validation in getDownloadOptions(). |
| packages/cache/src/internal/tar.ts | Runs optional pre-extraction validation and reports/throws based on mode. |
| packages/cache/src/internal/pathValidation.ts | Implements allowed-root derivation, entry/link validation, and violation formatting. |
| packages/cache/src/internal/listAndValidate.ts | Streams archives through tar Parser (plus zstd decompression) to collect violations. |
| packages/cache/src/internal/cacheIntegrityError.ts | Defines CacheIntegrityError + error codes for integrity failures. |
| packages/cache/src/cache.ts | Exposes CacheIntegrityError publicly and forwards pathValidation/paths into extraction. |
| packages/cache/package.json | Bumps version, adds tar dep, adds engines.node, adjusts exports. |
| packages/cache/package-lock.json | Updates lockfile for new version and dependency graph. |
| packages/cache/docs/path-validation-test-plan.md | Documents test strategy/coverage for the new feature. |
| packages/cache/tests/tarPathValidation.test.ts | Adds integration tests asserting extract behavior across modes. |
| packages/cache/tests/restoreCache.test.ts | Updates assertions for new extractTar argument plumbing (v1). |
| packages/cache/tests/restoreCacheV2.test.ts | Updates assertions for new extractTar argument plumbing (v2). |
| packages/cache/tests/pathValidation.test.ts | Adds extensive unit tests for root derivation + entry/link validation. |
| packages/cache/tests/options.test.ts | Validates new default/override behavior for pathValidation. |
| packages/cache/tests/listAndValidate.test.ts | Adds real-archive tests for gzip/zstd parsing + validation behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Files not reviewed (1)
- packages/cache/package-lock.json: Language not supported
- Files reviewed: 14/15 changed files
- Comments generated: 3
| * - 'error': surfaced as `CacheIntegrityError` with `code: 'PARSE_ERROR'` | ||
| * and extraction does not run. | ||
| * | ||
| * @default 'off' |
There was a problem hiding this comment.
This feels like a security bug fix.
I'm wondering if we should default to error. We could major-version the package if we're worried about a breaking change.
There was a problem hiding this comment.
There are consumers of this package other than actions/cache (primarily actions/setup-*), so I didn't want to risk impacting them automatically. Even with a major version update, they might not realize the implications of taking the new major version. So I think it's safer to require consumers to opt in explicitly. Or at best 'warn' could be the default with a major version update.
The linked PR actions/cache#1761 defaults to warn and I proposed updating the default to error in a major version update of the action (not the toolkit package). We should consider similar updates to actions/setup-* repos.
There was a problem hiding this comment.
Even with a major version update, they might not realize the implications of taking the new major version
Breaking changes is what semver communicates.
We should have safe defaults. Is there a legitimate scenario where an archive entry should escape the declared cache paths?
There was a problem hiding this comment.
If this was a simple validation then I might agree. But given the complexity of the implementation, I think there is some risk of edge cases with certain platforms or archive contents. So I'm proposing to leave the validation disabled by default (opt-in) until we have more data proving it is solid in many real-world scenarios.
There was a problem hiding this comment.
If only we had a beta release cycle for these, this could have been warn in the beta and error in the GA. I'm fine with warn and optional for now, we don't have the capacity to deal with a deluge of breaking reports, even with a major version bump.
|
I am considering switching to path validation using system tar, which would avoid some kinds of attacks that could bypass node-tar's path validation. However there's a compatibility challenge: BSD tar (macOS without gtar, Windows tar.exe, some self-hosted runners) lacks sufficient options to list files with quoted/escaped paths and full link targets. We could consider requiring GNU tar - it is already installed on hosted runners but would be a compatibility problem for some self-hosted runners. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Added a new test suite for parser-differential bypass detection in tar archives. - Introduced length-correct PAX extended-header re-parsing to catch discrepancies between node-tar and system tar. - Enhanced the listAndValidate function to return approved names alongside violations for better extraction control. - Implemented checks for unsafe characters and glob metacharacters in entry paths. - Updated the tar extraction logic to utilize an allow-list for approved entries when path validation is in error mode. - Added utility functions for writing temporary allow-list files for system tar extraction.
dae5257 to
6e8379e
Compare
|
I pushed a major update that implements a revised approach to path validation. The main changes are the PAX header defense and use of an allow-list generated by node-tar passed to system-tar. They are explained in the updated PR description. |
| type: string, | ||
| archivePath = '' | ||
| archivePath = '', | ||
| allowListPath = '' |
There was a problem hiding this comment.
Hmm.. in the backlog we had plans to implement streamed extractions like we do in Artifacts to speed up the restoration process. How is this going to play out in the streaming scenario? Are they gonna have to be mutually exclusive features?
| // root as the sole allowed root. This still catches archives that try | ||
| // to escape the workspace via `..` or absolute paths. | ||
| if (allowedRoots.length === 0) { | ||
| allowedRoots = [workingDirectory] |
There was a problem hiding this comment.
Doesn't this mean that the workingDirectory is the new exploitation target? Especially with this definition: return process.env['GITHUB_WORKSPACE'] ?? process.cwd()
| @@ -0,0 +1,297 @@ | |||
| /** | |||
There was a problem hiding this comment.
This is a compatibility nightmare in the making. We don't have the bandwidth to maintain this. Making sure this is compatible across ALL the platforms this library runs on is going to be very difficult to put it mildly.
| * decompression failure, truncated stream, etc.). The caller is responsible | ||
| * for translating that into a CacheIntegrityError with code `PARSE_ERROR`. | ||
| */ | ||
| export async function listAndValidate( |
There was a problem hiding this comment.
Did we benchmark this? What's the cost of running this?
I'd be curious about the numbers on the various runners we already offer (before thinking of others)
Optional path validation during cache restore
New
restoreCache()optionA new opt-in
DownloadOptions.pathValidationon@actions/cachestreams a downloaded cache archive through an in-process validator before handing it to systemtarfor extraction. Each entry's path (and, for symlinks/hardlinks, its target) must resolve under one of the declared cachepaths— anything that escapes (../..., absolute, UNC, drive-relative, NUL bytes, unsupported entry types) is caught by the validator.'off'— legacy behavior, validation is skipped.'warn'— validate; on violations, log viacore.warning/core.debugand extract anyway (legacy extraction, no allow-list).'error'— validate; on violations throwCacheIntegrityError(code: 'PATH_VIOLATION', withviolationsattached). No files are extracted. On a clean archive, extraction is additionally restricted to exactly the members the validator approved (see below).Architecture: node-tar for listing, system
tarfor extraction — made sound with an approved-member allow-listThis is the biggest design call in the PR, and the area most changed in the latest revision.
Listing/validation uses node-tar v7's
Parserin-process.tar -tvtext output that varies by GNU/BSD/Windows.TAR_BAD_ARCHIVE/TAR_ENTRY_INVALID/TAR_ENTRY_ERROR/TAR_ABORTas discrete codes we promote to parse errors.zstdbinary for the same--long=30window behavior the extract path uses.Extraction stays on system
tar.tar -P(--absolute-names) option must be used when extracting cache archives, because cache paths frequently live outside$GITHUB_WORKSPACE(~/.npm,~/.cargo,~/.gradle/caches, etc.).cacheUtils.resolvePathsrewrites every path relative to the workspace, so anything outside becomes../../..., which GNU tar refuses without-P, and node-tar's extractor refuses by default — swapping it in wholesale would break essentially every common dependency cache.--force-local,--delay-directory-restore,zstdmt/zstd -T0, the BSD-tar-on-Windows-with-zstd workaround). Re-implementing all of that on node-tar's writer is a much larger, riskier change.Closing the listing↔extraction gap (new). The original design relied on an informal argument that "anything the validator approves, system tar also accepts." That argument is weak against path-channel parser differentials: a crafted entry (notably via PAX extended headers) can make node-tar compute a safe-looking path while system
tarplaces the bytes somewhere else entirely. Rather than move validation onto systemtar— which would require GNU tar everywhere and break BSD-tar / Windowstar.exe/ many self-hosted runners — this revision keeps node-tar for listing and instead constrains the extractor to the validator's approved set:'error'mode with a clean archive,listAndValidatenow returns the exact member names node-tar derived (approvedNames) alongside the violations.0o600, inos.tmpdir(), randomized name) and passed to systemtarvia--null --no-recursion -T <file>.--nullMUST precede-Ton GNU tar.--no-recursionprevents an approved directory from implicitly pulling in unapproved children.--no-wildcards --no-wildcards-match-slash --anchoredas defense in depth; bsdtar lacks these, but glob metacharacters are already rejected during validation, so itsfnmatch()-based-Tmatching degrades to exact-name matching.finally.The effect: a member that system
tarwould extract to a different (escaped) path than node-tar computed simply isn't on the list and is never extracted. The parser differential becomes a no-op instead of a trusted assumption.'warn'mode deliberately does not use the allow-list — on any violation it falls back to extracting everything, matching legacy behavior.Performance tradeoff. Validation is a full pre-pass over the archive: decompress + parse headers (entry bodies are drained, never written to disk), then system tar decompresses a second time to extract. Roughly a 2× decompress cost on the archive stream, no extra disk I/O at the destination. The extra pass — and the allow-list — are skipped entirely when validation is
'off'.Parser-differential & PAX header defense (new)
Beyond the allow-list, this revision hardens the validator itself against attacks that exploit the gap between node-tar's parsing and that of real extractors:
split('\n')(its own source notesXXX Values with \n in them will fail this). A PAX record is actually length-prefixed ("<len> <key>=<value>\n"), and GNU tar / libarchive consume exactly<len>bytes — so a value containing an embedded"\n<len> path=<other>\n"desynchronizes node-tar from every real extractor. The re-parser re-reads each captured PAX body byte-accurately and cross-checkspath/linkpathagainst node-tar's view. Disagreement →PAX_DESYNC; a body node-tar treated as PAX that the re-parser can't fully account for →PAX_PARSE_FAIL.SCHILY./GNU./LIBARCHIVE./POSIX.namespaces) →PAX_UNKNOWN_KEY, so a future placement-affecting PAX extension can't silently slip past.UNSUPPORTED_TYPEvia anignoredEntrylistener, closing a class of "the validator never saw it, but tar might" bypasses.\n/\0→UNSAFE_CHAR; paths containing* ? [ ]→GLOB_METACHAR. The glob check is also what lets the bsdtar-Tallow-list degrade safely to exact-name matching.maxMetaEntrySizeof 1 MiB,MAX_PENDING_METAcap) so a malicious archive can't exhaust memory via the validation pass.Other design notes
'quarantine').paths.deriveAllowedRootsmirrorscacheUtils.resolvePaths: tilde / env-var expansion, glob-prefix stripping, etc.prepareAllowedRoots()pre-bakes per-root normalize/lowercase/trailing-sep forms once per archive, dropping the per-entry hot loop from O(N · R) to O(N + R)..catch(() => undefined)on the pipeline / exit promises to suppress late rejections,try/finallySIGTERMif the child is still running on early exit.Violation codes
ABSOLUTE_PATH,UNC_PATH,NUL_BYTE,OUTSIDE_ROOTS,LINK_OUTSIDE_ROOTS,UNSUPPORTED_TYPE,PAX_DESYNC,PAX_PARSE_FAIL,PAX_UNKNOWN_KEY,UNSAFE_CHAR,GLOB_METACHAR.Known limitations
caseInsensitiveFs()) is a platform-level guess. On case-sensitive APFS or per-dir-case-sensitive NTFS it loosens (fewer violations) rather than tightens.-Tdefense-in-depth flags (--no-wildcardsetc.) are GNU-only; on BSD/Windows tar the allow-list relies on the validator's prior rejection of glob metacharacters to keep-Tmatching exact.Tests
All path-validation suites pass: 176 passed, 3 skipped (zstd suites
describe.skipwhen nozstdbinary is onPATH). New/updated coverage in this revision:__tests__/pax-reparse.test.ts— unit tests for the length-correct PAX re-parser and cross-check (PAX_DESYNC/PAX_PARSE_FAIL/PAX_UNKNOWN_KEY), 100% line coverage of pax-reparse.ts.__tests__/tarPathValidationAttacks.test.ts— raw-tar attack builders covering the F1–F5 parser-differential bypass classes,UNSAFE_CHAR/GLOB_METACHAR/ clean-PAX / meta-flood cases, plus end-to-end extraction tests against the real systemtar.__tests__/tarPathValidation.test.ts— extended for the'error'-mode allow-list: asserts the--null --no-recursion -Targs, the NUL-separated file contents, and temp-file cleanup.__tests__/listAndValidate.test.ts— updated for the new{violations, approvedNames}return shape.See path-validation-test-plan.md for the full test-coverage description.
Out of scope
tarfor extraction (see above).actions/cacheupdateactions/cache#1761 is a related PR to actually use this path validation when restoring from cache. The path validation mode will be a new input to the action; initially the default will be
'warn', then we can consider changing the default to'error'in a major version update.