PDF: throw on reads from an unauthenticated encrypted file #528

Merged: andiwand merged 1 commit into main from pdf-unauthenticated-read-guard on Jun 14, 2026
andiwand (Member) commented Jun 14, 2026

Stacked on #527.

Reading an encrypted file without an installed decryptor previously returned undecrypted bytes without any error. This PR guards read_object and read_object_stream so that they throw instead.

Changes

  • New UnauthenticatedReadError exception. No existing exception covered this case: FileEncryptedError and NotEncryptedError describe different conditions.
  • Both read paths throw it when is_encrypted() && !is_authenticated().
  • New DocumentParser::is_authenticated(), which is true when the file is encrypted and a decryptor is installed.
  • m_is_encrypted is now set only after the /Encrypt dictionary is resolved during construction. That resolution runs before any decryptor exists, so setting the flag earlier would trip the new guard whenever an encrypted file is opened.
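
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: ParserSketch, install_decryptor(), require_readable() and the string return values are hypothetical stand-ins, and only the exception name, the two read methods and the is_encrypted() && !is_authenticated() condition come from the description.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in for the new exception; the real class may derive differently.
struct UnauthenticatedReadError : std::runtime_error {
  UnauthenticatedReadError()
      : std::runtime_error("read from unauthenticated encrypted file") {}
};

// Hypothetical, heavily reduced parser used only to show the guard.
class ParserSketch {
public:
  explicit ParserSketch(bool encrypted) : m_is_encrypted(encrypted) {}

  bool is_encrypted() const { return m_is_encrypted; }

  // Authenticated means: encrypted and a decryptor has been installed.
  bool is_authenticated() const { return m_is_encrypted && m_has_decryptor; }

  // Assumed helper; the description does not say how a decryptor is installed.
  void install_decryptor() { m_has_decryptor = true; }

  std::string read_object(std::uint32_t id) const {
    require_readable();
    return "object " + std::to_string(id);
  }

  std::string read_object_stream(std::uint32_t id) const {
    require_readable();
    return "stream " + std::to_string(id);
  }

private:
  // Shared guard: throw instead of handing out undecrypted bytes.
  void require_readable() const {
    if (is_encrypted() && !is_authenticated()) {
      throw UnauthenticatedReadError();
    }
  }

  bool m_is_encrypted;
  bool m_has_decryptor = false;
};
```

In this sketch an unencrypted file never throws, an encrypted file throws from both read paths until install_decryptor() is called, and reads succeed afterwards.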

Why it's safe

  • Construction (read_trailer_chain, xref streams) runs before m_is_encrypted is set, so the guard passes and xref streams stay raw (ISO 32000-1 7.5.8.2).
  • Compressed objects route through a used-entry read and read_object_stream, so they're covered transitively.
  • The normal render path authenticates via create_parser(m_decryptor) before parse_document(), so it's unaffected.
  • The only path that newly throws is one that bypasses authentication, and that path previously produced garbage.
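
To illustrate why the flag is set late, here is a small sketch of the construction order. It assumes the /Encrypt dictionary is resolved through the same guarded read path; OrderingSketch and its set_flag_early switch are hypothetical and exist only to reproduce the ordering this PR avoids.

```cpp
#include <stdexcept>
#include <string>

struct UnauthenticatedReadError : std::runtime_error {
  UnauthenticatedReadError() : std::runtime_error("unauthenticated read") {}
};

// Hypothetical model of parser construction; not the real DocumentParser.
class OrderingSketch {
public:
  OrderingSketch(bool has_encrypt_entry, bool set_flag_early) {
    if (set_flag_early) {
      // The ordering the PR avoids: flag raised before /Encrypt is resolved.
      m_is_encrypted = has_encrypt_entry;
    }
    if (has_encrypt_entry) {
      // No decryptor can exist yet, so this read has to pass the guard.
      m_encrypt_dictionary = read_object("/Encrypt");
    }
    // The ordering the PR uses: flag raised only after resolution.
    m_is_encrypted = has_encrypt_entry;
  }

  bool is_encrypted() const { return m_is_encrypted; }
  bool is_authenticated() const { return m_is_encrypted && m_has_decryptor; }

  std::string read_object(const std::string &name) const {
    if (is_encrypted() && !is_authenticated()) {
      throw UnauthenticatedReadError();
    }
    return "raw " + name;
  }

private:
  bool m_is_encrypted = false;
  bool m_has_decryptor = false;
  std::string m_encrypt_dictionary;
};
```

With set_flag_early set to false the constructor completes and the object reports encrypted but unauthenticated. With it set to true the constructor itself throws UnauthenticatedReadError, which is the failure on open that the late assignment prevents.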

Test

DocumentParser.read_without_authentication_throws opens the public RC4 fixture without authenticating, asserts is_encrypted() / !is_authenticated(), and expects parse_document() to throw UnauthenticatedReadError. Uses only the public fixture, so it runs in CI without the private submodule.

Base automatically changed from pdf-encryption-followups to main June 14, 2026 08:10
Reading an encrypted file without an installed decryptor used to silently
serve undecrypted bytes. Guard read_object/read_object_stream so they throw
the new UnauthenticatedReadError instead. m_is_encrypted is now set only
after the /Encrypt dictionary is resolved during construction, since that
read predates any decryptor and would otherwise trip the guard.

Adds is_authenticated() and a test that an encrypted file reports as
encrypted-but-unauthenticated and throws on parse_document().

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
andiwand force-pushed the pdf-unauthenticated-read-guard branch from 9559c49 to a0f27b6 June 14, 2026 08:12
andiwand marked this pull request as ready for review June 14, 2026 08:12
andiwand enabled auto-merge (squash) June 14, 2026 08:12
andiwand merged commit 27ffb0e into main Jun 14, 2026
11 checks passed
andiwand deleted the pdf-unauthenticated-read-guard branch June 14, 2026 08:51