Fix Zstd.decompress dropping all but the first of concatenated frames #139

Open
Watson1978 wants to merge 1 commit into SpringMT:main from Watson1978:fix/decompress-concatenated-frames

Fixes #138

Problem

Zstd.decompress decoded only the first of several concatenated zstd frames and silently dropped the rest. The zstd format explicitly supports frame concatenation (e.g. cat a.zst b.zst | zstd -d yields the concatenation of both outputs). ZSTD_decompress decodes every frame in its input, as does ZSTD_decompressStream when it is called until the input is exhausted, so the previous behavior diverged from the format.

require "zstd-ruby"

a = Zstd.compress("Hello, ")
b = Zstd.compress("World!")
Zstd.decompress(a + b)
# => "Hello, "          (before)
# => "Hello, World!"    (after)

Cause

rb_decompress returned immediately after decoding the first data frame instead of looping back to process the remaining input. (The skippable-frame branch already advanced off and continued the loop; the data-frame branch did not.)

Fix

  • decode_one_frame now reports how many input bytes it consumed (the final ZSTD_inBuffer.pos) via a new size_t* consumed out-parameter, so the caller can advance past the frame. The existing decompress_buffered caller is updated to pass NULL.
  • rb_decompress scans the whole input, accumulating every frame's output:
    • A single DCtx is created once and reused across frames (decode_one_frame already resets the session via ZSTD_reset_session_only), and freed after the loop.
    • The first frame's output becomes the accumulator and subsequent frames are appended with rb_str_cat, so the common single-frame case stays zero-copy.
    • off is advanced by the consumed byte count and the loop continues; a consumed == 0 guard prevents an infinite loop.
    • Existing skippable-frame handling, magic scanning, the "not a zstd frame" rb_raise, and RB_GC_GUARD(input_value) are preserved.
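The control flow described above can be sketched in plain Ruby. This is only a toy model under stated assumptions, not the C code in this PR: the frame layout (a 4-byte magic, a 4-byte little-endian length, then the payload), the helper `toy_frame`, the method `decompress_all`, and the empty-string result for empty input are all hypothetical and exist only to give the loop something to walk over. The Ruby `decode_one_frame` here is an analogue that returns the consumed byte count alongside the output, mirroring the new out-parameter.

```ruby
# Toy stand-in for a compressed frame: magic + 4-byte LE length + payload.
# This is NOT the zstd format; it assumes well-formed input.
DATA_MAGIC  = "TOY1".b
SKIP_MAGIC  = "SKIP".b
HEADER_SIZE = 8

# Hypothetical helper that builds one toy frame.
def toy_frame(payload, magic = DATA_MAGIC)
  magic + [payload.bytesize].pack("V") + payload.b
end

# Analogue of decode_one_frame: returns the decoded output and the number
# of input bytes consumed, so the caller can advance past the frame.
def decode_one_frame(input, off)
  length = input.byteslice(off + 4, 4).unpack1("V")
  [input.byteslice(off + HEADER_SIZE, length), HEADER_SIZE + length]
end

# Analogue of the fixed rb_decompress loop: scan the whole input and
# accumulate every data frame's output.
def decompress_all(input)
  input  = input.b
  off    = 0
  result = nil

  while off < input.bytesize
    magic = input.byteslice(off, 4)

    if magic == SKIP_MAGIC
      # Skippable frame: advance past it and keep scanning.
      length = input.byteslice(off + 4, 4).unpack1("V")
      off += HEADER_SIZE + length
      next
    end

    raise ArgumentError, "not a toy frame at offset #{off}" unless magic == DATA_MAGIC

    output, consumed = decode_one_frame(input, off)
    # Guard against a decoder that makes no progress (would loop forever).
    raise "decoder consumed 0 bytes" if consumed.zero?

    if result.nil?
      result = output   # first frame's output becomes the accumulator
    else
      result << output  # later frames are appended
    end

    off += consumed     # previously the function returned here instead
  end

  result || "".b        # assumption: empty input yields an empty string
end

a = toy_frame("Hello, ")
b = toy_frame("World!")
decompress_all(a + b)
# => "Hello, World!"
```

In this sketch the single-frame case never takes the append branch, which corresponds to the point above about the first frame's output being reused as the accumulator.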

Out of scope: dict:/kwargs handling, streaming, and skippable-frame behavior are unchanged. The public API signature and return type of Zstd.decompress are unchanged.

Verification

bundle exec rake spec passes (17 examples, 0 failures), with no new compiler warnings. New regression specs cover concatenated frames:

  • a + b decompresses to "Hello, World!"
  • a three-frame input (a + b + c) is fully concatenated

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Zstd.decompress only decodes the first frame of concatenated frames
