46 changes: 32 additions & 14 deletions js/src/utils.ts
@@ -22,32 +22,50 @@ export function formatExecutionTimeoutError(error: unknown) {
 
 export async function* readLines(stream: ReadableStream<Uint8Array>) {
   const reader = stream.getReader()
-  let buffer = ''
+  const decoder = new TextDecoder()
+  const pending: string[] = []
 
   try {
     while (true) {
       const { done, value } = await reader.read()
 
-      if (value !== undefined) {
-        buffer += new TextDecoder().decode(value)
-      }
-
       if (done) {
-        if (buffer.length > 0) {
-          yield buffer
+        const trailing = decoder.decode()
+        if (trailing) pending.push(trailing)
+        if (pending.length > 0) {
+          yield pending.join('')
         }
Comment on lines 32 to 37
P2: Flush TextDecoder state before finishing stream

When done is reached, the generator yields pending.join('') and exits without calling decoder.decode() to flush buffered decoder state. Because this code now uses decoder.decode(value, { stream: true }), a stream that ends with an incomplete UTF-8 sequence will silently drop trailing bytes instead of emitting the replacement character, which is a data-loss regression from the previous behavior. Add a final flush on EOF and append its result before the last yield.

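The behavior this comment describes can be reproduced in isolation. The snippet below is a sketch, not code from the PR: with `{ stream: true }`, the bytes of an incomplete code point stay inside the decoder until a final argument-less `decode()` flushes them.

```typescript
// 'hi🎉' encodes to 68 69 F0 9F 8E 89; 🎉 (U+1F389) is the last four bytes.
const bytes = new TextEncoder().encode('hi🎉')
const decoder = new TextDecoder()

// Deliver everything except the last two bytes of the emoji.
let out = decoder.decode(bytes.slice(0, 4), { stream: true })
// out is now 'hi'; F0 9F remain buffered inside the decoder.

// EOF flush: a plain decode() with no arguments drains that buffer,
// surfacing a U+FFFD replacement character instead of losing the bytes.
out += decoder.decode()
```

Skipping that final `decode()` call is exactly the silent data loss the comment flags.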

         break
       }

Comment on lines +36 to 40

🔴 The new code never flushes the TextDecoder before breaking on stream end, silently dropping any incomplete multi-byte UTF-8 sequences (e.g. a 4-byte emoji split across the final chunk). Fix: add const trailing = decoder.decode(); if (trailing) pending.push(trailing) before the pending.length check in the done branch.

Extended reasoning...

What the bug is: The new code creates a single TextDecoder with { stream: true } and reuses it across all chunks. When { stream: true } is in effect, the TextDecoder intentionally holds incomplete multi-byte sequences in its internal buffer, emitting them only once the remaining bytes arrive in a later chunk. When the stream ends (done === true), the decoder's internal buffer must be explicitly flushed by calling decoder.decode() (without { stream: true }). The new code never does this.

Code path that triggers it: Lines 34–38 of the new readLines function:

if (done) {
  if (pending.length > 0) {
    yield pending.join('')
  }
  break  // ← decoder's internal buffer is never flushed
}

After the last chunk is processed with decoder.decode(value, { stream: true }), any incomplete multi-byte sequence is held internally. When done=true arrives, the code checks pending.length (which only tracks already-decoded strings) and exits without ever draining the decoder.

Why existing code doesn't prevent it: The pending array only receives strings that have already been returned by decoder.decode(..., { stream: true }). Bytes held inside the TextDecoder's internal state are invisible to pending. The pending.length > 0 guard therefore cannot detect this situation.

Impact: Any UTF-8 content whose final 1–3 bytes of a multi-byte code point arrive in the last stream chunk will be silently dropped. This is a regression from the old code: the old implementation created a new TextDecoder() per chunk without { stream: true }, which would at minimum emit a U+FFFD replacement character for incomplete sequences (visible corruption) rather than silent data loss. Affected scenarios include any stream whose last few bytes happen to form an emoji or other multi-byte character — a common occurrence in code output or emoji-laden text.
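To make the old-versus-new contrast above concrete, here is a small sketch (assuming only standard WHATWG TextDecoder behavior; none of this is PR code) decoding the two halves of a 4-byte emoji:

```typescript
// 🎉 (U+1F389) encodes to four UTF-8 bytes: F0 9F 8E 89.
const bytes = new TextEncoder().encode('🎉')
const head = bytes.slice(0, 2)
const tail = bytes.slice(2)

// Old behavior: a fresh, non-streaming decoder per chunk. Each half is
// invalid UTF-8 on its own, so visible U+FFFD characters are emitted.
const perChunk = new TextDecoder().decode(head) + new TextDecoder().decode(tail)

// New behavior: one streaming decoder reassembles the code point across
// chunks, but only when the later chunk arrives; a sequence still pending
// at EOF needs the explicit decoder.decode() flush discussed above.
const streaming = new TextDecoder()
const joined = streaming.decode(head, { stream: true }) +
  streaming.decode(tail, { stream: true })
```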

Step-by-step proof:

  1. Stream ends with a chunk containing bytes [0xF0, 0x9F] — the first two bytes of 🎉 (U+1F389, encoded as F0 9F 8E 89).
  2. decoder.decode(Uint8Array([0xF0, 0x9F]), { stream: true }) returns '' — bytes are held in the decoder's internal buffer.
  3. Since chunk === '', chunk.indexOf('\n') === -1, so pending.push('') is called (pending now has one empty string).
  4. Next iteration: done=true. pending.length === 1 (truthy), so yield pending.join('') yields '' — an empty string is emitted to the caller.
  5. break is reached. The two buffered bytes [0xF0, 0x9F] are silently discarded.
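The five steps can be run directly. This is a standalone sketch that mirrors the new chunk-handling logic rather than calling the PR's function:

```typescript
const decoder = new TextDecoder()
const pending: string[] = []

// Steps 1–2: the final chunk carries only the first two bytes of 🎉.
const chunk = decoder.decode(new Uint8Array([0xf0, 0x9f]), { stream: true })
// chunk is ''; both bytes are held in the decoder's internal buffer.

// Step 3: '' contains no newline, so an empty string is accumulated.
if (chunk.indexOf('\n') === -1) pending.push(chunk)

// Step 4: in the done branch, pending.length is 1, so '' would be yielded.
const emitted = pending.length > 0 ? pending.join('') : undefined

// Step 5, and the fix: without this flush the buffered bytes vanish;
// with it, a U+FFFD replacement character is recovered instead.
const trailing = decoder.decode()
```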

How to fix it:

if (done) {
  const trailing = decoder.decode()  // flush internal buffer
  if (trailing) pending.push(trailing)
  if (pending.length > 0) {
    yield pending.join('')
  }
  break
}

This ensures any bytes held inside the TextDecoder are emitted before the generator exits. Note also that the current code pushes an empty string '' onto pending when a chunk decodes to '' (bytes fully buffered internally), which causes the done branch to yield a spurious empty string to callers; the flush call also resolves this.

 
-      let newlineIdx = -1
-      do {
-        newlineIdx = buffer.indexOf('\n')
-        if (newlineIdx !== -1) {
-          yield buffer.slice(0, newlineIdx)
-          buffer = buffer.slice(newlineIdx + 1)
-        }
-      } while (newlineIdx !== -1)
+      if (value !== undefined) {
+        const chunk = decoder.decode(value, { stream: true })
+
+        if (chunk.indexOf('\n') === -1) {
+          // No newline — accumulate in O(1)
+          pending.push(chunk)
+          continue
+        }
+
+        // Chunk contains newline(s) — split and yield complete lines
+        const parts = chunk.split('\n')
+
+        // First part completes the pending line
+        pending.push(parts[0])
+        yield pending.join('')
+        pending.length = 0
+
+        // Middle parts are already complete lines
+        for (let i = 1; i < parts.length - 1; i++) {
+          yield parts[i]
+        }
+
+        // Last part starts a new pending line (may be empty)
+        const last = parts[parts.length - 1]
+        if (last.length > 0) {
+          pending.push(last)
+        }
+      }
     }
   } finally {
     reader.releaseLock()
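Reassembling the hunks above (including the flush in the done branch) gives the following runnable sketch of the merged readLines. The surrounding imports and the rest of utils.ts are omitted; this is a reconstruction from the diff, not the verbatim file:

```typescript
async function* readLines(stream: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  const pending: string[] = []

  try {
    while (true) {
      const { done, value } = await reader.read()

      if (done) {
        const trailing = decoder.decode() // flush buffered decoder state
        if (trailing) pending.push(trailing)
        if (pending.length > 0) {
          yield pending.join('')
        }
        break
      }

      const chunk = decoder.decode(value, { stream: true })

      if (chunk.indexOf('\n') === -1) {
        pending.push(chunk) // no newline: accumulate partial line in O(1)
        continue
      }

      const parts = chunk.split('\n')
      pending.push(parts[0])
      yield pending.join('') // first part completes the pending line
      pending.length = 0
      for (let i = 1; i < parts.length - 1; i++) {
        yield parts[i] // middle parts are already complete lines
      }
      const last = parts[parts.length - 1]
      if (last.length > 0) pending.push(last) // start of the next line
    }
  } finally {
    reader.releaseLock()
  }
}
```

A caller simply iterates with `for await (const line of readLines(stream))`; trailing content without a final newline is yielded as a last line.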