-
Notifications
You must be signed in to change notification settings - Fork 205
fix: replace O(n²) string concatenation in readLines with array buffering #252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,32 +22,50 @@ export function formatExecutionTimeoutError(error: unknown) { | |
|
|
||
| export async function* readLines(stream: ReadableStream<Uint8Array>) { | ||
| const reader = stream.getReader() | ||
| let buffer = '' | ||
| const decoder = new TextDecoder() | ||
| const pending: string[] = [] | ||
|
|
||
| try { | ||
| while (true) { | ||
| const { done, value } = await reader.read() | ||
|
|
||
| if (value !== undefined) { | ||
| buffer += new TextDecoder().decode(value) | ||
| } | ||
|
|
||
| if (done) { | ||
| if (buffer.length > 0) { | ||
| yield buffer | ||
| const trailing = decoder.decode() | ||
| if (trailing) pending.push(trailing) | ||
| if (pending.length > 0) { | ||
| yield pending.join('') | ||
| } | ||
| break | ||
| } | ||
|
|
||
|
Comment on lines
+36
to
40
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔴 The new code never flushes the TextDecoder before breaking on stream end, silently dropping any incomplete multi-byte UTF-8 sequences (e.g. a 4-byte emoji split across the final chunk). Fix: add Extended reasoning...What the bug is: The new code creates a single Code path that triggers it: Lines 34–38 of the new if (done) {
if (pending.length > 0) {
yield pending.join('')
}
break // ← decoder's internal buffer is never flushed
}After the last chunk is processed with Why existing code doesn't prevent it: The Impact: Any UTF-8 content whose final 1–3 bytes of a multi-byte code point arrive in the last stream chunk will be silently dropped. This is a regression from the old code: the old implementation created a new Step-by-step proof:
How to fix it: if (done) {
const trailing = decoder.decode() // flush internal buffer
if (trailing) pending.push(trailing)
if (pending.length > 0) {
yield pending.join('')
}
break
}This ensures any bytes held inside the TextDecoder are emitted before the generator exits. Note also that the current code pushes an empty string |
||
| let newlineIdx = -1 | ||
| if (value !== undefined) { | ||
| const chunk = decoder.decode(value, { stream: true }) | ||
|
|
||
| do { | ||
| newlineIdx = buffer.indexOf('\n') | ||
| if (newlineIdx !== -1) { | ||
| yield buffer.slice(0, newlineIdx) | ||
| buffer = buffer.slice(newlineIdx + 1) | ||
| if (chunk.indexOf('\n') === -1) { | ||
| // No newline — accumulate in O(1) | ||
| pending.push(chunk) | ||
| continue | ||
| } | ||
| } while (newlineIdx !== -1) | ||
|
|
||
| // Chunk contains newline(s) — split and yield complete lines | ||
| const parts = chunk.split('\n') | ||
|
|
||
| // First part completes the pending line | ||
| pending.push(parts[0]) | ||
| yield pending.join('') | ||
| pending.length = 0 | ||
|
|
||
| // Middle parts are already complete lines | ||
| for (let i = 1; i < parts.length - 1; i++) { | ||
| yield parts[i] | ||
| } | ||
|
|
||
| // Last part starts a new pending line (may be empty) | ||
| const last = parts[parts.length - 1] | ||
| if (last.length > 0) { | ||
| pending.push(last) | ||
| } | ||
| } | ||
| } | ||
| } finally { | ||
| reader.releaseLock() | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When
doneis reached, the generator yieldspending.join('')and exits without callingdecoder.decode()to flush buffered decoder state. Because this code now usesdecoder.decode(value, { stream: true }), a stream that ends with an incomplete UTF-8 sequence will silently drop trailing bytes instead of emitting the replacement character, which is a data-loss regression from the previous behavior. Add a final flush on EOF and append its result before the last yield.Useful? React with 👍 / 👎.