
fix: replace O(n²) string concatenation in readLines with array buffering #252

Open
beran-t wants to merge 2 commits into main from fix-readlines-quadratic-concat

Conversation


@beran-t beran-t commented Apr 14, 2026

Summary

Fixes #251 — replaces quadratic buffer += chunk string concatenation in readLines() with O(n) array-based buffering.

Problem

readLines() in js/src/utils.ts uses buffer += new TextDecoder().decode(value) followed by buffer.indexOf('\n') on every chunk. The indexOf call forces V8 to flatten its internal cons-string representation, defeating the optimization that normally makes += amortized. For 22 MB of stdout arriving in ~1,400 chunks of ~16 KB:

  • Total bytes copied: ~15.7 GB of string allocations
  • Peak heap: 1.5 GB+
  • Event loop stalls: 10-20 seconds
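Those totals follow from simple arithmetic: if every one of the n chunks forces a flatten, chunk i re-copies roughly i × chunkSize characters. A quick sketch (the helper below is illustrative, not SDK code):

```typescript
// Total characters copied when every chunk flattens the accumulated
// buffer: sum over i of i * chunkSize, i.e. O(n²) in the chunk count.
function totalBytesCopied(chunkSize: number, chunkCount: number): number {
  let copied = 0
  for (let i = 1; i <= chunkCount; i++) {
    copied += i * chunkSize // flattening re-copies the whole buffer so far
  }
  return copied
}

// ~1,400 chunks of 16 KiB ≈ 22 MB of stdout
console.log(totalBytesCopied(16 * 1024, 1400)) // → 16067788800 (~16 GB)
```

This lands in the same ballpark as the ~15.7 GB figure above; the exact number depends on the real chunk sizes and on per-character storage.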

Fix

Replace with a pending: string[] array buffer:

  • No-newline chunks: O(1) push() — no string copying at all
  • Newline chunks: O(line_length) join(), called once per complete line
  • Worst case (no newlines): single join() at stream end
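A minimal sketch of the approach (illustrative; names and line-splitting details may differ from the actual js/src/utils.ts implementation):

```typescript
// Array-buffered line reader: chunks without a newline are pushed in
// O(1); join() runs only when a complete line is emitted.
async function* readLines(
  stream: ReadableStream<Uint8Array>
): AsyncGenerator<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  const pending: string[] = [] // fragments of the current line
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) {
        // flush any partially decoded multi-byte sequence
        const trailing = decoder.decode()
        if (trailing) pending.push(trailing)
        if (pending.length > 0) yield pending.join('')
        break
      }
      let chunk = decoder.decode(value, { stream: true })
      let nl: number
      while ((nl = chunk.indexOf('\n')) !== -1) {
        pending.push(chunk.slice(0, nl + 1))
        yield pending.join('') // one O(line_length) join per line
        pending.length = 0
        chunk = chunk.slice(nl + 1)
      }
      if (chunk) pending.push(chunk) // O(1); earlier chunks are not copied
    }
  } finally {
    reader.releaseLock()
  }
}
```

Note that indexOf now scans only the newly decoded chunk, never the accumulated buffer, so no flattening of a growing string ever occurs.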
| Metric | Before (`buffer +=`) | After (array + `join`) |
| --- | --- | --- |
| Peak heap | 211 MB | 20 MB |
| Peak RSS | 329 MB | 95 MB |
| Elapsed time | 29.8 s | 4.1 s |
| Memory amplification | 9x | 0.9x |

(Benchmark numbers from #251)
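The asymmetry is easy to reproduce outside the SDK. A micro-benchmark sketch (timings are machine- and V8-version-dependent; only the quadratic-vs-linear shape matters):

```typescript
// Compare += with per-iteration indexOf against array buffering.
const chunk = 'x'.repeat(16 * 1024)
const chunks = 200 // small enough that the quadratic path stays quick

let t0 = performance.now()
let buffer = ''
for (let i = 0; i < chunks; i++) {
  buffer += chunk
  buffer.indexOf('\n') // forces V8 to flatten the cons-string
}
const quadraticMs = performance.now() - t0

t0 = performance.now()
const pending: string[] = []
for (let i = 0; i < chunks; i++) {
  pending.push(chunk)
  chunk.indexOf('\n') // scans only the new chunk
}
const joined = pending.join('') // single O(n) copy at the end
const linearMs = performance.now() - t0

console.log({ quadraticMs, linearMs, equal: joined === buffer })
```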

Test plan

  • npm run build passes
  • Existing tests pass (requires code-interpreter template — not available in test account, all failures are template not found)

fix: replace O(n²) string concatenation in readLines with array buffering

The readLines() function used `buffer += chunk` followed by
`buffer.indexOf('\n')` on every iteration. The indexOf forces V8 to
flatten its internal cons-string representation, defeating the
optimization that makes += amortized. This results in O(n²) total
string copies — for 22 MB of stdout (~1,400 chunks), approximately
15.7 GB of allocations, 1.5 GB+ peak heap, and 10-20s event loop
stalls.

Replace with an array-based pending[] buffer:
- No-newline chunks: O(1) array push, no string copying
- Newline chunks: O(line_length) join, once per complete line
- Worst case (no newlines): single join() at stream end

Fixes #251

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d91e76a1d


Comment on lines 32 to 35
```diff
 if (done) {
-  if (buffer.length > 0) {
-    yield buffer
+  if (pending.length > 0) {
+    yield pending.join('')
   }
```

**P2**: Flush TextDecoder state before finishing stream

When done is reached, the generator yields pending.join('') and exits without calling decoder.decode() to flush buffered decoder state. Because this code now uses decoder.decode(value, { stream: true }), a stream that ends with an incomplete UTF-8 sequence will silently drop trailing bytes instead of emitting the replacement character, which is a data-loss regression from the previous behavior. Add a final flush on EOF and append its result before the last yield.
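The decoder behavior behind this can be demonstrated in isolation (a standalone sketch, not code from this PR):

```typescript
// TextDecoder with { stream: true } holds incomplete multi-byte
// sequences internally; a final decode() with no arguments flushes them.
const encoder = new TextEncoder()
const bytes = encoder.encode('🎉') // F0 9F 8E 89

const decoder = new TextDecoder()
const first = decoder.decode(bytes.slice(0, 2), { stream: true })
// first === '' — the two bytes are buffered, not emitted
const second = decoder.decode(bytes.slice(2), { stream: true })
// second === '🎉' — the sequence completed across chunks

// If the stream ends after the first chunk, only an explicit flush
// surfaces the incomplete sequence (as U+FFFD) instead of losing it:
const abandoned = new TextDecoder()
abandoned.decode(bytes.slice(0, 2), { stream: true })
const flushed = abandoned.decode() // '\uFFFD'
console.log({ first, second, flushed })
```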


Comment on lines +34 to 38
```ts
    yield pending.join('')
  }
  break
}
```


🔴 The new code never flushes the TextDecoder before breaking on stream end, silently dropping any incomplete multi-byte UTF-8 sequences (e.g. a 4-byte emoji split across the final chunk). Fix: add const trailing = decoder.decode(); if (trailing) pending.push(trailing) before the pending.length check in the done branch.

Extended reasoning...

What the bug is: The new code creates a single TextDecoder with { stream: true } and reuses it across all chunks. When { stream: true } is in effect, the TextDecoder intentionally holds incomplete multi-byte sequences in its internal buffer, emitting them only once the remaining bytes arrive in a later chunk. When the stream ends (done === true), the decoder's internal buffer must be explicitly flushed by calling decoder.decode() (without { stream: true }). The new code never does this.

Code path that triggers it: Lines 34–38 of the new readLines function:

```ts
if (done) {
  if (pending.length > 0) {
    yield pending.join('')
  }
  break  // ← decoder's internal buffer is never flushed
}
```

After the last chunk is processed with decoder.decode(value, { stream: true }), any incomplete multi-byte sequence is held internally. When done=true arrives, the code checks pending.length (which only tracks already-decoded strings) and exits without ever draining the decoder.

Why existing code doesn't prevent it: The pending array only receives strings that have already been returned by decoder.decode(..., { stream: true }). Bytes held inside the TextDecoder's internal state are invisible to pending. The pending.length > 0 guard therefore cannot detect this situation.

Impact: Any UTF-8 content whose final 1–3 bytes of a multi-byte code point arrive in the last stream chunk will be silently dropped. This is a regression from the old code: the old implementation created a new TextDecoder() per chunk without { stream: true }, which would at minimum emit a U+FFFD replacement character for incomplete sequences (visible corruption) rather than silent data loss. Affected scenarios include any stream whose last few bytes happen to form an emoji or other multi-byte character — a common occurrence in code output or emoji-laden text.

Step-by-step proof:

  1. Stream ends with a chunk containing bytes [0xF0, 0x9F] — the first two bytes of 🎉 (U+1F389, encoded as F0 9F 8E 89).
  2. decoder.decode(Uint8Array([0xF0, 0x9F]), { stream: true }) returns '' — bytes are held in the decoder's internal buffer.
  3. Since chunk === '', chunk.indexOf('\n') === -1, so pending.push('') is called (pending now has one empty string).
  4. Next iteration: done=true. pending.length === 1 (truthy), so yield pending.join('') yields '' — an empty string is emitted to the caller.
  5. break is reached. The two buffered bytes [0xF0, 0x9F] are silently discarded.
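The steps above replay directly as a standalone sketch of the buggy done-branch (not the PR's exact code):

```typescript
// Replay the failure: an emoji whose last bytes never flush.
const dec = new TextDecoder()
const tail = new Uint8Array([0xf0, 0x9f]) // first half of 🎉
const pending: string[] = []

const chunk = dec.decode(tail, { stream: true })
// Steps 2 and 3: chunk === '', no newline, so an empty string is buffered
if (chunk.indexOf('\n') === -1) pending.push(chunk)

// Step 4: done — the buggy branch yields pending.join('') and exits
const emitted = pending.length > 0 ? pending.join('') : null
// emitted === '' while bytes [0xF0, 0x9F] stay trapped inside `dec`
console.log(JSON.stringify(emitted)) // ""
```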

How to fix it:

```ts
if (done) {
  const trailing = decoder.decode()  // flush internal buffer
  if (trailing) pending.push(trailing)
  if (pending.length > 0) {
    yield pending.join('')
  }
  break
}
```

This ensures any bytes held inside the TextDecoder are emitted before the generator exits. Note also that the current code pushes an empty string '' onto pending when a chunk decodes to '' (bytes fully buffered internally), which causes the done branch to yield a spurious empty string to callers; the flush call also resolves this.

When using decoder.decode(value, { stream: true }), the TextDecoder may
hold incomplete multi-byte UTF-8 bytes internally. Without flushing on
EOF, those trailing bytes were silently dropped — a regression from the
previous behavior which would at minimum emit replacement characters.


Development

Successfully merging this pull request may close these issues.

O(n²) string concatenation in readLines causes OOM on large stdout (>1 MB)
