src: avoid copying source string in TextEncoder.encode #63897

Open
anonrig wants to merge 1 commit into nodejs:main from anonrig:src-textencoder-encode-zero-copy

anonrig (Member) commented Jun 13, 2026

Summary

EncodeUtf8String, which backs TextEncoder.prototype.encode(), copied the entire source string out of the V8 heap into a MaybeStackBuffer (via WriteOneByteV2/WriteV2) before encoding, and fell back to a heap allocation for strings larger than the stack buffer (> 4096 chars). The sibling EncodeInto already avoids this copy by reading the flat content directly through v8::String::ValueView.

This change reads the flat content via ValueView instead. Because ValueView carries a DisallowGarbageCollection scope, the backing store cannot be allocated while the view is alive, so the view is used in two short scopes with the allocation between them:

  1. Size pass — validate the encoding and compute the exact UTF-8 output length.
  2. Allocate the exactly-sized backing store (GC now permitted).
  3. Encode pass — re-acquire the view (flattening is cached on the string, so this is cheap) and convert directly into the backing store.
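
The three steps above can be sketched outside of V8 as follows. This is a minimal standalone illustration, not the code in this PR: `std::u16string_view` stands in for `v8::String::ValueView`, `std::vector<uint8_t>` stands in for the backing store, and the helper names `Utf8LengthFromUtf16`, `EncodeUtf16ToUtf8` and `EncodeTwoPass` are hypothetical. It assumes well-formed UTF-16 input; the real size and encode passes are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Size pass (hypothetical helper): exact UTF-8 byte length of UTF-16 input.
// Precondition: the input is well-formed (no lone surrogates).
inline size_t Utf8LengthFromUtf16(std::u16string_view in) {
  size_t n = 0;
  for (size_t i = 0; i < in.size(); ++i) {
    const char16_t c = in[i];
    if (c < 0x80) {
      n += 1;
    } else if (c < 0x800) {
      n += 2;
    } else if (c >= 0xD800 && c <= 0xDBFF) {
      n += 4;  // high surrogate: the whole pair encodes to 4 bytes
      ++i;     // skip the low surrogate
    } else {
      n += 3;
    }
  }
  return n;
}

// Encode pass (hypothetical helper): writes UTF-8 into `out`, which must
// have room for Utf8LengthFromUtf16(in) bytes. Returns bytes written.
inline size_t EncodeUtf16ToUtf8(std::u16string_view in, uint8_t* out) {
  uint8_t* p = out;
  for (size_t i = 0; i < in.size(); ++i) {
    uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      const uint32_t low = in[++i];  // safe only for well-formed input
      cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
    }
    if (cp < 0x80) {
      *p++ = static_cast<uint8_t>(cp);
    } else if (cp < 0x800) {
      *p++ = static_cast<uint8_t>(0xC0 | (cp >> 6));
      *p++ = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      *p++ = static_cast<uint8_t>(0xE0 | (cp >> 12));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    } else {
      *p++ = static_cast<uint8_t>(0xF0 | (cp >> 18));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    }
  }
  return static_cast<size_t>(p - out);
}

inline std::vector<uint8_t> EncodeTwoPass(const std::u16string& source) {
  size_t length = 0;
  {
    // Scope 1: the view is alive, so nothing is allocated here.
    std::u16string_view view(source);
    length = Utf8LengthFromUtf16(view);
  }
  // Between the scopes: allocate the exactly-sized output.
  std::vector<uint8_t> store(length);
  {
    // Scope 2: re-acquire the view and encode straight into the output.
    std::u16string_view view(source);
    const size_t written = EncodeUtf16ToUtf8(view, store.data());
    assert(written == length);
    (void)written;  // silence unused-variable warnings when NDEBUG is set
  }
  return store;
}
```

In this sketch the cost of the second scope is one more walk over the input; in exchange, no intermediate copy of the source is made and the output is never over-allocated.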

The rare unpaired-surrogate path still copies into a mutable buffer, since to_well_formed_utf16 runs in place. Output buffer sizing and WHATWG replacement semantics are unchanged.
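
To illustrate why that path needs a mutable copy, here is a minimal sketch of an in-place repair that replaces each lone surrogate with U+FFFD, as the WHATWG replacement semantics require. It is a hand-written stand-in, not simdutf's `to_well_formed_utf16`, and the names `MakeWellFormedInPlace` and `CopyAndRepair` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for an in-place well-formedness repair:
// every unpaired surrogate code unit is overwritten with U+FFFD.
inline void MakeWellFormedInPlace(std::u16string& s) {
  for (size_t i = 0; i < s.size(); ++i) {
    const char16_t c = s[i];
    if (c >= 0xD800 && c <= 0xDBFF) {
      const bool paired =
          i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
      if (paired) {
        ++i;  // valid pair: keep both code units
      } else {
        s[i] = 0xFFFD;  // lone high surrogate
      }
    } else if (c >= 0xDC00 && c <= 0xDFFF) {
      s[i] = 0xFFFD;  // low surrogate with no preceding high surrogate
    }
  }
}

// The view is read-only, so the repair has to run on a copy.
inline std::u16string CopyAndRepair(std::u16string_view readonly) {
  std::u16string copy(readonly);
  MakeWellFormedInPlace(copy);
  return copy;
}
```

Because the repair writes into the buffer it scans, it cannot run on the read-only flat content, which is why this path keeps its copy.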

Benchmarks

benchmark/util/text-encoder.js (op=encode, n=1e6, 12 interleaved runs each via compare.js):

  type                 len=256  len=1024  len=8192
  ascii                 +14.1%    +23.5%    +43.9%
  one-byte (latin1)     +14.4%    +22.2%    +12.3%
  two-byte (utf-16)     +16.7%    +20.4%    +15.5%

len=32 takes the unchanged small-string path, so its results are within noise. As a control, the untouched encodeInto path stayed flat (−2.0%..+0.5%) across all 12 configurations, which suggests the harness is unbiased.

Verification

  • Builds clean (full relink).
  • All encoding / TextEncoder / TextDecoder parallel tests pass.
  • Functional spot-checks across ASCII, Latin1, BMP, valid surrogate pairs, and lone/reversed surrogates match Buffer.from(s, 'utf8').

`EncodeUtf8String`, which backs `TextEncoder.prototype.encode()`, copied
the entire source string out of the V8 heap into a `MaybeStackBuffer`
(via `WriteOneByteV2`/`WriteV2`) before encoding, allocating on the heap
for strings larger than the stack buffer. `EncodeInto` already avoids
this by reading the flat content directly through `v8::String::ValueView`.

Read the flat content via `ValueView` instead. Because `ValueView` holds
a `DisallowGarbageCollection` scope, the backing store cannot be
allocated while it is alive, so the view is used in two short scopes:
one to validate and compute the exact UTF-8 length, and one to encode
directly into the backing store after allocation. Flattening is cached
on the string, so re-acquiring the view is cheap. The rare unpaired
surrogate path still copies into a mutable buffer for in-place
`to_well_formed_utf16`.

benchmark/util/text-encoder.js (op=encode, n=1e6, 12 runs each):

                       len=256  len=1024  len=8192
  ascii                 +14.1%    +23.5%    +43.9%
  one-byte (latin1)     +14.4%    +22.2%    +12.3%
  two-byte (utf-16)     +16.7%    +20.4%    +15.5%

len=32 uses the unchanged small-string path (~noise). The untouched
encodeInto path stayed flat (-2.0%..+0.5%) across all configurations.
@nodejs-github-bot added the c++ and needs-ci labels Jun 13, 2026
@addaleax added the request-ci label Jun 13, 2026
@github-actions bot removed the request-ci label Jun 13, 2026