Skip to content

Several performance optimizations for the scanner's hot path#4238

Open
mds-ant wants to merge 4 commits into
microsoft:mainfrom
mds-ant:perf/scanner-conflict-reject
Open

Several performance optimizations for the scanner's hot path#4238
mds-ant wants to merge 4 commits into
microsoft:mainfrom
mds-ant:perf/scanner-conflict-reject

Conversation

@mds-ant

@mds-ant mds-ant commented Jun 8, 2026

Copy link
Copy Markdown

This PR implements 5 performance optimizations on the scanner's hot path. I've broken each optimization out into its own commit and would recommend a per-commit review:

  1. Adds a one-line fast reject for non-repeated bytes in isConflictMarkerTrivia.
  2. Adds a fast path for single ASCII bytes to charAndSize.
  3. Adds a scanASCIIWhile helper to scan runs of ASCII bytes.
  4. Applies the scanASCIIWhile helper to template literals and numeric literals.

This PR was assisted by Claude Code.

Performance

                                                      │   old.txt   │              new.txt                │
                                                      │   sec/op    │   sec/op     vs base                │
Parse/empty.ts-64                                       862.9n ± 1%   866.8n ± 1%        ~ (p=0.080 n=25)
Parse/checker.ts-64                                     41.98m ± 1%   39.75m ± 1%   -5.32% (p=0.000 n=25)
Parse/dom.generated.d.ts-64                             17.87m ± 1%   14.86m ± 1%  -16.82% (p=0.000 n=25)
Parse/Herebyfile.mjs-64                                 780.2µ ± 0%   749.5µ ± 1%   -3.94% (p=0.000 n=25)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-64   221.5µ ± 0%   196.7µ ± 1%  -11.19% (p=0.000 n=25)
geomean                                                 645.3µ        596.5µ        -7.56%

                                                      │   old.txt    │               new.txt                 │
                                                      │     B/op     │     B/op      vs base                 │
Parse/empty.ts-64                                       1.051Ki ± 0%   1.051Ki ± 0%       ~ (p=1.000 n=25) ¹
Parse/checker.ts-64                                     26.44Mi ± 0%   26.44Mi ± 0%       ~ (p=0.912 n=25)
Parse/dom.generated.d.ts-64                             11.09Mi ± 0%   11.09Mi ± 0%       ~ (p=0.500 n=25)
Parse/Herebyfile.mjs-64                                 473.5Ki ± 0%   473.5Ki ± 0%       ~ (p=0.299 n=25)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-64   175.7Ki ± 0%   175.7Ki ± 0%       ~ (p=0.091 n=25)
geomean                                                 485.2Ki        485.1Ki       -0.00%
¹ all samples are equal

                                                      │   old.txt   │               new.txt                │
                                                      │  allocs/op  │  allocs/op   vs base                 │
Parse/empty.ts-64                                        4.000 ± 0%    4.000 ± 0%       ~ (p=1.000 n=25) ¹
Parse/checker.ts-64                                     12.96k ± 0%   12.96k ± 0%       ~ (p=1.000 n=25)
Parse/dom.generated.d.ts-64                             3.255k ± 0%   3.255k ± 0%       ~ (p=0.205 n=25)
Parse/Herebyfile.mjs-64                                 1.772k ± 0%   1.772k ± 0%       ~ (p=1.000 n=25) ¹
Parse/jsxComplexSignatureHasApplicabilityError.tsx-64    196.0 ± 0%    196.0 ± 0%       ~ (p=1.000 n=25) ¹
geomean                                                  567.0         567.0       +0.00%
¹ all samples are equal

Copilot AI review requested due to automatic review settings June 8, 2026 08:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR optimizes the Go scanner’s hot paths by adding ASCII fast-paths and hoisting loop state into locals to reduce UTF-8 decoding and bounds-check overhead during trivia, comment, and identifier scanning.

Changes:

  • Add an ASCII fast path in charAndSize() to avoid utf8.DecodeRuneInString for common bytes.
  • Speed up trivia, comment, and identifier scanning by using byte-based loops with hoisted text/pos/end.
  • Add a fast rejection check for conflict-marker trivia detection.

Comment thread internal/scanner/scanner.go Outdated
Comment thread internal/scanner/scanner.go Outdated
Comment thread internal/scanner/scanner.go Outdated
Comment thread internal/scanner/scanner.go Outdated
@mds-ant mds-ant force-pushed the perf/scanner-conflict-reject branch from a932849 to 063a37c Compare June 8, 2026 09:02
@RyanCavanaugh RyanCavanaugh added the No linked issue This PR doesn't say what bug it fixes label Jun 8, 2026
@RyanCavanaugh RyanCavanaugh added this to the Possible Improvement milestone Jun 8, 2026
@jakebailey

Copy link
Copy Markdown
Member

The hand-inlining done in the last three commits or so I find to be unfortunate.

Was the underlying output checked? Can we instead reorganize charAndSize and so on in such a way that they are "outlined", where the fast path is inlinable and then they call another func that is not inlined for the slow, non-ascii case? This is a trick we have used many times before.

@mds-ant

mds-ant commented Jun 8, 2026

Copy link
Copy Markdown
Author

That's good feedback. Let me see if we can leverage the "outlined" optimization approach here.

@mds-ant mds-ant force-pushed the perf/scanner-conflict-reject branch 2 times, most recently from 72c6e4d to 8aa415d Compare June 9, 2026 09:03
@mds-ant

mds-ant commented Jun 9, 2026

Copy link
Copy Markdown
Author

So, I couldn't get the inlining approach to work with the existing functions; the approaches I tried exceeded the budget limit of 80.

Instead, I introduced a new scanASCIIWhile(pred func(byte) bool) helper which advances the scanner's position over a run of ASCII bytes. The helper is small enough that the compiler inlines it, then inlines the func literal into the predicate call. Each callsite still compiles to a byte loop with no indirect calls.

@mds-ant mds-ant force-pushed the perf/scanner-conflict-reject branch from 8aa415d to 42d64c4 Compare June 9, 2026 10:26
mds-ant added 4 commits June 9, 2026 10:43
isConflictMarkerTrivia is called on every '<', '>', '=', and '|' token to
check whether it begins a 7-byte git conflict marker. A marker is by
definition the same byte repeated seven times, so if text[pos+1] !=
text[pos] the answer is false. That covers >99.9% of these tokens.

At each of the four Scan() call sites, prefix the call with a
charAt(1) == ch conjunct so the non-inlined function call (cost 187) is
avoided entirely on the common path. Keep the same fast-reject as the
first line of the function body for the remaining callers (JSX,
SkipTriviaEx, scanConflictMarkerTrivia).

In the body, reorder the line-start disjunction so the cheap
text[pos-1] byte check runs before the non-inlined
utf8.DecodeLastRuneInString call. Same boolean result, fewer decodes for
'==', '||', '<<', '>>' tokens that survive the repeat-byte gate.
charAndSize is called once per byte in several scan loops (comments,
identifiers, whitespace runs). The current implementation always calls
utf8.DecodeRuneInString, which is non-inlined and constructs a string
slice header on every call.

Add a leading check for the single-ASCII-byte case (s.pos < s.end and
text[s.pos] < utf8.RuneSelf) and return (rune(b), 1) directly. The
non-ASCII and EOF paths fall through unchanged, preserving the existing
size==0 EOF contract and the containsNonASCII bookkeeping (which only
fires for size > 1).
Several hot scan loops (identifier continue bytes, // and /* */ comment
bodies, post-newline trivia runs) currently call charAndSize() once per
byte, paying for receiver indirection and a bounds check on every
iteration even on the ASCII fast path.

Introduce scanASCIIWhile(pred func(byte) bool), which hoists text/pos/end
into locals and loops while the byte is ASCII and pred(b) holds. The
helper body costs 63 against Go's 80 inline budget, so it inlines into
each caller; the func-literal predicate then becomes a known direct call
and inlines in a second pass. The generated code is the same
register-only byte loop a manual hoist would produce, with no indirect
calls and no closure allocation. (A generic type-parameter predicate does
not work here: GCShape stenciling routes the call through a per-byte
dictionary indirection.)

Apply at four sites:
- scanIdentifier ASCII fast path
- // comment body (re-enters charAndSize only on non-ASCII for LS/PS)
- /* */ comment body (re-enters only on '*', newline, or non-ASCII)
- Scan() newline arm under skipTrivia, swallowing the following
  indentation run so the outer loop re-enters once per token rather than
  once per trivia byte
… loops

scanTemplateAndSetTokenValue walks template body bytes via s.char() until
a stop char; scanNumberFragment walks digit bytes with separator state.
Insert scanASCIIWhile at the top of each loop so the common run (plain
template text, plain digit run) is consumed in one register-only loop and
the existing per-byte handling only runs on stop bytes.
@mds-ant mds-ant force-pushed the perf/scanner-conflict-reject branch from 42d64c4 to ac1b85d Compare June 9, 2026 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

No linked issue This PR doesn't say what bug it fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants