Several performance optimizations for the scanner's hot path #4238
mds-ant wants to merge 4 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR optimizes the Go scanner’s hot paths by adding ASCII fast-paths and hoisting loop state into locals to reduce UTF-8 decoding and bounds-check overhead during trivia, comment, and identifier scanning.
Changes:
- Add an ASCII fast path in `charAndSize()` to avoid `utf8.DecodeRuneInString` for common bytes.
- Speed up trivia, comment, and identifier scanning by using byte-based loops with hoisted `text`/`pos`/`end`.
- Add a fast rejection check for conflict-marker trivia detection.
Force-pushed from a932849 to 063a37c.
The hand-inlining done in the last three commits or so I find to be unfortunate. Was the underlying output checked? Can we instead reorganize
That's good feedback. Let me see if we can leverage the "outlined" optimization approach here.
Force-pushed from 72c6e4d to 8aa415d.
So, I couldn't get the inlining approach to work with the existing functions; the approaches I tried exceeded the budget limit of 80. Instead, I introduced a new
Force-pushed from 8aa415d to 42d64c4.
isConflictMarkerTrivia is called on every '<', '>', '=', and '|' token to check whether it begins a 7-byte git conflict marker. A marker is by definition the same byte repeated seven times, so if text[pos+1] != text[pos] the answer is false. That covers >99.9% of these tokens.
At each of the four Scan() call sites, prefix the call with a charAt(1) == ch conjunct so the non-inlined function call (cost 187) is avoided entirely on the common path. Keep the same fast-reject as the first line of the function body for the remaining callers (JSX, SkipTriviaEx, scanConflictMarkerTrivia).
In the body, reorder the line-start disjunction so the cheap text[pos-1] byte check runs before the non-inlined utf8.DecodeLastRuneInString call. Same boolean result, fewer decodes for '==', '||', '<<', '>>' tokens that survive the repeat-byte gate.
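A minimal sketch of the repeat-byte gate described above, not the PR's actual code. The names `isConflictMarker`, `atLineStart`, and `markerLen` are hypothetical, and the line-start check only recognizes `\n` and `\r`, whereas the real function also decodes the preceding rune.

```go
package main

import "fmt"

// markerLen is the length of a git conflict marker such as "<<<<<<<".
const markerLen = 7

// atLineStart reports whether pos begins a line. Only '\n' and '\r' are
// recognized here; Unicode line separators are left out of this sketch.
func atLineStart(text string, pos int) bool {
	return pos == 0 || text[pos-1] == '\n' || text[pos-1] == '\r'
}

// isConflictMarker is a hypothetical, simplified stand-in for
// isConflictMarkerTrivia. It assumes 0 <= pos <= len(text).
func isConflictMarker(text string, pos int) bool {
	// Fast reject: a marker repeats one byte, so a different second byte
	// (or no second byte at all) rules it out immediately.
	if pos+1 >= len(text) || text[pos+1] != text[pos] {
		return false
	}
	// Cheap byte-level line-start check, then the length check.
	if !atLineStart(text, pos) || pos+markerLen > len(text) {
		return false
	}
	// Bytes 0 and 1 already match; verify the remaining five.
	for i := 2; i < markerLen; i++ {
		if text[pos+i] != text[pos] {
			return false
		}
	}
	return true
}

func main() {
	for _, src := range []string{"<<<<<<< HEAD", "<= 1", "<< 2"} {
		fmt.Printf("%q -> %v\n", src, isConflictMarker(src, 0))
	}
}
```

In this sketch a token such as `<=` never gets past the first comparison, while `<<` survives the gate and is rejected by the later checks.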
charAndSize is called once per byte in several scan loops (comments, identifiers, whitespace runs). The current implementation always calls utf8.DecodeRuneInString, which is non-inlined and constructs a string slice header on every call. Add a leading check for the single-ASCII-byte case (s.pos < s.end and text[s.pos] < utf8.RuneSelf) and return (rune(b), 1) directly. The non-ASCII and EOF paths fall through unchanged, preserving the existing size==0 EOF contract and the containsNonASCII bookkeeping (which only fires for size > 1).
Several hot scan loops (identifier continue bytes, // and /* */ comment bodies, post-newline trivia runs) currently call charAndSize() once per byte, paying for receiver indirection and a bounds check on every iteration even on the ASCII fast path.
Introduce scanASCIIWhile(pred func(byte) bool), which hoists text/pos/end into locals and loops while the byte is ASCII and pred(b) holds. The helper body costs 63 against Go's 80 inline budget, so it inlines into each caller; the func-literal predicate then becomes a known direct call and inlines in a second pass. The generated code is the same register-only byte loop a manual hoist would produce, with no indirect calls and no closure allocation. (A generic type-parameter predicate does not work here: GCShape stenciling routes the call through a per-byte dictionary indirection.)
Apply at four sites:
- scanIdentifier ASCII fast path
- // comment body (re-enters charAndSize only on non-ASCII for LS/PS)
- /* */ comment body (re-enters only on '*', newline, or non-ASCII)
- Scan() newline arm under skipTrivia, swallowing the following indentation run so the outer loop re-enters once per token rather than once per trivia byte
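A rough sketch of what such a helper could look like, written from the description above rather than from the PR's diff. The `runScanner` type and `isIdentPart` predicate are hypothetical, and `len(s.text)` stands in for the scanner's `end` field.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runScanner is a hypothetical, stripped-down scanner state.
type runScanner struct {
	text string
	pos  int
}

// scanASCIIWhile advances pos past consecutive ASCII bytes for which pred
// holds. State is copied into locals so the loop does not touch the
// receiver on every iteration.
func (s *runScanner) scanASCIIWhile(pred func(byte) bool) {
	text, pos, end := s.text, s.pos, len(s.text)
	for pos < end {
		b := text[pos]
		if b >= utf8.RuneSelf || !pred(b) {
			break
		}
		pos++
	}
	s.pos = pos
}

// isIdentPart is a simplified ASCII-only identifier predicate.
func isIdentPart(b byte) bool {
	return b == '_' || b == '$' ||
		(b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
}

func main() {
	s := &runScanner{text: "fooBar_1 = 2"}
	start := s.pos
	// A func literal at the call site, as the commit message describes.
	s.scanASCIIWhile(func(b byte) bool { return isIdentPart(b) })
	fmt.Println(s.text[start:s.pos])
}
```

Whether the helper and its predicate actually inline depends on the compiler version and the surrounding code; building with `go build -gcflags=-m` prints the compiler's inlining decisions and is one way to check.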
… loops
scanTemplateAndSetTokenValue walks template body bytes via s.char() until a stop char; scanNumberFragment walks digit bytes with separator state. Insert scanASCIIWhile at the top of each loop so the common run (plain template text, plain digit run) is consumed in one register-only loop and the existing per-byte handling only runs on stop bytes.
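To illustrate the "consume the plain run first, then handle the stop byte" pattern for a digit run, here is a simplified sketch. It is not the real scanNumberFragment: the `numScanner` type, the `scanDigits` method, and the separator rule (a single `_` is accepted only between two digits) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// numScanner is a hypothetical, stripped-down scanner state.
type numScanner struct {
	text string
	pos  int
}

// scanASCIIWhile advances pos past ASCII bytes for which pred holds.
func (s *numScanner) scanASCIIWhile(pred func(byte) bool) {
	text, pos, end := s.text, s.pos, len(s.text)
	for pos < end && text[pos] < utf8.RuneSelf && pred(text[pos]) {
		pos++
	}
	s.pos = pos
}

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// scanDigits returns the digits starting at pos with separators removed.
func (s *numScanner) scanDigits() string {
	var out strings.Builder
	for {
		start := s.pos
		// Fast path: swallow the whole plain digit run in one tight loop.
		s.scanASCIIWhile(isDigit)
		out.WriteString(s.text[start:s.pos])
		// Slow path, reached only on a stop byte: accept one '_' that
		// sits between two digits, then resume the fast path.
		if s.pos > start && s.pos+1 < len(s.text) &&
			s.text[s.pos] == '_' && isDigit(s.text[s.pos+1]) {
			s.pos++
			continue
		}
		return out.String()
	}
}

func main() {
	s := &numScanner{text: "1_000_000;"}
	fmt.Println(s.scanDigits(), s.pos)
}
```

The per-byte branch runs once per separator rather than once per digit, which is the effect the commit message describes for both the numeric and the template loops.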
Force-pushed from 42d64c4 to ac1b85d.
This PR implements 5 performance optimizations on the scanner's hot path. I've broken each optimization out into its own commit and would recommend a per-commit review:
- Fast-reject check in isConflictMarkerTrivia.
- ASCII fast path in charAndSize.
- Introduce the scanASCIIWhile helper to scan runs of ASCII bytes.
- Apply the scanASCIIWhile helper to template literals and numeric literals.
This PR was assisted by Claude Code.
Performance