Conversation
The lexer previously used its own internal `LexerCtx` abstraction that allowed it to consume the characters that made up a token without changing the lexer state, then update the state at once when committing to consuming the characters. However, manually resetting the lexer to the original position when giving up on parsing a token is simple enough that this abstraction was not holding its weight. Simplify the lexer by removing internal contexts, and move the simplified method bodies to lexer.h. Generally we try to avoid putting lots of code in headers, but in this case making the code available to the inliner, along with removing the extra layer of abstraction, makes the parser about 20% faster.
tlively
added a commit
that referenced
this pull request
Apr 14, 2026
The first parser pass is responsible for two things: finding the locations of definitions of top-level module items like globals and functions and finding the locations of implicit function type definitions. It previously accomplished the latter by fully parsing every instruction in each function. But the IR is not constructed in this phase of parsing, so fully parsing every instruction was largely wasted work. Optimize the parser by parsing only the instructions that might have implicit type definitions and otherwise just blindly match parentheses to skip the function body. Combined with #8597, this speeds up parsing by 30-40%.
kripken
approved these changes
Apr 14, 2026
Member
kripken
left a comment
There was a problem hiding this comment.
Do I want to know how this affects our compile times? 😄
Member
Author
|
No difference! (at least on a very unscientific experiment with N=1) |
tlively
added a commit
that referenced
this pull request
Apr 15, 2026
The first parser pass is responsible for two things: finding the locations of definitions of top-level module items like globals and functions and finding the locations of implicit function type definitions. It previously accomplished the latter by fully parsing every instruction in each function. But the IR is not constructed in this phase of parsing, so fully parsing every instruction was largely wasted work. Optimize the parser by parsing only the instructions that might have implicit type definitions and otherwise just blindly match parentheses to skip the function body. Combined with #8597, this speeds up parsing by 30-40%.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The lexer previously used its own internal
LexerCtxabstraction that allowed it to consume the characters that made up a token without changing the lexer state, then update the state at once when committing to consuming the characters. However, manually resetting the lexer to the original position when giving up on parsing a token is simple enough that this abstraction was not holding its weight. Simplify the lexer by removing internal contexts, and move the simplified method bodies to lexer.h. Generally we try to avoid putting lots of code in headers, but in this case making the code available to the inliner, along with removing the extra layer of abstraction, makes the parser about 20% faster.