Parser + lexer performance: consolidated 2–3× end-to-end speedup #378
JanJakes wants to merge 30 commits into
1c666e2 to c4aff56
);

while ( true ) {
    if (
if (
// Break on file end
if (
while ( true ) {
    if (
        self::EOF === $this->token_type
        || ( null === $this->token_type && $this->bytes_already_read > 0 )
Shouldn't EOF cover that?
Addressed in f9172e1. EOF and the second arm catch different cases: self::EOF is set when read_next_token() sees a null byte at the start of a token (clean end-of-input). The null === $this->token_type && $this->bytes_already_read > 0 arm catches the case where read_next_token() returned null mid-stream because of an invalid byte. The > 0 guard keeps the very first iteration alive — at that point $this->token_type is still null because nothing has been read yet, not because we've failed.
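The two arms of that guard can be sketched with a toy lexer. This is only an illustration: `Guard_Demo_Lexer`, its `WORD` token, and the rule that any non-letter byte is invalid are hypothetical, and end-of-string stands in for the real end-of-input check.

```php
<?php
// Hypothetical toy lexer (not the real WP_MySQL_Lexer). It models only the
// three token_type states discussed above: EOF, a valid token, and null.
class Guard_Demo_Lexer {
    const EOF  = 0;
    const WORD = 1;

    private $sql;
    private $bytes_already_read = 0;
    private $token_type         = null; // null before the first read AND after a failure.

    public function __construct( $sql ) {
        $this->sql = $sql;
    }

    private function read_next_token() {
        $byte = $this->sql[ $this->bytes_already_read ] ?? null;
        if ( null === $byte ) {
            return self::EOF; // Clean end of input.
        }
        $this->bytes_already_read++;
        if ( ctype_alpha( $byte ) ) {
            return self::WORD;
        }
        return null; // Assumption for the demo: any non-letter byte is invalid.
    }

    public function remaining_tokens() {
        $tokens = array();
        while ( true ) {
            if (
                self::EOF === $this->token_type
                || ( null === $this->token_type && $this->bytes_already_read > 0 )
            ) {
                break;
            }
            $this->token_type = $this->read_next_token();
            if ( null !== $this->token_type ) {
                $tokens[] = $this->token_type;
            }
        }
        return $tokens;
    }
}

$clean   = ( new Guard_Demo_Lexer( 'ab' ) )->remaining_tokens();  // Stops via the EOF arm.
$invalid = ( new Guard_Demo_Lexer( 'a1b' ) )->remaining_tokens(); // Stops via the null arm.
$empty   = ( new Guard_Demo_Lexer( '' ) )->remaining_tokens();    // First iteration still runs.
```

Without the `> 0` part, the loop would exit before reading anything, because `token_type` starts out as `null`.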
$next_byte = $this->sql[ $this->bytes_already_read + 1 ] ?? null;

if ( "'" === $byte || '"' === $byte || '`' === $byte ) {
// A map for a single-byte symbol fast path.
(
    ( $byte >= 'a' && $byte <= 'z' )
    || ( $byte >= 'A' && $byte <= 'Z' )
    || $byte > "\x7F"
I'd leave a comment on why \x7F is special here
    || ( $byte >= 'A' && $byte <= 'Z' )
    || $byte > "\x7F"
)
&& "'" !== $next_byte
Why just ' and not "? Would any quotes-related sql mode/session options have impact here?
  $type = $this->read_line_comment();
- } elseif ( null !== $byte && strspn( $byte, self::WHITESPACE_MASK ) > 0 ) {
+ } elseif (
+     ' ' === $byte
Would array + isset() be faster?
Marginally faster, but this branch rarely fires. next_token() and remaining_tokens() inline-skip whitespace before calling read_next_token() (commit f5b8932), so this arm only handles whitespace that appears between comments. Keeping the === chain for consistency with the rest of the dispatch.
  && 'x' === $next_byte
  && null !== $third_byte
- && strspn( $third_byte, self::HEX_DIGIT_MASK ) > 0
+ && false !== strpos( self::HEX_DIGIT_MASK, $third_byte )
   * a parse (sub)tree at each level of the full grammar tree.
   */
- class WP_Parser_Node {
+ final class WP_Parser_Node {
does final make it faster somehow?
Yes — final lets opcache/JIT skip the vtable check on method calls. Measured at +7% end-to-end, see commit daa4185 and the "Big, robust wins" table in the PR description.
$this->grammar = $grammar;
$this->token_count = count( $tokens );
// Append an end-of-input sentinel token whose id is EMPTY_RULE_ID
// (0). The hot path can then read $tokens[$pos]->id unconditionally
// The INTO negative-lookahead only fires for selectStatement. Cache
// the rule id so the per-call check is an int compare instead of a
// string compare.
$this->select_statement_rule_id = $grammar->get_or_cache_rule_id( 'selectStatement' );
Any memory impact of caching all the rules?
Negligible. Those are array assignments, not copies — PHP arrays are copy-on-write, so the parser instance just holds references to the grammar's arrays. No actual duplication unless something writes to them, which the parser doesn't.
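The copy-on-write behaviour described in that reply can be observed directly. The sketch below is only an illustration: `Cow_Demo_Grammar` and `Cow_Demo_Parser` are hypothetical stand-ins, and the exact byte counts depend on the PHP build.

```php
<?php
// Hypothetical stand-ins for the grammar and the parser that caches its arrays.
class Cow_Demo_Grammar {
    public $rules;

    public function __construct() {
        $this->rules = range( 1, 100000 ); // A large array built at runtime.
    }
}

class Cow_Demo_Parser {
    public $rules;

    public function __construct( Cow_Demo_Grammar $grammar ) {
        // Plain assignment: both properties share one array until someone writes.
        $this->rules = $grammar->rules;
    }
}

$grammar = new Cow_Demo_Grammar();

$before       = memory_get_usage();
$parser       = new Cow_Demo_Parser( $grammar );
$after_assign = memory_get_usage();

// A write forces PHP to separate the array, which is when the copy happens.
$parser->rules[0] = -1;
$after_write      = memory_get_usage();

printf( "assignment: %d bytes, first write: %d bytes\n", $after_assign - $before, $after_write - $after_assign );
```

As long as the parser only reads the cached arrays, only the assignment cost applies.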
I've left some notes. Haven't read deeply into the diff but the idea makes sense – inline some stuff, reorder, cache, add a trailing token. Nothing revolutionary, but it's still a pretty clever way to get more juice out of it.
232abea to 8c11f76
Let's file an issue and explore that, 7% is huge
8a7cf51 to 513004e
Hot-path changes in WP_Parser::parse_recursive():

- Inline the terminal match in the branch loop instead of recursing into parse_recursive() for every token. Over the full MySQL test suite this eliminates ~1.6M function calls.
- Hoist grammar, rules, fragment_ids, rule_names, tokens, and token_count into local variables so the inner loops avoid repeated property lookups on $this->grammar.
- Cache the token count on the instance to avoid a count() per call.
- Build branch children in a local array and only instantiate the WP_Parser_Node once the branch has matched; on the MySQL corpus ~75% of speculative nodes were previously created and thrown away.
- Drop a dead is_array($subnode) check that never fires in practice (subnodes are false, true, tokens, or nodes - never arrays).
- Inline fragment inlining: read the fragment's children directly instead of building a fragment node and immediately merging it.

End-to-end parser benchmark on the MySQL server test corpus:

Before: ~11,500 QPS
After: ~14,900 QPS (+29%)
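Two of the ideas above (matching terminals inline and deferring node allocation until a branch has matched) can be sketched as follows. This is a simplified illustration, not the real parse_recursive(): `Deferred_Demo_Node`, `deferred_demo_match_rule()`, and the terminal-only branches are hypothetical.

```php
<?php
// Hypothetical minimal node; the allocation counter exists only for the demo.
final class Deferred_Demo_Node {
    public static $allocations = 0;
    public $rule_name;
    public $children;

    public function __construct( $rule_name, array $children ) {
        self::$allocations++;
        $this->rule_name = $rule_name;
        $this->children  = $children;
    }
}

/**
 * Try each branch (here: a list of terminal token ids) at $position.
 * Returns a node for the first matching branch, or false.
 */
function deferred_demo_match_rule( $rule_name, array $branches, array $tokens, $position ) {
    $token_count = count( $tokens ); // Computed once, outside the loops.
    foreach ( $branches as $branch ) {
        $children = array(); // Plain local array, no node yet.
        $pos      = $position;
        $matched  = true;
        foreach ( $branch as $terminal_id ) {
            // Inline terminal match: a comparison instead of a recursive call.
            if ( $pos >= $token_count || $tokens[ $pos ] !== $terminal_id ) {
                $matched = false;
                break;
            }
            $children[] = $tokens[ $pos ];
            $pos++;
        }
        if ( $matched ) {
            // Only a successful branch pays for an object allocation.
            return new Deferred_Demo_Node( $rule_name, $children );
        }
    }
    return false;
}

$branches = array( array( 1, 2, 3 ), array( 1, 2, 4 ), array( 1, 5 ) );
$node     = deferred_demo_match_rule( 'demo', $branches, array( 1, 5 ), 0 );
```

In this run the first two branches fail without allocating anything, and only the third creates a node.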
The grammar now precomputes FIRST and NULLABLE via fixpoint, then indexes each rule's branches by the tokens that can start them. At parse time the parser jumps straight to the candidate branches for the current token instead of iterating every branch and letting most fail.

On the full MySQL test suite, 59% of branch attempts previously failed because the first token could never match the branch's FIRST set; with per-branch lookahead those attempts are eliminated.

End-to-end parser benchmark:

Before: ~14,900 QPS
After: ~22,400 QPS (+50%)
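A minimal sketch of that precomputation on a toy grammar is shown below. The data layout is an assumption made for illustration (string symbols are rule names, integer symbols are token ids, sets are arrays keyed by token id), and the function names are hypothetical rather than the grammar class's real API.

```php
<?php
// FIRST set and nullability of one branch, given the current per-rule results.
function first_demo_branch( array $branch, array $first, array $nullable ) {
    $set = array();
    foreach ( $branch as $symbol ) {
        if ( is_int( $symbol ) ) {
            $set[ $symbol ] = true; // A terminal starts the branch and ends the scan.
            return array( $set, false );
        }
        $set += $first[ $symbol ];
        if ( ! $nullable[ $symbol ] ) {
            return array( $set, false );
        }
    }
    return array( $set, true ); // Every symbol was nullable (or the branch is empty).
}

// Naive fixpoint: repeat full passes until no FIRST set or NULLABLE flag changes.
function first_demo_sets( array $rules ) {
    $first    = array_fill_keys( array_keys( $rules ), array() );
    $nullable = array_fill_keys( array_keys( $rules ), false );
    do {
        $changed = false;
        foreach ( $rules as $name => $branches ) {
            foreach ( $branches as $branch ) {
                list( $branch_first, $branch_nullable ) = first_demo_branch( $branch, $first, $nullable );
                $merged = $first[ $name ] + $branch_first;
                if ( count( $merged ) !== count( $first[ $name ] ) ) {
                    $first[ $name ] = $merged;
                    $changed        = true;
                }
                if ( $branch_nullable && ! $nullable[ $name ] ) {
                    $nullable[ $name ] = true;
                    $changed           = true;
                }
            }
        }
    } while ( $changed );
    return array( $first, $nullable );
}

// Index: rule name => token id => list of branch indexes that can start with it.
function first_demo_index( array $rules, array $first, array $nullable ) {
    $index = array();
    foreach ( $rules as $name => $branches ) {
        foreach ( $branches as $i => $branch ) {
            list( $branch_first ) = first_demo_branch( $branch, $first, $nullable );
            foreach ( array_keys( $branch_first ) as $token ) {
                $index[ $name ][ $token ][] = $i;
            }
        }
    }
    return $index;
}

// Toy grammar. Token ids: 1 = SELECT, 2 = INSERT, 3 = DISTINCT, 4 = identifier.
$rules = array(
    'stmt'         => array( array( 'select' ), array( 'insert' ) ),
    'select'       => array( array( 1, 'select_list' ) ),
    'select_list'  => array( array( 'opt_distinct', 4 ) ),
    'opt_distinct' => array( array( 3 ), array() ),
    'insert'       => array( array( 2, 4 ) ),
);

list( $first, $nullable ) = first_demo_sets( $rules );
$index                    = first_demo_index( $rules, $first, $nullable );
```

With this index, a parser looking at token 2 under `stmt` would try only branch 1 and never attempt the `select` branch.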
Two grammar/parser refinements that both reduce recursive calls:

* In parse_recursive(): when the rule has a per-token branch selector but the current token is not in any branch's FIRST and the rule itself is nullable, return 'matched empty' immediately instead of descending into nullable branches that would recursively do the same thing. This alone eliminates ~460k recursive calls on the MySQL corpus.
* At grammar build time, expand every single-branch fragment rule into its call sites. Fragments exist only to factor shared sub-sequences and their children are already flattened into the parent AST node, so splicing them directly into parent branches is a no-op for the resulting tree but removes an entire recursive call per use. 480 of the grammar's fragments qualify.

Also drops the dead terminal branch at the top of parse_recursive() (the branch loop inlines terminal matching, so parse_recursive is only ever called with non-terminal rule ids) and the always-false empty-branches guard.

End-to-end parser benchmark:

Before: ~22,400 QPS
After: ~27,500 QPS (+23%)
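The build-time fragment expansion can be sketched like this. It is a toy version under stated assumptions: rules are arrays of branches, fragment names are passed in explicitly, and the cycle guard (leave a fragment reference in place if that fragment is already being expanded) is one plausible way to terminate, not necessarily what the real grammar builder does.

```php
<?php
// Splice every single-branch fragment into the branches that reference it.
// Hypothetical helper; the rule and fragment names are made up for the demo.
function fragment_demo_inline( array $rules, array $fragment_names ) {
    $inlinable = array();
    foreach ( $fragment_names as $name ) {
        if ( 1 === count( $rules[ $name ] ) ) {
            $inlinable[ $name ] = true; // Multi-branch fragments are left alone.
        }
    }

    $expand = function ( array $branch, array $in_progress ) use ( &$expand, $rules, $inlinable ) {
        $out = array();
        foreach ( $branch as $symbol ) {
            $can_inline = is_string( $symbol )
                && isset( $inlinable[ $symbol ] )
                && ! isset( $in_progress[ $symbol ] ); // Cycle guard (assumption).
            if ( $can_inline ) {
                $nested = $expand( $rules[ $symbol ][0], $in_progress + array( $symbol => true ) );
                foreach ( $nested as $nested_symbol ) {
                    $out[] = $nested_symbol;
                }
            } else {
                $out[] = $symbol;
            }
        }
        return $out;
    };

    $result = array();
    foreach ( $rules as $name => $branches ) {
        $result[ $name ] = array();
        $in_progress     = isset( $inlinable[ $name ] ) ? array( $name => true ) : array();
        foreach ( $branches as $branch ) {
            $result[ $name ][] = $expand( $branch, $in_progress );
        }
    }
    return $result;
}

$rules = array(
    'stmt'       => array( array( 1, '%pair', 2 ), array( 9 ) ),
    '%pair'      => array( array( 3, 4 ) ),
    'loop'       => array( array( '%cyc' ) ),
    '%cyc'       => array( array( 5, '%cyc' ) ),
    'uses_multi' => array( array( '%multi' ) ),
    '%multi'     => array( array( 6 ), array( 7 ) ),
);

$inlined = fragment_demo_inline( $rules, array( '%pair', '%cyc', '%multi' ) );
```

After the transform, `stmt` no longer references `%pair`, so parsing it needs one recursive call fewer per use.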
Three minor reductions in per-call work:

* Strip explicit EMPTY_RULE_ID symbols out of rule branches at grammar build time. The parser loop would have 'continue'd over them anyway, so removing them ahead of time lets the hot symbol loop drop the epsilon check. Pure-epsilon branches become empty branches and still match empty via the existing empty-children fast path.
* Cache the grammar's rules, fragment_ids, rule_names, branches_for_token, nullable_branches, and highest_terminal_id as direct parser instance fields so parse_recursive() no longer pays for a $this->grammar->... double hop on every call.
* Collapse the two-step node construction (new + set_children) into a single constructor call that takes the children array directly. This saves a method call per allocated node (~820k across the MySQL corpus).

End-to-end parser benchmark: ~27,500 QPS -> ~28,500 QPS (+3.5%).
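The epsilon stripping in the first bullet amounts to a small build-time filter. A minimal sketch follows, assuming branches are arrays of symbol ids and that `EPSILON_DEMO_EMPTY_RULE_ID` (a hypothetical constant mirroring EMPTY_RULE_ID = 0) marks epsilon:

```php
<?php
const EPSILON_DEMO_EMPTY_RULE_ID = 0; // Hypothetical stand-in for EMPTY_RULE_ID.

// Remove epsilon markers from every branch so the parse loop need not test for them.
function epsilon_demo_strip( array $rules ) {
    foreach ( $rules as $name => $branches ) {
        foreach ( $branches as $i => $branch ) {
            $kept = array();
            foreach ( $branch as $symbol ) {
                if ( EPSILON_DEMO_EMPTY_RULE_ID !== $symbol ) {
                    $kept[] = $symbol;
                }
            }
            $rules[ $name ][ $i ] = $kept; // A pure-epsilon branch becomes array().
        }
    }
    return $rules;
}

$stripped = epsilon_demo_strip(
    array(
        'opt_distinct' => array( array( 3 ), array( 0 ) ),
        'list'         => array( array( 4, 0, 5 ) ),
    )
);
```

The empty branch left behind still matches empty input, so the set of accepted inputs is unchanged.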
Three review-noted spots that were terse in the code:

- The remaining_tokens() loop guard now spells out why both EOF and `null === token_type && bytes_already_read > 0` are needed (EOF on clean end-of-input vs invalid byte mid-stream, with the `> 0` guard letting the very first iteration through).
- The identifier/keyword fast path now explains `$byte > "\x7F"` (UTF-8 multi-byte starter; MySQL identifiers allow U+0080-U+FFFF) and `next_byte !== "'"` (only single quotes form the special hex/bin/n-char literal starters; `"` never does, regardless of SQL mode).

No behavior change.
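The fast-path condition from the second bullet can be isolated into a small predicate for illustration. `ident_demo_takes_fast_path()` is a hypothetical helper that covers only the condition quoted in the review; the real lexer handles other identifier starts elsewhere.

```php
<?php
// Hypothetical predicate mirroring the reviewed condition.
function ident_demo_takes_fast_path( $byte, $next_byte ) {
    return (
        ( $byte >= 'a' && $byte <= 'z' )
        || ( $byte >= 'A' && $byte <= 'Z' )
        // Any byte above 0x7F is part of a UTF-8 multi-byte sequence.
        || $byte > "\x7F"
    )
    // x'..', b'..' and n'..' are literals, so a following single quote opts out.
    && "'" !== $next_byte;
}

$keyword   = ident_demo_takes_fast_path( 's', 'e' );       // "se..." as in SELECT.
$hex       = ident_demo_takes_fast_path( 'x', "'" );       // x'1F' is a hex literal.
$multibyte = ident_demo_takes_fast_path( "\xC3", "\xA9" ); // First byte of UTF-8 "é".
$digit     = ident_demo_takes_fast_path( '1', '2' );
```

PHP compares the single-byte strings bytewise here, which is why `"\xC3" > "\x7F"` holds.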
The leading-whitespace skip at the top of read_next_token() was already unrolled into byte-equality checks for the perf reasons documented in 916b512. Apply the same unroll to the third-byte whitespace check that gates a '--' as a line-comment start, so the hot dispatch chain doesn't fall back into strpos() on a 5-char mask for this case. The bound check is folded into '?? null' on the third-byte read, matching the rest of the lookahead style.
The end-of-input sentinel that the parser hot path relies on must be appended whenever the token stream is (re)assigned, not only at construction time. Trunk's WP_MySQL_Parser::reset_tokens() didn't know about it, so reusing a parser across queries left the parser walking off the end of the array.

Move the sentinel append, $token_count compute, and $position reset into a single protected set_tokens() helper on WP_Parser. The constructor and the WP_MySQL_Parser::reset_tokens() override both call it, so the invariant has one source of truth.
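The shape of that invariant can be sketched as below. This is a simplified illustration: `Sentinel_Demo_Token`, `Sentinel_Demo_Parser`, and `count_leading()` are hypothetical, and only the set_tokens() idea comes from the commit message.

```php
<?php
class Sentinel_Demo_Token {
    public $id;

    public function __construct( $id ) {
        $this->id = $id;
    }
}

class Sentinel_Demo_Parser {
    const EMPTY_RULE_ID = 0;

    protected $tokens;
    protected $token_count;
    protected $position;

    public function __construct( array $tokens ) {
        $this->set_tokens( $tokens );
    }

    public function reset_tokens( array $tokens ) {
        $this->set_tokens( $tokens ); // Same path as the constructor.
    }

    // Single place that establishes the sentinel, the count, and the position.
    protected function set_tokens( array $tokens ) {
        $this->token_count = count( $tokens ); // Count excludes the sentinel.
        $tokens[]          = new Sentinel_Demo_Token( self::EMPTY_RULE_ID );
        $this->tokens      = $tokens;
        $this->position    = 0;
    }

    // Hot-loop stand-in: reads ->id without a bounds check. It relies on the
    // sentinel id (0) never equalling a real token id.
    public function count_leading( $id ) {
        $n = 0;
        while ( $this->tokens[ $this->position ]->id === $id ) {
            $this->position++;
            $n++;
        }
        return $n;
    }
}

$parser = new Sentinel_Demo_Parser(
    array( new Sentinel_Demo_Token( 5 ), new Sentinel_Demo_Token( 5 ), new Sentinel_Demo_Token( 7 ) )
);
$first_run = $parser->count_leading( 5 );

$parser->reset_tokens( array( new Sentinel_Demo_Token( 5 ) ) );
$second_run = $parser->count_leading( 5 ); // Stops at the sentinel, not past the array end.
```

If reset_tokens() assigned the array directly, the second loop would read past the last element.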
The pure-PHP parser was rewritten to use the precise per-token branches_for_token + nullable_branches pair (replacing the earlier coarse lookahead_is_match_possible map). Update the native (Rust) parser to consume the same two fields directly:

- mysql-rust-bridge.php exports the new fields verbatim and stops producing the legacy lookahead view.
- The Rust extension parses branches_for_token's outer key set into a per-rule FIRST set (the inner branch sequences are pure-PHP parser detail and aren't relevant here) and tracks nullable as a separate bool on Rule, replacing the "0 in lookahead" trick. The early-bailout check is unchanged in spirit.

No PHP-side compatibility shim survives - the native bridge is now in lock-step with the grammar's actual fields.
Trunk's WP_MySQL_Native_Parser_Node was a lazy-materialization wrapper that extended WP_Parser_Node and overrode 18 read methods to delegate into the Rust-owned arena until first mutation. The performance branch needs WP_Parser_Node to be 'final' for opcache/JIT specialization, and PHP forbids extending a final class.

Switch the native parser to eager materialization:

- The Rust extension constructs plain WP_Parser_Node instances at parse() time, recursing through the arena to build a complete children array up front. Done in the previous commit by updating the Rust create_php_node_with_classes() to write the rule_id, rule_name, and children properties directly.
- Drop the wp_sqlite_mysql_native_ast_* lazy-access exports and the arena-keyed wrapper registry from the Rust extension - the eager tree no longer needs them.
- Remove the WP_MySQL_Native_Parser_Node class and the two PHPUnit test files that exercised the wrapper-identity / cycle-collection invariants of the lazy implementation. Stable child identity now follows from PHP's normal object semantics on the eagerly built array.

The verifier script gets the same instanceof relaxation (WP_Parser_Node, not the removed subclass).

WP_Parser_Node stays 'final', the native and pure-PHP parsers produce indistinguishable ASTs, and 'instanceof WP_Parser_Node' checks throughout the codebase keep working without changes.
Nothing extends WP_Parser_Node. Marking it final lets PHP's opcache and tracing JIT specialize property access and method dispatch since the class layout is now fixed. Small but consistent improvement measured across multiple runs under tracing JIT (~+2% avg, ~+2% best).

End-to-end parser benchmark:

tracing JIT: ~57K -> ~57-58K QPS avg, 60-61K QPS best
no JIT: ~33K -> ~34K QPS avg, 35K QPS best
Note that WP_MySQL_Token intentionally bypasses parent::__construct() for the hot path and must keep its field assignments in sync with WP_Parser_Token, and that remaining_tokens() deliberately inlines the next_token() tokenizer step and must stay in sync with it.
Cover epsilon stripping, single-branch fragment inlining (including cyclic-fragment termination), per-token branch selectors with FIRST/ NULLABLE propagation, single-candidate classification, and the merge_sorted helper. Add an invariant check over the real MySQL grammar that no branch retains an epsilon marker and that every single-candidate rule maps each token to exactly one branch sequence.
Re-measure the documented lexer/parser benchmarks on this branch (PHP 8.5.5, current extension build) and replace the stale trunk/PHP-8.4.5 figures. The parser native row drops from 108,354 QPS (15.45x) to 58,111 QPS (2.00x): trunk's native parser returned a lazy wrapper, so the parse-only benchmark never built the tree. This branch materializes the full WP_Parser_Node tree eagerly, so the number now reflects producing a complete AST. The lexer pure-PHP row rises (71,553 -> 178,409 QPS) thanks to the lexer optimizations on this branch, narrowing the native lexer speedup to 2.00x. Note the default-CLI (no JIT) methodology and that under opcache + tracing JIT the native edge narrows further (lexer ~1.08x, parser ~1.13x).
The PHP bridge now exports the parser grammar as per-token branch selectors (`branches_for_token` / `nullable_branches`) instead of the previous coarse `lookahead_is_match_possible` table - a backward-incompatible change to the ABI shared between the extension binary and the PHP driver. Until now load.php selected the native lexer/parser purely on class existence, so an extension built against a different grammar ABI - most commonly a plugin update that outpaces the installed binary - would be selected and then fatal during native parser construction, with no fallback.

Track grammar-ABI compatibility by the extension's minor version (the 0.x line) and bump it to 0.2.0 for this change. Gate native selection on `phpversion( 'wp_mysql_parser' )` falling within the supported line (0.2.x); the native lexer and parser are a matched pair (the native lexer emits a token stream only the native parser can consume), so select both or neither. An unsupported or absent version falls back cleanly to pure PHP, erring on the safe side for unknown binaries.

Document the versioning contract in the extension README and add a unit test covering the gate's boundaries.
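A gate of that kind could look roughly like the sketch below. The helper name and the strict `0.2.<patch>` pattern are assumptions for illustration; the actual check in load.php may be written differently.

```php
<?php
// Hypothetical gate: accept only plain 0.2.<patch> versions, reject everything
// else (false when the extension is not loaded, other minor lines, odd suffixes).
function abi_demo_is_supported_version( $version ) {
    if ( ! is_string( $version ) ) {
        return false; // phpversion() returns false for an extension that is not loaded.
    }
    return 1 === preg_match( '/^0\.2\.\d+$/', $version );
}

// The native lexer and parser are a matched pair, so a single flag selects both.
$use_native = abi_demo_is_supported_version( phpversion( 'wp_mysql_parser' ) );
```

A strict pattern errs on the side of falling back: a version string the gate does not recognise selects the pure-PHP lexer and parser.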
77e16a4 to 77b0f89
…tors

The grammar is rebuilt on every request (PHP's shared-nothing model resets the static cache between requests), and that build dominated the lex+parse pipeline. Cut it from ~40 ms to ~6.6 ms for a typical request, with parsing unchanged:

- Replace the naive iterate-to-fixpoint FIRST/NULLABLE computation with a worklist that recomputes a rule only when a rule it references grows, plus a C-level array union. ~40 ms -> ~18 ms; the grammar output is byte-identical.
- Denormalize the per-token branch selectors lazily, per rule, on first descent (ensure_rule_selector) instead of eagerly for all ~1,900 rules. A typical request touches ~7% of rules, so the build drops to ~6.6 ms. The parser materializes a rule's selector on a lookup miss, keeping the common hit path a single array access (warm parse throughput within ~1% of before).
- branches_for_token / single_candidate_rules are now lazily populated; build_all_selectors() forces a full build for consumers that read the table directly (the grammar tests).
- Export the eager per-rule FIRST sets to the native parser instead of the lazily-built per-token table. The native parser only needs FIRST sets (it builds its own candidates from rules), so it skips the PHP denormalization entirely and no longer depends on a forced full build.
- Reuse one parser across the parser benchmark corpus (resetting tokens), mirroring the driver, and refresh the published native-extension numbers.
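The worklist idea from the first bullet can be sketched on a toy grammar. As before, the layout is an assumption (string symbols are rule names, integer symbols are token ids, sets are arrays keyed by token id) and `worklist_demo_first()` is a hypothetical name, not the real grammar API.

```php
<?php
// Worklist fixpoint: a rule is recomputed only after a rule it references changed.
function worklist_demo_first( array $rules ) {
    $first      = array();
    $nullable   = array();
    $dependents = array(); // Referenced rule => set of rules that reference it.
    foreach ( $rules as $name => $branches ) {
        $first[ $name ]    = array();
        $nullable[ $name ] = false;
        foreach ( $branches as $branch ) {
            foreach ( $branch as $symbol ) {
                if ( is_string( $symbol ) ) {
                    $dependents[ $symbol ][ $name ] = true;
                }
            }
        }
    }

    $queue  = array_keys( $rules );
    $queued = array_fill_keys( $queue, true );
    while ( $queue ) {
        $name = array_pop( $queue );
        unset( $queued[ $name ] );

        $new_first    = $first[ $name ];
        $new_nullable = $nullable[ $name ];
        foreach ( $rules[ $name ] as $branch ) {
            $branch_nullable = true;
            foreach ( $branch as $symbol ) {
                if ( is_int( $symbol ) ) {
                    $new_first[ $symbol ] = true;
                    $branch_nullable      = false;
                    break;
                }
                $new_first += $first[ $symbol ]; // Array union runs in C.
                if ( ! $nullable[ $symbol ] ) {
                    $branch_nullable = false;
                    break;
                }
            }
            $new_nullable = $new_nullable || $branch_nullable;
        }

        if ( count( $new_first ) === count( $first[ $name ] ) && $new_nullable === $nullable[ $name ] ) {
            continue; // Nothing grew, so no dependent needs another look.
        }
        $first[ $name ]    = $new_first;
        $nullable[ $name ] = $new_nullable;
        foreach ( array_keys( $dependents[ $name ] ?? array() ) as $dependent ) {
            if ( ! isset( $queued[ $dependent ] ) ) {
                $queue[]              = $dependent;
                $queued[ $dependent ] = true;
            }
        }
    }
    return array( $first, $nullable );
}

// Toy grammar. Token ids: 1 = SELECT, 2 = INSERT, 3 = DISTINCT, 4 = identifier.
list( $wl_first, $wl_nullable ) = worklist_demo_first(
    array(
        'stmt'         => array( array( 'select' ), array( 'insert' ) ),
        'select'       => array( array( 1, 'select_list' ) ),
        'select_list'  => array( array( 'opt_distinct', 4 ) ),
        'opt_distinct' => array( array( 3 ), array() ),
        'insert'       => array( array( 2, 4 ) ),
    )
);
```

Because sets only ever grow, each rule is re-queued a bounded number of times, and rules whose inputs never change are visited once.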
Merge the "PHPUnit Tests" (pure-PHP) and "MySQL Parser Extension Tests" workflows into a single "PHPUnit Tests" matrix that runs the mysql-on-sqlite suite with and without the native Rust parser extension: pure on PHP 7.2-8.5, plus the extension on PHP 8.0+ (its minimum). Job names read "PHP 8.2 / SQLite 3.45.1" and "PHP 8.2 + ext-wp-mysql-parser / SQLite 3.45.1".

This drops the redundant pure-on-extension jobs (the old extension workflow re-ran the plain suite on 7.2-7.4, duplicating "PHPUnit Tests") and removes the reusable phpunit-tests-run.yml. The native jobs build the extension in release mode (cargo build --release) so the suite exercises it at realistic speed rather than the slow debug build.

All setup-php steps now pass `coverage: none`. setup-php enables Xdebug by default, and the old pure-suite path left it on, instrumenting every call and running the suite ~4x slower (PHP 7.3: ~59s -> ~14s) while no coverage report was ever produced or consumed. Also set `coverage: none` on the MySQL Proxy and release-publish PHP setups.

The merged workflow is path-filtered to the parser/driver/extension packages (plus root composer) like the extension workflow was, and triggers on push to trunk (the old phpunit-tests trigger referenced a non-existent "main" branch).
The native matrix jobs compile the extension with `cargo build --release`, which rebuilds the whole dependency tree from scratch each run. Add Swatinem/rust-cache for the parser-extension workspace so the cargo registry and target dir are cached across runs, cutting the release-compile time on warm runs without affecting the (now realistic) test-step timings.
Summary
Consolidates the parser optimisations from #373, the lexer + token-construction wins from #375, and the `has_child()` micro-opt from #376 into a clean, linear history (one shippable change per commit). On top of those it adds a grammar-construction speedup and reworks the native (Rust) parser to materialise its AST eagerly so the parse-node class can be `final`.

End-to-end (lex+parse) on the 69,577-query MySQL server corpus, pure PHP, best across 3 ABAB-alternated rounds × 5 timed iterations (2 warmup iters per round to heat the tracing JIT):
PHP 8.5; PHP 8.1 verified within ~5%. Tracing JIT is per-worker and shared across requests, so steady-state traffic hits warm JIT (the 2 warmup iters model that); cold first-request numbers are roughly 1.2× for the JIT configs.
Per-process startup cost and memory
The speedup comes from precomputing per-token FIRST sets and branch selectors. The grammar is built once per worker process (PHP's shared-nothing model resets statics between requests) and cached in `self::$mysql_grammar`, so the cost is amortised across every query in a request, but short requests pay it up front:

A naive build of these structures cost ~40 ms. A worklist fixpoint for FIRST/NULLABLE (recompute a rule only when a referenced rule grows) plus lazy per-rule selector denormalisation (a typical request touches ~7% of rules; the rest is deferred and amortised into parsing) cut construction to the ~3.7 ms above. Numbers are plain CLI; opcache trims construction further. The extra resident memory is a real tradeoff on memory-constrained shared hosts (a supported no-opcache target).
Native parser: eager AST materialisation
Marking `WP_Parser_Node` `final` (a measured +7% win) is incompatible with `WP_MySQL_Native_Parser_Node` subclassing it. Rather than drop `final`, the native parser now materialises its arena AST into plain `WP_Parser_Node` instances at parse time. This removes the entire lazy layer: the per-AST identity cache, the wrapper registry, ~18 `wp_sqlite_mysql_native_ast_*` bridge functions, and the `WP_MySQL_Native_Parser_Node` class (~600 LOC Rust + 179 LOC PHP), plus the two wrapper-lifetime test files they backed.

Because the translator walks essentially the whole AST for every query, eager materialisation removes per-node FFI round trips and is neutral-to-faster while far simpler: building the full tree, the native parser runs at ~60,100 QPS vs ~28,600 QPS pure PHP (2.1×) on the corpus. It regresses only a hypothetical consumer that parses but inspects a tiny fraction of a large tree.
The grammar is exchanged with the extension as a runtime ABI, so `load.php` now pins the supported extension minor line and falls back cleanly to pure PHP on a mismatch (e.g. a plugin update outpacing the installed binary) instead of failing at parse time.

What was kept from #373 / #375 / #376
`$sql_length`, byte comparisons replacing `strspn`/`strcspn` mask checks, `strpos`-based comment-end and quote scans, inlined `remaining_tokens`) plus the `WP_MySQL_Token` parent-constructor bypass. The parser overlap was deliberately not applied: #373's equivalents are stronger and mixing the two measured no incremental win.
`! empty( $children )` micro-opt; the `parse_recursive` split didn't beat the existing single-candidate fast path.

Cost vs benefit (src LOC, end-to-end JIT)
`final` parse-node class (+7%), parent-constructor bypass (+5%).
`has_child()` one-liner, CS whitespace re-alignment.

Test plan
- `composer run test` (mysql-on-sqlite): 721/721, 1,533,741 assertions (2 skipped, 2 incomplete), incl. `WP_Parser_Grammar_Tests` for the build-time transforms (epsilon stripping, fragment inlining + cycle termination, FIRST/NULLABLE selectors, single-candidate classification, `merge_sorted`)
- `composer run check-cs`: clean
- PHP 7.2 / SQLite 3.27.0), the native-extension jobs, WordPress PHPUnit/E2E, WASM, and MySQL Proxy: all green