[mypyc] Add librt.strings.isalnum codepoint primitive #21509
Merged
Wraps `Py_UNICODE_ISALNUM` for the codepoint fast path, mirroring the already-merged `librt.strings.isspace` (python#21462) and `isdigit` (python#21504). Microbenchmark, both paths mypyc-compiled, scanning 2.5M codepoints per call: `s[i].isalnum()` runs at ~6.1 ns/codepoint; the codepoint path `c: i32 = i32(ord(s[i])); isalnum(c)` at ~4.8 ns/codepoint, roughly 1.3x faster. The gain is larger inside tokenizer-style loops that mix `isalnum` with literal-i32 compares (no per-character `str` materialization at all).
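A minimal sketch of the two loop shapes being compared, for readers who want to see them side by side. The function names `count_alnum_str` and `count_alnum_codepoint` are hypothetical, and the fallback `isalnum` is a stand-in used only when `librt.strings.isalnum` cannot be imported. Run under plain CPython this shows the structure only; the timings quoted above apply to mypyc-compiled code with `i32` values.

```python
# Sketch only: prefer the real primitive, fall back to a stand-in otherwise.
try:
    from librt.strings import isalnum  # codepoint primitive added by this PR
except ImportError:
    def isalnum(c: int) -> bool:
        # Hypothetical fallback, not the librt implementation:
        # classify the codepoint via a one-character str.
        return chr(c).isalnum()


def count_alnum_str(s: str) -> int:
    """String path: calls the str method on each one-character slice."""
    n = 0
    for i in range(len(s)):
        if s[i].isalnum():
            n += 1
    return n


def count_alnum_codepoint(s: str) -> int:
    """Codepoint path: classifies the integer codepoint directly."""
    n = 0
    for i in range(len(s)):
        # The PR description spells this `c: i32 = i32(ord(s[i]))`;
        # a plain int is used here so the sketch runs without mypyc.
        c = ord(s[i])
        if isalnum(c):
            n += 1
    return n
```

Both functions should return the same count for any string, since the run-test in this PR asserts that `isalnum(ord(c))` agrees with `c.isalnum()`.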
p-sawicki
reviewed
May 19, 2026
    o = ord(c)
    assert isspace(o) == isspace(i) == a.isspace()
    assert isdigit(o) == isdigit(i) == a.isdigit()
    assert isalnum(o) == isalnum(i) == a.isalnum()
Collaborator
I think we're missing coverage for calling these functions through the Python wrappers, because here they are transformed into direct C function calls.
Could you add a driver.py to this test and call them with a couple of values? It doesn't have to cover the entire space like the compiled file does. It would also be good to test the exception raised when the codepoint is outside the int32 range.
Edit: or, instead of driver.py, wrap the librt functions in Any variables and call through the wrapper. We have a couple of examples like this in other tests.
The existing run-test for the codepoint classifiers exercises only the compiled fast path: mypyc rewrites `isspace(c)` / `isdigit(c)` / `isalnum(c)` into direct calls to the underlying C symbols, so the PyMethodDef wrappers (`cp_isspace`, `cp_isdigit`, `cp_isalnum`) and their i32 range check are never reached. Iterate over the librt functions in a tuple so that the callee is opaque to mypyc and dispatch falls back to the generic call path, which goes through the PyMethodDef wrappers. Also assert that the wrappers' `cp_parse_i32` raises OverflowError for inputs outside the i32 range.
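The approach described in this commit message can be sketched roughly as follows. This is not the test added by the PR: `check_wrappers` and `_checked` are hypothetical names, the sample values are arbitrary, and the fallback definitions are stand-ins that only imitate the i32 range check when `librt.strings` cannot be imported.

```python
# Sketch only: use the real primitives when available.
try:
    from librt.strings import isalnum, isdigit, isspace
except ImportError:
    I32_MIN, I32_MAX = -(2**31), 2**31 - 1

    def _checked(c: int) -> str:
        # Hypothetical stand-in for the wrappers' i32 range check.
        if not I32_MIN <= c <= I32_MAX:
            raise OverflowError("codepoint out of i32 range")
        return chr(c)

    def isspace(c: int) -> bool:
        return _checked(c).isspace()

    def isdigit(c: int) -> bool:
        return _checked(c).isdigit()

    def isalnum(c: int) -> bool:
        return _checked(c).isalnum()


def check_wrappers() -> int:
    """Call each classifier through a loop variable and count comparisons."""
    checked = 0
    # Iterating over a tuple means the callee is not known statically, so a
    # compiler such as mypyc cannot substitute a direct C call for `fn(...)`.
    pairs = (
        (isspace, str.isspace),
        (isdigit, str.isdigit),
        (isalnum, str.isalnum),
    )
    for fn, method in pairs:
        for ch in (" ", "7", "a", "_", "\u00e9"):
            assert fn(ord(ch)) == method(ch)
            checked += 1
        # Values just outside the signed 32-bit range should be rejected.
        for bad in (2**31, -(2**31) - 1):
            try:
                fn(bad)
            except OverflowError:
                pass
            else:
                raise AssertionError("expected OverflowError")
    return checked
```

Only a handful of values are checked here, in line with the review suggestion that the wrapper test need not cover the whole codepoint space.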
Contributor
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
alicederyn pushed a commit to alicederyn/mypy that referenced this pull request May 20, 2026
3rd PR for python#21418, mirroring `librt.strings.isdigit`. Measured on a microbenchmark this is roughly 30-40% faster for a char