gh-140009: Optimize JSON parsing with object_pairs_hook via PyTuple_FromArray #144772
Open
andrewloux wants to merge 2 commits into python:main
Contributor
I could actually see speedups for both on my Apple M1 Pro:
Member
Could your benchmarks exclude the time spent constructing the payload, please? Otherwise the benchmark is not relevant, since it picks up noise from constructing the payload, for instance.
Summary
This PR optimizes the object_pairs_hook path in Modules/_json.c (e.g., used by json.loads(s, object_pairs_hook=list)) by replacing PyTuple_Pack with PyTuple_FromArray. PyTuple_Pack processes its arguments variadically via va_list, while PyTuple_FromArray copies the items directly from a C array, here a stack-allocated one. For a fixed-size-2 tuple constructed on every key-value pair in the hot path, this eliminates unnecessary overhead.

Benchmarks (PGO+LTO)
Validated using pyperf in --rigorous mode on a full production build. Results were reproduced across multiple independent sessions.

Build flags: --enable-optimizations --with-lto
Compared: upstream/main vs. pytuple-json-objectpairs-fromarray (92d3f1a)
Benchmarks: json_pairs_hook_dense, json_pairs_hook_control
Geometric mean: 1.01x faster
Benchmark script and repro commands
Repro commands (using bench_json_pairs_hook.py):
bench_json_pairs_hook.py:

Analysis
The json_pairs_hook_dense benchmark parses a large JSON array of objects using object_pairs_hook=list. In this scenario, every key-value pair requires a size-2 tuple. The switch to the array-based API yields a ~2% speedup on this specific codepath, with a conservative geometric mean of ~1% across both benchmarks.

Notably, the candidate also shows a significant reduction in variance (±9 ms → ±3 ms), suggesting more deterministic performance on the optimized path.
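To make the "one size-2 tuple per key-value pair" point concrete, here is a small illustration of what the hook receives. The sample document is hypothetical and is not taken from the PR's benchmark payload.

```python
import json

# Hypothetical sample document; not the payload used in the PR's benchmark.
doc = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'

# With object_pairs_hook=list, each JSON object is handed to the hook as a
# list of (key, value) 2-tuples, so the decoder builds one tuple per pair.
pairs = json.loads(doc, object_pairs_hook=list)

# The default decoder builds dicts directly and does not go through the hook.
plain = json.loads(doc)

print(pairs)
print(plain)
```

Each inner element of `pairs` is a 2-tuple, which is the allocation the dense benchmark exercises repeatedly.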
The json_pairs_hook_control case confirms that the standard JSON decoding path (using the default decoder) is unaffected.
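A comparison of the dense and control paths that keeps payload construction out of the timed region could look roughly like the sketch below. It uses the standard-library timeit module rather than pyperf and is not the PR's bench_json_pairs_hook.py; the payload shape and the helper names (make_payload, decode_dense, decode_control, best_of) are assumptions made for illustration.

```python
import json
import timeit


def make_payload(n_objects=2000, n_keys=8):
    """Build a JSON array of small objects (hypothetical shape)."""
    rows = [{f"k{j}": i + j for j in range(n_keys)} for i in range(n_objects)]
    return json.dumps(rows)


# Construct the payload once, outside the timed region, so that only
# decoding is measured.
PAYLOAD = make_payload()


def decode_dense():
    # Exercises the object_pairs_hook path: one 2-tuple per key-value pair.
    return json.loads(PAYLOAD, object_pairs_hook=list)


def decode_control():
    # Default decoder path, expected to be unaffected by the change.
    return json.loads(PAYLOAD)


def best_of(func, repeat=5, number=20):
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print(f"dense:   {best_of(decode_dense) * 1e3:.3f} ms")
    print(f"control: {best_of(decode_control) * 1e3:.3f} ms")
```

Taking the minimum over repeats is a common way to reduce scheduling noise in quick checks; for results meant to be reported, pyperf in --rigorous mode, as used in this PR, remains the better tool.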