
gh-144981: Make PyUnstable_Code_SetExtra/GetExtra thread-safe #144980

Open
yoney wants to merge 3 commits into python:main from yoney:extra

Conversation


@yoney yoney commented Feb 18, 2026

  • PyUnstable_Code_GetExtra and PyUnstable_Code_SetExtra were not thread-safe; concurrent calls could cause data races.
  • GetExtra is performance-sensitive for use cases such as JITs, so it uses a lock-free acquire load of co_extra and acquire loads of the individual slots (see the read-path sketch below).
  • SetExtra uses a per-object critical section. When the slot already exists, it updates in place with a release store. When the buffer needs to grow, the free-threaded build allocates a new buffer, copies the existing slots, publishes the new buffer with a release store, and reclaims the old one via _PyMem_FreeDelayed (QSBR). The GIL build simply uses realloc.
  • PyUnstable_Eval_RequestCodeExtraIndex is now protected by interp->code_state.mutex.
  • Added free-threading tests for concurrent RequestCodeExtraIndex, SetExtra, and GetExtra.

The new tests exercise the concurrent paths; on a free-threaded TSAN build without the fix, they crash or emit warnings.
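
For reference, a minimal sketch of the lock-free read path described above, assuming CPython's internal _PyCodeObjectExtra layout and the FT_ATOMIC_* wrappers (which reduce to plain loads on the GIL build). This is illustrative only, not the actual diff:

/* Illustrative sketch of the lock-free read path; struct and macro names
 * follow CPython internals (_PyCodeObjectExtra, FT_ATOMIC_LOAD_PTR_ACQUIRE)
 * but this is not the exact change in this PR. */
int
PyUnstable_Code_GetExtra(PyObject *code, Py_ssize_t index, void **extra)
{
    if (!PyCode_Check(code)) {
        PyErr_BadInternalCall();
        return -1;
    }
    PyCodeObject *o = (PyCodeObject *)code;

    /* Acquire load of the buffer pointer: pairs with the release store that
     * publishes a grown buffer in SetExtra, so the copied slots are visible
     * before we index into them. */
    _PyCodeObjectExtra *co_extra = FT_ATOMIC_LOAD_PTR_ACQUIRE(o->co_extra);

    if (co_extra == NULL || index < 0 || index >= co_extra->ce_size) {
        *extra = NULL;
        return 0;
    }

    /* Acquire load of the slot itself: pairs with the release store done by
     * an in-place SetExtra update of an existing slot. */
    *extra = FT_ATOMIC_LOAD_PTR_ACQUIRE(co_extra->ce_extras[index]);
    return 0;
}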

cc: @DinoV @colesbury

@yoney yoney changed the title Make PyUnstable_Code_SetExtra/GetExtra thread-safe gh-144981: Make PyUnstable_Code_SetExtra/GetExtra thread-safe Feb 18, 2026
@yoney yoney marked this pull request as ready for review February 18, 2026 23:22
@yoney yoney requested a review from markshannon as a code owner February 18, 2026 23:22

bedevere-app bot commented Feb 18, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.


yoney commented Feb 19, 2026

I benchmarked both the base and new implementations on GIL builds. Since SetExtra regressed significantly (from ~6 ns to ~104 ns), I refactored the implementation; I will post updated numbers after this comment.

Note: Benchmarks were run on an x86_64 (Skylake) system.

Base (GIL)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       40.9 ns         38.0 ns          250 items_per_second=26.3047M/s
BM_CodeExtra_SetExtra                          5.57 ns         5.56 ns    125068314 items_per_second=180.002M/s
BM_CodeExtra_GetExtra                          3.12 ns         3.12 ns    224233759 items_per_second=320.758M/s
BM_CodeExtra_SetGet                            7.98 ns         7.96 ns     87762271 items_per_second=125.585M/s

New Implementation (GIL)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       38.9 ns         36.4 ns          250 items_per_second=27.5028M/s
BM_CodeExtra_SetExtra                           104 ns          103 ns      5854292 items_per_second=9.66832M/s
BM_CodeExtra_GetExtra                          2.77 ns         2.77 ns    252889160 items_per_second=361.305M/s
BM_CodeExtra_SetGet                             105 ns          105 ns      6025338 items_per_second=9.54896M/s

@colesbury
Contributor

How are these used in practice? Can we leave synchronization of PyUnstable_Code_GetExtra()/PyUnstable_Code_SetExtra() calls on the same index up to the caller? Then we would only have to avoid races between calls on different indexes.

For example, something like:

  • PyUnstable_Eval_RequestCodeExtraIndex: stop-the-world on resize
  • PyUnstable_Code_GetExtra: no synchronization
  • PyUnstable_Code_SetExtra: no synchronization

Would that work?

@colesbury
Contributor

Oh, the stop the world on resize would be in PyUnstable_Code_SetExtra, not PyUnstable_Eval_RequestCodeExtraIndex.


yoney commented Feb 19, 2026

Benchmarks after the refactoring:

  • For GIL builds, use realloc.
  • For FT builds, allocate a new buffer and swap it in only when the buffer needs to grow (sketched below).
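
A rough sketch of the grow path under the per-object critical section, again assuming the internal _PyCodeObjectExtra layout, the FT_ATOMIC_* wrappers, and QSBR-deferred freeing; names and signatures are approximate (e.g. the exact _PyMem_FreeDelayed signature differs across versions), not the actual diff:

/* Illustrative only: grow co_extra to hold at least index + 1 slots.
 * Assumed to be called with the code object's critical section held. */
static int
grow_co_extra(PyCodeObject *o, Py_ssize_t index)
{
    _PyCodeObjectExtra *old = o->co_extra;
    Py_ssize_t old_size = (old != NULL) ? old->ce_size : 0;
    size_t nbytes = sizeof(_PyCodeObjectExtra) + (size_t)index * sizeof(void *);

#ifdef Py_GIL_DISABLED
    /* Free-threaded build: readers load co_extra without a lock, so the old
     * buffer must not be reallocated in place.  Allocate a new buffer, copy
     * the live slots, publish it with a release store, and defer freeing the
     * old buffer until no reader can still be using it (QSBR). */
    _PyCodeObjectExtra *new_extra = PyMem_Malloc(nbytes);
    if (new_extra == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    for (Py_ssize_t i = 0; i < old_size; i++) {
        new_extra->ce_extras[i] = old->ce_extras[i];
    }
    for (Py_ssize_t i = old_size; i <= index; i++) {
        new_extra->ce_extras[i] = NULL;
    }
    new_extra->ce_size = index + 1;
    FT_ATOMIC_STORE_PTR_RELEASE(o->co_extra, new_extra);
    if (old != NULL) {
        _PyMem_FreeDelayed(old);  /* QSBR-deferred free; signature approximate */
    }
#else
    /* GIL build: no lock-free concurrent readers, so realloc in place. */
    _PyCodeObjectExtra *new_extra = PyMem_Realloc(old, nbytes);
    if (new_extra == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    for (Py_ssize_t i = old_size; i <= index; i++) {
        new_extra->ce_extras[i] = NULL;
    }
    new_extra->ce_size = index + 1;
    o->co_extra = new_extra;
#endif
    return 0;
}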

GetExtra performance looks as good as the baseline.
SetExtra has additional overhead on FT builds, increasing from ~6 ns to ~21 ns (as expected).

Note: Benchmarks were run on an x86_64 (Skylake) system.

Base (GIL)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       40.9 ns         38.0 ns          250 items_per_second=26.3047M/s
BM_CodeExtra_SetExtra                          5.57 ns         5.56 ns    125068314 items_per_second=180.002M/s
BM_CodeExtra_GetExtra                          3.12 ns         3.12 ns    224233759 items_per_second=320.758M/s
BM_CodeExtra_SetGet                            7.98 ns         7.96 ns     87762271 items_per_second=125.585M/s

New Implementation (GIL)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       39.1 ns         35.6 ns          250 items_per_second=28.1215M/s
BM_CodeExtra_SetExtra                          6.08 ns         6.06 ns    114801349 items_per_second=165.076M/s
BM_CodeExtra_GetExtra                          2.78 ns         2.77 ns    252335083 items_per_second=360.554M/s
BM_CodeExtra_SetGet                            8.33 ns         8.31 ns     84310711 items_per_second=120.317M/s

New Implementation (FT)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       56.0 ns         52.5 ns          250 items_per_second=19.0346M/s
BM_CodeExtra_SetExtra                          21.6 ns         21.6 ns     32413782 items_per_second=46.3671M/s
BM_CodeExtra_GetExtra                          2.78 ns         2.78 ns    251634827 items_per_second=360.15M/s
BM_CodeExtra_SetGet                            22.9 ns         22.8 ns     24306594 items_per_second=43.7748M/s


yoney commented Feb 19, 2026

How are these used in practice?

For Cinderx, this is used to count the number of calls before making a JIT decision.

We implemented a workaround here: facebookincubator/cinderx@26f3e2b
The counting logic is here: facebookincubator/cinderx@8593c48
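
For readers unfamiliar with the API, a minimal call-counting consumer might look like the sketch below. The names (init_call_counting, bump_call_count) are hypothetical, not the cinderx implementation, and on free-threaded builds the increment itself would also need to be atomic:

#include <Python.h>

static Py_ssize_t call_count_index = -1;

/* Registered freefunc: releases the counter when its code object dies or
 * the slot is overwritten. */
static void
free_call_counter(void *counter)
{
    PyMem_RawFree(counter);
}

/* Run once at startup: reserve a co_extra slot index for this consumer. */
static int
init_call_counting(void)
{
    call_count_index = PyUnstable_Eval_RequestCodeExtraIndex(free_call_counter);
    return (call_count_index < 0) ? -1 : 0;
}

/* Called when a code object is entered; returns the updated count, which a
 * JIT could compare against its compilation threshold. */
static size_t
bump_call_count(PyCodeObject *code)
{
    void *extra = NULL;
    if (PyUnstable_Code_GetExtra((PyObject *)code, call_count_index, &extra) < 0) {
        return 0;  /* exception set by GetExtra */
    }
    size_t *counter = (size_t *)extra;
    if (counter == NULL) {
        counter = PyMem_RawCalloc(1, sizeof(size_t));
        if (counter == NULL) {
            return 0;
        }
        if (PyUnstable_Code_SetExtra((PyObject *)code, call_count_index, counter) < 0) {
            PyMem_RawFree(counter);
            return 0;
        }
    }
    return ++*counter;
}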


yoney commented Feb 19, 2026

I have made the requested changes; please review again.


bedevere-app bot commented Feb 19, 2026

Thanks for making the requested changes!

@DinoV: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from DinoV February 19, 2026 18:27

yoney commented Feb 19, 2026

aarch64 benchmark run:

GetExtra performance looks as good as the baseline.
SetExtra has additional overhead on FT builds, increasing from ~3 ns to ~15 ns (as expected).

Base (GIL)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       19.7 ns         16.8 ns          250 items_per_second=59.6374M/s
BM_CodeExtra_SetExtra                          2.36 ns         2.36 ns    304021112 items_per_second=424.059M/s
BM_CodeExtra_GetExtra                          1.54 ns         1.53 ns    456420073 items_per_second=652.039M/s
BM_CodeExtra_SetGet                            3.34 ns         3.33 ns    223818527 items_per_second=300.594M/s

New Implementation (GIL)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       19.2 ns         15.9 ns          250 items_per_second=63.004M/s
BM_CodeExtra_SetExtra                          2.42 ns         2.40 ns    296821748 items_per_second=416.132M/s
BM_CodeExtra_GetExtra                          1.55 ns         1.53 ns    458869555 items_per_second=652.618M/s
BM_CodeExtra_SetGet                            3.38 ns         3.34 ns    220168435 items_per_second=299.219M/s

New Implementation (FT)

---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
BM_CodeExtra_RequestIndex/iterations:250       32.0 ns         30.1 ns          250 items_per_second=33.2447M/s
BM_CodeExtra_SetExtra                          15.8 ns         15.8 ns     44310672 items_per_second=63.4835M/s
BM_CodeExtra_GetExtra                          1.55 ns         1.53 ns    458680585 items_per_second=653.316M/s
BM_CodeExtra_SetGet                            19.8 ns         19.7 ns     35630001 items_per_second=50.8699M/s


DinoV commented Feb 19, 2026

Oh, the stop the world on resize would be in PyUnstable_Code_SetExtra, not PyUnstable_Eval_RequestCodeExtraIndex.

I think stop-the-world on growth would probably be too expensive. If it happened on every initial allocation it would certainly be way too expensive, and even on an actual resize it could be really painful when someone adds a new index: there would be a burst of stop-the-world events as all the code objects get updated (consumers usually attach to a lot of objects).

But I do think we could make callers perform whatever synchronization they need to protect their own indexes; e.g. they might need to construct a (cheap and empty) value to be filled in, tolerate throwing one of those away, and then do a get to resolve the race. But I also think the minimal synchronization in the latest version is fine.
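
A rough illustration of that caller-side pattern (a hypothetical helper, not something this PR adds), assuming the per-index atomicity of the latest version so that concurrent SetExtra calls are serialized and a replaced value is released through the index's registered freefunc:

/* Hypothetical helper: lazily install a cheap, empty counter and resolve a
 * concurrent publish by re-reading the slot. */
static void *
get_or_create_extra(PyObject *code, Py_ssize_t index)
{
    void *value = NULL;
    if (PyUnstable_Code_GetExtra(code, index, &value) < 0) {
        return NULL;  /* exception set */
    }
    if (value != NULL) {
        return value;  /* already installed */
    }
    void *placeholder = PyMem_RawCalloc(1, sizeof(size_t));  /* cheap, empty value */
    if (placeholder == NULL) {
        return NULL;
    }
    if (PyUnstable_Code_SetExtra(code, index, placeholder) < 0) {
        PyMem_RawFree(placeholder);
        return NULL;
    }
    /* If another thread raced us, one of the two placeholders has already
     * been released by the freefunc when it was replaced; re-read to learn
     * which one ended up installed and use that. */
    if (PyUnstable_Code_GetExtra(code, index, &value) < 0) {
        return NULL;
    }
    return value;
}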

@DinoV DinoV left a comment

LGTM!
