main-pinned-cowns#29
Merged
Main-pinned cowns — a new `PinnedCown` subclass holds its
value as a plain `PyObject *` on the main interpreter, never
round-tripped through XIData. Behaviors whose request set contains
any pinned cown are routed by the scheduler to a single-consumer
main-thread queue and drained by the new `pump` entry point
(or implicitly by `wait`, which auto-pumps when pinned cowns
exist). Designed for objects that cannot survive cross-interpreter
shipping — pyglet shapes, Tk widgets, GPU contexts, open file
handles, ctypes pointers. The companion `examples/boids.py`
rewrite demonstrates the coarse-grained pinned-dispatch pattern:
per-cell physics stays on workers, and one `@when(PinnedCown)`
per frame batches the write-back into main-thread matrices.
Also in this release: `quiesce`, a checkpoint primitive that waits
for in-flight behaviors without tearing down the runtime.
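The routing rule described above can be pictured with a toy model. This is a minimal sketch in plain Python, not bocpy's scheduler: `ToyCown`, `ToyPinnedCown`, `route`, and the two queues are hypothetical names introduced only for illustration, and the real runtime does this work in C.

```python
from collections import deque


class ToyCown:
    """Hypothetical stand-in for a regular cown (not the bocpy class)."""
    pinned = False

    def __init__(self, value):
        self.value = value


class ToyPinnedCown(ToyCown):
    """Hypothetical stand-in for a main-pinned cown."""
    pinned = True


main_queue = deque()    # single consumer: only the main thread drains this
worker_queue = deque()  # drained by worker interpreters


def route(name, request_set):
    """Send a behavior to the main queue if ANY requested cown is pinned."""
    is_pinned = any(c.pinned for c in request_set)  # OR-fold over the request set
    (main_queue if is_pinned else worker_queue).append(name)
    return is_pinned


window = ToyPinnedCown(object())        # e.g. a GUI object that must stay on main
cell_a, cell_b = ToyCown(1), ToyCown(2)

route("physics", [cell_a, cell_b])      # no pinned cown: stays on workers
route("write_back", [cell_a, window])   # one pinned cown is enough: goes to main
```

In this model a single pinned cown in a mixed request set is enough to move the whole behavior to the main-thread queue, which is why the coarse-grained pattern keeps the number of such behaviors small.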
**New Features**
- **`quiesce(timeout=None, *, stats=False, noticeboard=False)`** —
blocks until every in-flight behavior completes, without tearing
down workers or the noticeboard thread. Implemented via a new
`terminator_seed_inc` peer of `terminator_seed_dec`
(Pyrona-style seed-up / seed-down pairing) so quiescence becomes
a *checkpoint* rather than a shutdown. Useful for parallel-search
patterns that need to inspect a best-so-far cown between rounds
and for tests that must read a worker-produced `send` queue
before its producer interpreter is destroyed. The `stats` and
`noticeboard` flags mirror `wait`: the call returns `None` by
default, a per-worker stats `list[dict]` when `stats=True`,
a noticeboard `dict[str, Any]` when `noticeboard=True`, or a
`WaitResult` when both are set. Raises `TimeoutError`
if quiescence is not reached within `timeout`. Exported from
`bocpy.__all__`.
- **`PinnedCown(Cown[T])`** — a cown whose value lives
permanently on the main interpreter. Constructible only from the
main interpreter (raises `RuntimeError` from workers);
the value is never picklable, never reified twice, and never
reconstructed in a worker. The capsule *handle* remains a
first-class cross-interpreter shareable — workers may hold it,
embed it in a regular `Cown` value graph, and place it in
noticeboard entries, but only the main thread may acquire the
value. See the new `pinned_cowns` page for the full
contract and the coarse-grained-dispatch pattern.
- **`pump(deadline_ms=None, max_behaviors=None, raise_on_error=False)`**
— drains the main-thread queue of behaviors whose request sets
contain a `PinnedCown`. Call from your event loop's
idle / on-tick hook (pyglet `schedule_interval`, Tk `after`,
asyncio task, …); script-mode programs need not call it
explicitly because `wait` pumps internally. Non-preemptive:
`deadline_ms` gates *starting* the next behavior, not
interrupting one already running. Body exceptions default to
landing on the result cown's `.exception`;
`raise_on_error=True` re-raises the first body exception after
drain. Returns a new `PumpResult` `NamedTuple`
(`executed`, `deadline_reached`, `raised`).
- **`set_pump_watchdog(warn_ms=1000, raise_ms=None, on_starve=None)`**
— configure the pinned-queue starvation watchdog. Both thresholds
gate on **queue-non-empty time**, not raw last-pump time, so
programs running only unpinned work never trip them. Default is
warn-only; users opt into fail-fast via an explicit `raise_ms`
so interactive debugger sessions are not wedged by a breakpoint.
- **`set_wait_pump_poll(ms=50)`** — set the poll cadence for
`wait`'s auto-pump loop. Re-read every iteration so a
concurrent call updates the active wait immediately.
- **`bocpy.PumpResult`** — three-field `NamedTuple` returned by
`pump`. `executed` counts pinned behaviors whose lifecycle
completed (including acquire-failure paths whose MCS chain still
drained). `deadline_reached` is `True` only when the
`deadline_ms` budget tripped before the queue drained.
`raised` counts only body exceptions captured to a result cown
(cleanup-path failures use `PyErr_WriteUnraisable` and do not
count). Exported from `bocpy.__all__`.
- **Coarse-grained pinned-dispatch `examples/boids.py`** — the
per-cell `send("update")` / main-thread `receive("update")`
barrier is replaced by per-cell physics on workers plus one
pinned `@when` per frame that captures every per-cell result
cown together with the two main-thread `PinnedCown` matrices
and performs the batched write-back. Same visual output, fully
worker-parallel per-cell work, single main-thread touchpoint.
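The non-preemptive drain semantics of `pump` can be sketched in pure Python. This is an illustrative model under stated assumptions, not the bocpy implementation: `toy_pump` and `ToyPumpResult` are hypothetical names, behaviors are modelled as plain callables, and the sketch assumes a behavior whose body raised still counts as executed.

```python
import time
from collections import deque
from typing import NamedTuple


class ToyPumpResult(NamedTuple):
    """Hypothetical mirror of the three fields described for PumpResult."""
    executed: int
    deadline_reached: bool
    raised: int


def toy_pump(queue, deadline_ms=None, max_behaviors=None):
    """Drain callables from `queue` on the calling thread, non-preemptively."""
    start = time.monotonic()
    executed = raised = 0
    deadline_reached = False
    while queue:
        if max_behaviors is not None and executed >= max_behaviors:
            break
        # The budget is checked only BEFORE starting a behavior;
        # a behavior that is already running is never interrupted.
        elapsed_ms = (time.monotonic() - start) * 1000.0
        if deadline_ms is not None and elapsed_ms >= deadline_ms:
            deadline_reached = True
            break
        body = queue.popleft()
        try:
            body()
        except Exception:
            raised += 1  # assumption: the real runtime stores this on the result cown
        executed += 1    # assumption: a raising body still completed its lifecycle
    return ToyPumpResult(executed, deadline_reached, raised)


def boom():
    raise ValueError("body failed")


pending = deque([lambda: None, boom, lambda: None])
first = toy_pump(pending, max_behaviors=2)   # runs two behaviors, one of which raises
rest = toy_pump(pending)                     # drains the remaining behavior
```

Because the deadline only gates the start of the next behavior, one long-running body can overshoot the budget; in this model, keeping pinned bodies short is what keeps an event loop responsive.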
**Public C ABI**
- **`bocpy_main_interpid()`** — new `static inline` helper in
`<bocpy/bocpy.h>` returning `PyInterpreterState_GetID(
PyInterpreterState_Main())` pre-typed as `int_least64_t` to
match `bocpy_interpid` for owner-field equality checks.
Safe to call from a worker sub-interpreter for diagnostic /
assert use. Additive — existing consumers recompile unchanged;
`BOCPY_ABI` is unchanged at 1. The
`templates/c_abi_consumer` `bocpy~=` pin moves to
`~=0.9` to signal the new ABI surface it was authored against.
**Improvements**
- **`@when` loop-variable snapshot via default arg** — the
transpiler now accepts `def b(c, i=i)` as an explicit
loop-snapshot idiom in addition to the existing implicit form
(just reference the loop variable in the body). Trailing
positional parameters beyond the cown count are also
auto-captured by name (`def b(c, factor)` captures
`factor`).
- **`@when` alias decorators** — the transpiler now recognises
`from bocpy import when as boc_when` and `import bocpy [as
alias]` followed by `@bocpy.when(...)` or
`@alias.when(...)`, provided the aliasing import is at module
level. Previously only the bare `@when` form was detected.
- **`Behaviors.start()` compiles the export module on main** —
the transpiler's rewritten module is now also instantiated as an
in-memory `types.ModuleType` on the main thread (plus a
`linecache` entry for traceback fidelity) so `pump` can
resolve `__behavior__N` the same way workers do via their
bootstrap.
- **Scheduler-owned behavior pre-header** — `bq_node` and the
new `pinned` OR-fold byte moved out of the opaque
`BOCBehavior` into a scheduler-owned `boc_behavior_prehdr_t`
allocated immediately before each behavior (CPython
`_PyGC_Head` style). `boc_sched.c` no longer needs any
knowledge of `BOCBehavior`'s internal layout; layout drift
between the scheduler and its users is impossible by
construction.
- **`terminator_wait_pumpable`** — new entry in
`boc_terminator.{c,h}` lets the auto-pump loop wake on either
count-zero or main-pinned-depth-becoming-non-zero, both wired
through the existing single condition variable. Single-pumper
enforcement on free-threaded builds (`Py_GIL_DISABLED`) lives
alongside via a `MAIN_PUMP_THREAD` CAS that raises
`RuntimeError` if a second thread tries to pump
concurrently, cleared on every exit path including
`BaseException`.
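The `def b(c, i=i)` idiom mentioned in this section relies on ordinary Python semantics: default arguments are evaluated once, when the `def` statement runs, whereas a name referenced in the body is looked up when the function is called. The sketch below uses plain functions only (no bocpy, no `@when`) to show the difference; how the transpiler handles the implicit form is not modelled here.

```python
late = []   # functions that read the loop variable when called
snap = []   # functions that snapshot it through a default argument

for i in range(3):
    def reads_loop_var():
        return i            # `i` is looked up at call time

    def reads_default(i=i):
        return i            # `i` was copied when this def statement ran

    late.append(reads_loop_var)
    snap.append(reads_default)

# All calls happen after the loop has finished, when i == 2.
late_values = [f() for f in late]
snap_values = [f() for f in snap]
```

Every function in `late` sees the final value of `i`, while each function in `snap` keeps the value from its own iteration.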
**Bug Fixes**
- **CWE-401: inheriting INCREF leak in `cown_decref_inline`** —
`CownCapsule_reduce` packs an encoded `XIData` payload by
taking an *inheriting* `COWN_INCREF` per embedded
`CownCapsule`, normally balanced when the bytes are
unpickled inside a worker. On the orphan-death path (the
consumer side never deserialised the payload) the matching
`COWN_DECREF`s never fired and every embedded cown leaked.
`cown_decref_inline` now feeds the encoded bytes through
`pickle.loads` and immediately drops the result, which lets
CPython's GC fire the matching `COWN_DECREF`s recursively.
Gated on the `pickled` flag so native `XIData` round-trips
(e.g. `Matrix`) skip the work entirely.
- **Main-pump behavior reference leak** — both
`_core_main_pump_bounded` and `_core_main_pump_drain_all`
popped a `BehaviorCapsule` from `MAIN_PINNED_QUEUE` but
never released the strong reference the capsule held on the
underlying `BOCBehavior`. Each pinned behavior leaked
one reference until the runtime was torn down. The pump
helpers now `BEHAVIOR_DECREF` the behavior immediately after
the worker-equivalent cleanup runs.
- **MSVC `<stdatomic.h>` compatibility** — Microsoft's
`<stdatomic.h>` (used by CPython's headers on Windows) does
not expose the unsigned `atomic_uint_least64_t` or
`atomic_uintptr_t` forms that the pinned-pump bookkeeping
used. `MAIN_PINNED_DEPTH`, `MAIN_PINNED_NONEMPTY_SINCE_NS`,
`LAST_PUMP_NS`, `WATCHDOG_WARN_MS`, `WATCHDOG_LAST_WARN_NS`,
`WATCHDOG_ON_STARVE` and `MAIN_PUMP_THREAD` are now
`atomic_int_least64_t` / `atomic_intptr_t`. Depth never
goes negative; pointer bits round-trip losslessly through the
signed atomic boundary.
- **CPython 3.10/3.11 `PyErr_SetRaisedException` polyfill** —
added to `include/bocpy/xidata.h` alongside the existing
`PyErr_GetRaisedException` polyfill so the public C ABI's
exception-stash pattern compiles on Python versions before
3.12. `BOCPY_ABI` is unchanged.
- **Portable `boc_max_align_t`** — added to `boc_compat.h` as
a union of the most-strictly-aligned fundamental types
(`long long`, `long double`, `void *`, function pointer).
MSVC exposes the C11 `max_align_t` only under `/std:c11`,
which the CPython build does not pass; the
`boc_behavior_prehdr_t` size assertion now uses
`alignof(boc_max_align_t)` so the alignment contract holds on
every supported toolchain.
- **PEP 678 `add_note` 3.10 fallback** — the new
`Behaviors.quiesce` exception-context shim attaches a note
describing the seed-inc / seed-dec balance on failure. CPython
3.10 predates `BaseException.add_note`; the shim now
writes to `BaseException.__notes__` directly when `add_note`
is missing.
- **Transpiler `except ... as X` mis-classification** —
`ast.ExceptHandler` binds `X` on the handler node
itself rather than via an `ast.Name` node with `Store` context, so the
transpiler's free-variable walker mis-classified any read of
`X` inside the handler body as a free variable, appended it
as a behavior parameter, and emitted a call site that
referenced an out-of-scope name. Fixed by a new
`visit_ExceptHandler` hook that registers `X` as a local
before recursing into the handler. Regression locked by
`TestCapturedLocals::test_except_as_name_excluded`.
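The `except ... as X` fix can be reproduced with a small `ast.NodeVisitor`. This is a simplified sketch, not the transpiler's walker: `NaiveWalker`, `FixedWalker`, and `free_names` are hypothetical names, and the model uses a single flat scope and treats builtins as non-free.

```python
import ast
import builtins


class NaiveWalker(ast.NodeVisitor):
    """Collects locals only from Name nodes in Store context."""

    def __init__(self):
        self.local_names = set()
        self.loaded_names = []

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            self.local_names.add(node.id)
        elif isinstance(node.ctx, ast.Load):
            self.loaded_names.append(node.id)


class FixedWalker(NaiveWalker):
    """Also registers the name bound by `except ... as X`."""

    def visit_ExceptHandler(self, node):
        # The bound name is a plain string on the handler node,
        # not a Name(Store) child, so visit_Name never sees it.
        if node.name is not None:
            self.local_names.add(node.name)
        self.generic_visit(node)  # then walk the exception type and handler body


def free_names(source, walker_cls):
    walker = walker_cls()
    walker.visit(ast.parse(source))
    return sorted(
        {
            name
            for name in walker.loaded_names
            if name not in walker.local_names and not hasattr(builtins, name)
        }
    )


SOURCE = """
try:
    risky(scale)
except ValueError as err:
    log(err)
"""

naive = free_names(SOURCE, NaiveWalker)
fixed = free_names(SOURCE, FixedWalker)
```

With the naive walker `err` is reported as free, which is the mis-classification described above; with the hook in place only `log`, `risky`, and `scale` remain.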
**Documentation**
- New `pinned_cowns` page — concept and when to use,
`PinnedCown` / `pump` / `PumpResult` / `set_pump_watchdog`
/ `set_wait_pump_poll` API, coarse-grained pinned-dispatch
pattern, event-loop integration recipes (pyglet, Tk, asyncio),
the queue-non-empty-time watchdog contract, free-threaded
single-pumper rule, and free-threaded support trajectory.
Linked from the root toctree.
- `api` expanded with the new `PinnedCown` / `pump` /
`PumpResult` / `set_pump_watchdog` / `set_wait_pump_poll`
entries.
- New "Talking to main-thread objects" subsection in the root
`README.md`'s "A taste of BOC" with a 10-line pyglet snippet
illustrating the coarse-grained pattern; the public-API list
picks up the five new symbols.
- `examples/README.md` calls out the rewritten `boids.py` and
the new `examples/benchmark.py --pinned-spinner` flag.
**Tests**
- **`test/test_pinned_pump.py`** — new module covering the
full `PinnedCown` / `pump` matrix: pure-pinned, mixed
request sets, off-main construction rejection, locked
error-string smoke tests, `deadline_ms` / `max_behaviors`
bounding, body exceptions under default and
`raise_on_error=True`, `wait()` auto-pump, shutdown drain
via drop-exceptions, the watchdog warn-only and explicit-raise
paths, the `QUEUE_NONEMPTY_SINCE` regression for unpinned-only
workloads, hypothesis fuzz over mixed request sets,
`PinnedCown`-handle round-trip through closure capture and
through the noticeboard, `Cown(PinnedCown)` interop, and an
acquire-failure fault-injection test that proves
`IN_PUMP_BODY` / `terminator_dec` / `MAIN_PUMP_THREAD`
cleanup runs on every exit path.
- **`test/test_transpiler.py`** — 192 new lines covering the
`def b(c, i=i)` loop-snapshot form, `@when` alias decorators,
and the `except ... as X` regression.
- **`test_main_pump_drain_all_marks_result_cowns` flaky-shutdown
rewrite** — the original version scheduled eight pinned
behaviors, called `wait(timeout=0)` to force shutdown, then
asserted on the result cowns. The `timeout=0` propagated
through every stage of `Behaviors.stop` (quiescence,
noticeboard drain) and raised `TimeoutError` from one of
them under load before the post-`wait` assertions could run.
The rewritten test calls `_core.main_pump_drain_all` directly
to exercise the shutdown drain in isolation and asserts every
drained result cown carries the shutdown `RuntimeError`.
**Internal**
- **`examples/benchmark.py --pinned-spinner`** — high-rate
pinned-dispatch overlay that adds one tail-recursing
`@when(PinnedCown)` driven by `pump(max_behaviors=1)` on the
main thread at a configurable rate while the existing chain-ring
workload runs on workers. Used during development to check for
worker-throughput regressions under high-rate pinned dispatch;
on CPython 3.14 at 4 workers / 10 s / 3 repeats the measured
delta with the spinner active was −0.38%.
- **Noticeboard read contract tightened** — `noticeboard`
now explicitly documents that calling `noticeboard` or
`notice_read` from the main thread *outside* a behavior is
undefined behavior; the supported main-thread read path is
`wait(noticeboard=True)`. Seeding the noticeboard with
`notice_write` from the main thread before scheduling any
behavior remains supported.
- **`test_matrix.TestVectorMethodsInCown` migrated to the
`send("assert", ...)` pattern** — the in-cown `Matrix` vector
tests previously asserted on `result.value` directly from the
test thread, which violates the cown ownership contract. They now
ship assertions out of each behavior via `send("assert", ...)`
and collect on the test thread via a `receive_asserts(count)`
helper, matching the project's BOC testing convention.
- **CI: ASAN `detect_leaks=1`** — the pinned-pump leak hunt
cleared the last masking leak; the ASAN job in
`.github/workflows/pr_gate.yml` now sets
`ASAN_OPTIONS=detect_leaks=1:halt_on_error=1` so any new
reachable leak fails the build at the source instead of
silently accumulating under `detect_leaks=0`.
Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>
Closes #20