
main-pinned-cowns#29

Merged
matajoh merged 1 commit into microsoft:main from matajoh:main-pinned-cowns on Jun 6, 2026

Conversation

@matajoh
Member

@matajoh matajoh commented Jun 5, 2026

Main-pinned cowns — a new PinnedCown subclass holds its value as a plain PyObject * on the main interpreter, never round-tripped through XIData. Behaviors whose request set contains any pinned cown are routed by the scheduler to a single-consumer main-thread queue and drained by the new pump entry point (or implicitly by wait, which auto-pumps when pinned cowns exist). Designed for objects that cannot survive cross-interpreter shipping — pyglet shapes, Tk widgets, GPU contexts, open file handles, ctypes pointers. The companion examples/boids.py rewrite demonstrates the coarse-grained pinned-dispatch pattern: per-cell physics stays on workers, and one @when(PinnedCown) per frame batches the write-back into main-thread matrices.

New Features

  • PinnedCown(Cown[T]) — a cown whose value lives permanently on the main interpreter. Constructible only from the main interpreter (raises RuntimeError from workers); the value is never picklable, never reified twice, and never reconstructed in a worker. The capsule handle remains a first-class cross-interpreter shareable — workers may hold it, embed it in a regular Cown value graph, and place it in noticeboard entries, but only the main thread may acquire the value. See the new pinned_cowns page for the full contract and the coarse-grained-dispatch pattern.
  • pump(deadline_ms=None, max_behaviors=None, raise_on_error=False) — drains the main-thread queue of behaviors whose request sets contain a PinnedCown. Call from your event loop's idle / on-tick hook (pyglet schedule_interval, Tk after, asyncio task, …); script-mode programs need not call it explicitly because wait pumps internally. Non-preemptive: deadline_ms gates starting the next behavior, not interrupting one already running. Body exceptions default to landing on the result cown's .exception; raise_on_error=True re-raises the first body exception after drain. Returns a new PumpResult NamedTuple (executed, deadline_reached, raised).
  • set_pump_watchdog(warn_ms=1000, raise_ms=None, on_starve=None) — configure the pinned-queue starvation watchdog. Both thresholds gate on queue-non-empty time, not raw last-pump time, so programs running only unpinned work never trip them. Default is warn-only; users opt into fail-fast via an explicit raise_ms so interactive debugger sessions are not wedged by a breakpoint.
  • set_wait_pump_poll(ms=50) — set the poll cadence for wait's auto-pump loop. Re-read every iteration so a concurrent call updates the active wait immediately.
  • bocpy.PumpResult — three-field NamedTuple returned by pump. executed counts pinned behaviors whose lifecycle completed (including acquire-failure paths whose MCS chain still drained). deadline_reached is True only when the deadline_ms budget tripped before the queue drained. raised counts only body exceptions captured to a result cown (cleanup-path failures use PyErr_WriteUnraisable and do not count). Exported from bocpy.__all__.
  • Coarse-grained pinned-dispatch examples/boids.py — the per-cell send("update") / main-thread receive("update") barrier is replaced by per-cell physics on workers plus one pinned @when per frame that captures every per-cell result cown together with the two main-thread PinnedCown matrices and performs the batched write-back. Same visual output, fully worker-parallel per-cell work, single main-thread touchpoint.

Public C ABI

  • bocpy_main_interpid() — new static inline helper in <bocpy/bocpy.h> returning PyInterpreterState_GetID( PyInterpreterState_Main()) pre-typed as int_least64_t to match bocpy_interpid for owner-field equality checks. Safe to call from a worker sub-interpreter for diagnostic / assert use. Additive — existing consumers recompile unchanged; BOCPY_ABI is unchanged at 1. The templates/c_abi_consumer bocpy~= pin moves to ~=0.9 to signal the new ABI surface it was authored against.

Improvements

  • @when loop-variable snapshot via default arg — the transpiler now accepts def b(c, i=i) as an explicit loop-snapshot idiom in addition to the existing implicit form (just reference the loop variable in the body). Trailing positional parameters beyond the cown count are also auto-captured by name (def b(c, factor) captures factor).
  • @when alias decorators — the transpiler now recognises from bocpy import when as boc_when and import bocpy [as alias] followed by @bocpy.when(...) or @alias.when(...), provided the aliasing import is at module level. Previously only the bare @when form was detected.
  • Behaviors.start() compiles the export module on main — the transpiler's rewritten module is now also instantiated as an in-memory types.ModuleType on the main thread (plus a linecache entry for traceback fidelity) so pump can resolve __behavior__N the same way workers do via their bootstrap.
  • Scheduler-owned behavior pre-header: bq_node and the new pinned OR-fold byte moved out of the opaque BOCBehavior into a scheduler-owned boc_behavior_prehdr_t allocated immediately before each behavior (CPython _PyGC_Head style). boc_sched.c no longer needs any knowledge of BOCBehavior's internal layout; layout drift between the scheduler and its users is impossible by construction.
  • terminator_wait_pumpable — new entry in boc_terminator.{c,h} lets the auto-pump loop wake on either count-zero or main-pinned-depth-becoming-non-zero, both wired through the existing single condition variable. Single-pumper enforcement on free-threaded builds (Py_GIL_DISABLED) lives alongside via a MAIN_PUMP_THREAD CAS that raises RuntimeError if a second thread tries to pump concurrently, cleared on every exit path including BaseException.

Bug Fixes

  • Transpiler except ... as X mis-classification: ast.ExceptHandler binds X on the handler node itself rather than via an ast.Name with Store context, so the transpiler's free-variable walker mis-classified any read of X inside the handler body as a free variable, appended it as a behavior parameter, and emitted a call site that referenced an out-of-scope name. Fixed by a new visit_ExceptHandler hook that registers X as a local before recursing into the handler. Regression locked by TestCapturedLocals::test_except_as_name_excluded.

Documentation

  • New pinned_cowns page — concept and when to use, PinnedCown / pump / PumpResult / set_pump_watchdog / set_wait_pump_poll API, coarse-grained pinned-dispatch pattern, event-loop integration recipes (pyglet, Tk, asyncio), the queue-non-empty-time watchdog contract, free-threaded single-pumper rule, and free-threaded support trajectory. Linked from the root toctree.
  • api expanded with the new PinnedCown / pump / PumpResult / set_pump_watchdog / set_wait_pump_poll entries.
  • New "Talking to main-thread objects" subsection in the root README.md's "A taste of BOC" with a 10-line pyglet snippet illustrating the coarse-grained pattern; the public-API list picks up the five new symbols.
  • examples/README.md calls out the rewritten boids.py and the new examples/benchmark.py --pinned-spinner flag.

Tests

  • test/test_pinned_pump.py — new module covering the full PinnedCown / pump matrix: pure-pinned, mixed request sets, off-main construction rejection, locked error-string smoke tests, deadline_ms / max_behaviors bounding, body exceptions under default and raise_on_error=True, wait() auto-pump, shutdown drain via drop-exceptions, the watchdog warn-only and explicit-raise paths, the QUEUE_NONEMPTY_SINCE regression for unpinned-only workloads, hypothesis fuzz over mixed request sets, PinnedCown-handle round-trip through closure capture and through the noticeboard, Cown(PinnedCown) interop, and an acquire-failure fault-injection test that proves IN_PUMP_BODY / terminator_dec / MAIN_PUMP_THREAD cleanup runs on every exit path.
  • test/test_transpiler.py — 192 new lines covering the def b(c, i=i) loop-snapshot form, @when alias decorators, and the except ... as X regression.

Internal

  • examples/benchmark.py --pinned-spinner: high-rate pinned-dispatch overlay that adds one tail-recursing @when(PinnedCown) driven by pump(max_behaviors=1) on the main thread at a configurable rate while the existing chain-ring workload runs on workers. Used during development to check for worker-throughput regression under high-rate pinned dispatch; on CPython 3.14 at 4 workers / 10 s / 3 repeats the measured delta with the spinner active was −0.38%.
  • Noticeboard read contract tightened: noticeboard now explicitly documents that calling noticeboard or notice_read from the main thread outside a behavior is undefined behavior; the supported main-thread read path is wait(noticeboard=True). Seeding the noticeboard with notice_write from the main thread before scheduling any behavior remains supported.
  • test_matrix.TestVectorMethodsInCown migrated to the send("assert", ...) pattern — the in-cown Matrix vector tests previously asserted on result.value directly from the test thread, which violates the cown ownership contract. They now ship assertions out of each behavior via send("assert", ...) and collect on the test thread via a receive_asserts(count) helper, matching the project's BOC testing convention.

Closes #20

@matajoh matajoh force-pushed the main-pinned-cowns branch from 18c69af to ef6bde7 June 6, 2026 10:38

Main-pinned cowns — a new `PinnedCown` subclass holds its
value as a plain `PyObject *` on the main interpreter, never
round-tripped through XIData. Behaviors whose request set contains
any pinned cown are routed by the scheduler to a single-consumer
main-thread queue and drained by the new `pump` entry point
(or implicitly by `wait`, which auto-pumps when pinned cowns
exist). Designed for objects that cannot survive cross-interpreter
shipping — pyglet shapes, Tk widgets, GPU contexts, open file
handles, ctypes pointers. The companion `examples/boids.py`
rewrite demonstrates the coarse-grained pinned-dispatch pattern:
per-cell physics stays on workers, and one `@when(PinnedCown)`
per frame batches the write-back into main-thread matrices.
Also in this release: `quiesce`, a non-tearing-down
checkpoint primitive.
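
As a rough mental model of the routing rule described above, the sketch below uses plain Python stand-ins. `ToyCown`, `ToyScheduler`, and the queue names are hypothetical and are not part of bocpy; the only behavior taken from the description is that a behavior is sent to the main-thread queue when any cown in its request set is pinned.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ToyCown:
    """Hypothetical stand-in for a cown; `pinned` marks main-thread affinity."""
    name: str
    pinned: bool = False


@dataclass
class ToyScheduler:
    """Hypothetical two-queue scheduler used only to illustrate routing."""
    main_queue: deque = field(default_factory=deque)
    worker_queue: deque = field(default_factory=deque)

    def schedule(self, request_set: Iterable[ToyCown], body: Callable[[], None]) -> str:
        # OR-fold over the request set: one pinned cown pins the whole behavior.
        pinned = any(cown.pinned for cown in request_set)
        target = self.main_queue if pinned else self.worker_queue
        target.append(body)
        return "main" if pinned else "worker"


sched = ToyScheduler()
plain = ToyCown("cell-result")
window = ToyCown("window", pinned=True)

route_plain = sched.schedule([plain], lambda: None)          # worker queue
route_mixed = sched.schedule([plain, window], lambda: None)  # main-thread queue
```

A mixed request set is routed to the main queue even though only one of its cowns is pinned, which is why the notes recommend keeping pinned behaviors coarse-grained.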

**New Features**

- **`quiesce(timeout=None, *, stats=False, noticeboard=False)`** —
  blocks until every in-flight behavior completes, without tearing
  down workers or the noticeboard thread. Implemented via a new
  `terminator_seed_inc` peer of `terminator_seed_dec`
  (Pyrona-style seed-up / seed-down pairing) so quiescence becomes
  a *checkpoint* rather than a shutdown. Useful for parallel-search
  patterns that need to inspect a best-so-far cown between rounds
  and for tests that must read a worker-produced `send` queue
  before its producer interpreter is destroyed. The `stats` and
  `noticeboard` flags mirror `wait`: returns `None` by
  default, a per-worker stats `list[dict]` when `stats=True`,
  a noticeboard `dict[str, Any]` when `noticeboard=True`, or a
  `WaitResult` when both are set. Raises `TimeoutError`
  if quiescence is not reached within `timeout`. Exported from
  `bocpy.__all__`.
- **`PinnedCown(Cown[T])`** — a cown whose value lives
  permanently on the main interpreter. Constructible only from the
  main interpreter (raises `RuntimeError` from workers);
  the value is never picklable, never reified twice, and never
  reconstructed in a worker. The capsule *handle* remains a
  first-class cross-interpreter shareable — workers may hold it,
  embed it in a regular `Cown` value graph, and place it in
  noticeboard entries, but only the main thread may acquire the
  value. See the new `pinned_cowns` page for the full
  contract and the coarse-grained-dispatch pattern.
- **`pump(deadline_ms=None, max_behaviors=None, raise_on_error=False)`**
  — drains the main-thread queue of behaviors whose request sets
  contain a `PinnedCown`. Call from your event loop's
  idle / on-tick hook (pyglet `schedule_interval`, Tk `after`,
  asyncio task, …); script-mode programs need not call it
  explicitly because `wait` pumps internally. Non-preemptive:
  `deadline_ms` gates *starting* the next behavior, not
  interrupting one already running. Body exceptions default to
  landing on the result cown's `.exception`;
  `raise_on_error=True` re-raises the first body exception after
  drain. Returns a new `PumpResult` `NamedTuple`
  (`executed`, `deadline_reached`, `raised`).
- **`set_pump_watchdog(warn_ms=1000, raise_ms=None, on_starve=None)`**
  — configure the pinned-queue starvation watchdog. Both thresholds
  gate on **queue-non-empty time**, not raw last-pump time, so
  programs running only unpinned work never trip them. Default is
  warn-only; users opt into fail-fast via an explicit `raise_ms`
  so interactive debugger sessions are not wedged by a breakpoint.
- **`set_wait_pump_poll(ms=50)`** — set the poll cadence for
  `wait`'s auto-pump loop. Re-read every iteration so a
  concurrent call updates the active wait immediately.
- **`bocpy.PumpResult`** — three-field `NamedTuple` returned by
  `pump`. `executed` counts pinned behaviors whose lifecycle
  completed (including acquire-failure paths whose MCS chain still
  drained). `deadline_reached` is `True` only when the
  `deadline_ms` budget tripped before the queue drained.
  `raised` counts only body exceptions captured to a result cown
  (cleanup-path failures use `PyErr_WriteUnraisable` and do not
  count). Exported from `bocpy.__all__`.
- **Coarse-grained pinned-dispatch `examples/boids.py`** — the
  per-cell `send("update")` / main-thread `receive("update")`
  barrier is replaced by per-cell physics on workers plus one
  pinned `@when` per frame that captures every per-cell result
  cown together with the two main-thread `PinnedCown` matrices
  and performs the batched write-back. Same visual output, fully
  worker-parallel per-cell work, single main-thread touchpoint.
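
The `pump` contract above (non-preemptive deadline, `max_behaviors` bound, deferred re-raise) can be sketched with a toy drain loop. This is an illustration under stated assumptions, not bocpy's implementation: `toy_pump` and its `clock` parameter are hypothetical, behaviors are modelled as zero-argument callables, and there are no result cowns, so captured exceptions are only counted.

```python
import time
from collections import deque
from typing import Callable, NamedTuple, Optional


class PumpResult(NamedTuple):
    """Mirrors the three fields named in the release notes."""
    executed: int
    deadline_reached: bool
    raised: int


def toy_pump(queue: deque,
             deadline_ms: Optional[float] = None,
             max_behaviors: Optional[int] = None,
             raise_on_error: bool = False,
             clock: Callable[[], float] = time.monotonic) -> PumpResult:
    """Drain `queue` of callables; `clock` is injectable for testing (assumption)."""
    start = clock()
    executed = raised = 0
    deadline_reached = False
    first_error: Optional[BaseException] = None
    while queue:
        if max_behaviors is not None and executed >= max_behaviors:
            break
        # The deadline only gates *starting* the next body; a running body
        # is never interrupted.
        if deadline_ms is not None and (clock() - start) * 1000.0 >= deadline_ms:
            deadline_reached = True
            break
        body = queue.popleft()
        try:
            body()
        except Exception as exc:  # body failure is recorded, the drain continues
            raised += 1
            if first_error is None:
                first_error = exc
        executed += 1
    if raise_on_error and first_error is not None:
        raise first_error  # re-raised only after the drain has finished
    return PumpResult(executed, deadline_reached, raised)
```

In this model `deadline_reached` can only become true while work is still queued, matching the statement that it is set only when the budget trips before the queue drains.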

**Public C ABI**

- **`bocpy_main_interpid()`** — new `static inline` helper in
  `<bocpy/bocpy.h>` returning
  `PyInterpreterState_GetID(PyInterpreterState_Main())` pre-typed as `int_least64_t` to
  match `bocpy_interpid` for owner-field equality checks.
  Safe to call from a worker sub-interpreter for diagnostic /
  assert use. Additive — existing consumers recompile unchanged;
  `BOCPY_ABI` is unchanged at 1. The
  `templates/c_abi_consumer` `bocpy~=` pin moves to
  `~=0.9` to signal the new ABI surface it was authored against.

**Improvements**

- **`@when` loop-variable snapshot via default arg** — the
  transpiler now accepts `def b(c, i=i)` as an explicit
  loop-snapshot idiom in addition to the existing implicit form
  (just reference the loop variable in the body). Trailing
  positional parameters beyond the cown count are also
  auto-captured by name (`def b(c, factor)` captures
  `factor`).
- **`@when` alias decorators** — the transpiler now recognises
  `from bocpy import when as boc_when` and `import bocpy [as
  alias]` followed by `@bocpy.when(...)` or
  `@alias.when(...)`, provided the aliasing import is at module
  level. Previously only the bare `@when` form was detected.
- **`Behaviors.start()` compiles the export module on main** —
  the transpiler's rewritten module is now also instantiated as an
  in-memory `types.ModuleType` on the main thread (plus a
  `linecache` entry for traceback fidelity) so `pump` can
  resolve `__behavior__N` the same way workers do via their
  bootstrap.
- **Scheduler-owned behavior pre-header** — `bq_node` and the
  new `pinned` OR-fold byte moved out of the opaque
  `BOCBehavior` into a scheduler-owned `boc_behavior_prehdr_t`
  allocated immediately before each behavior (CPython
  `_PyGC_Head` style). `boc_sched.c` no longer needs any
  knowledge of `BOCBehavior`'s internal layout; layout drift
  between the scheduler and its users is impossible by
  construction.
- **`terminator_wait_pumpable`** — new entry in
  `boc_terminator.{c,h}` lets the auto-pump loop wake on either
  count-zero or main-pinned-depth-becoming-non-zero, both wired
  through the existing single condition variable. Single-pumper
  enforcement on free-threaded builds (`Py_GIL_DISABLED`) lives
  alongside via a `MAIN_PUMP_THREAD` CAS that raises
  `RuntimeError` if a second thread tries to pump
  concurrently, cleared on every exit path including
  `BaseException`.
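
For readers unfamiliar with the `def b(c, i=i)` idiom mentioned above, the following plain-Python sketch shows why a default argument acts as a snapshot. It uses ordinary closures only and does not involve bocpy or its transpiler (which, per the notes, also snapshots the loop variable in the implicit form); the function names are illustrative.

```python
def make_late_bound():
    """Closures that read the loop variable when called, not when defined."""
    callbacks = []
    for i in range(3):
        def b(c):
            return c + i  # `i` is looked up at call time
        callbacks.append(b)
    return callbacks


def make_snapshotted():
    """Closures that capture the current value of `i` via a default argument."""
    callbacks = []
    for i in range(3):
        def b(c, i=i):  # default is evaluated once, at definition time
            return c + i
        callbacks.append(b)
    return callbacks


late = [b(10) for b in make_late_bound()]    # [12, 12, 12]
snap = [b(10) for b in make_snapshotted()]   # [10, 11, 12]
```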

**Bug Fixes**

- **CWE-401: inheriting INCREF leak in `cown_decref_inline`** —
  `CownCapsule_reduce` packs an encoded `XIData` payload by
  taking an *inheriting* `COWN_INCREF` per embedded
  `CownCapsule`, normally balanced when the bytes are
  unpickled inside a worker. On the orphan-death path (the
  consumer side never deserialised the payload) the matching
  `COWN_DECREF`s never fired and every embedded cown leaked.
  `cown_decref_inline` now feeds the encoded bytes through
  `pickle.loads` and immediately drops the result, which lets
  CPython's GC fire the matching `COWN_DECREF`s recursively.
  Gated on the `pickled` flag so native `XIData` round-trips
  (e.g. `Matrix`) skip the work entirely.
- **Main-pump behavior reference leak** — both
  `_core_main_pump_bounded` and `_core_main_pump_drain_all`
  popped a `BehaviorCapsule` from `MAIN_PINNED_QUEUE` but
  never released the strong reference the capsule held on the
  underlying `BOCBehavior`. Each pinned behavior leaked
  one reference until the runtime was torn down. The pump
  helpers now `BEHAVIOR_DECREF` the behavior immediately after
  the worker-equivalent cleanup runs.
- **MSVC `<stdatomic.h>` compatibility** — Microsoft's
  `<stdatomic.h>` (used by CPython's headers on Windows) does
  not expose the unsigned `atomic_uint_least64_t` or
  `atomic_uintptr_t` forms that the pinned-pump bookkeeping
  used. `MAIN_PINNED_DEPTH`, `MAIN_PINNED_NONEMPTY_SINCE_NS`,
  `LAST_PUMP_NS`, `WATCHDOG_WARN_MS`, `WATCHDOG_LAST_WARN_NS`,
  `WATCHDOG_ON_STARVE` and `MAIN_PUMP_THREAD` are now
  `atomic_int_least64_t` / `atomic_intptr_t`. Depth never
  goes negative; pointer bits round-trip losslessly through the
  signed atomic boundary.
- **CPython 3.10/3.11 `PyErr_SetRaisedException` polyfill** —
  added to `include/bocpy/xidata.h` alongside the existing
  `PyErr_GetRaisedException` polyfill so the public C ABI's
  exception-stash pattern compiles on Python versions before
  3.12. `BOCPY_ABI` is unchanged.
- **Portable `boc_max_align_t`** — added to `boc_compat.h` as
  a union of the most-strictly-aligned fundamental types
  (`long long`, `long double`, `void *`, function pointer).
  MSVC exposes the C11 `max_align_t` only under `/std:c11`,
  which the CPython build does not pass; the
  `boc_behavior_prehdr_t` size assertion now uses
  `alignof(boc_max_align_t)` so the alignment contract holds on
  every supported toolchain.
- **PEP 678 `add_note` 3.10 fallback** — the new
  `Behaviors.quiesce` exception-context shim attaches a note
  describing the seed-inc / seed-dec balance on failure. CPython
  3.10 predates `BaseException.add_note`; the shim now
  writes to `BaseException.__notes__` directly when `add_note`
  is missing.
- **Transpiler `except ... as X` mis-classification** —
  `ExceptHandler` binds `X` on the handler node
  itself rather than via `Name` `Store`, so the
  transpiler's free-variable walker mis-classified any read of
  `X` inside the handler body as a free variable, appended it
  as a behavior parameter, and emitted a call site that
  referenced an out-of-scope name. Fixed by a new
  `visit_ExceptHandler` hook that registers `X` as a local
  before recursing into the handler. Regression locked by
  `TestCapturedLocals::test_except_as_name_excluded`.
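
The `except ... as X` issue can be reproduced with a small standard-library `ast` walker. This is a simplified sketch, not the transpiler's actual code: `FreeNames`, `free_names`, and the `handle_except` switch are hypothetical, and the analysis is flow-insensitive. It shows that the bound name lives in `ExceptHandler.name` as a plain string, so a walker that only watches `ast.Name` nodes with `Store` context never sees it.

```python
import ast
import builtins


class FreeNames(ast.NodeVisitor):
    """Collect names that are read but never bound in the visited code."""

    def __init__(self, handle_except: bool):
        self.handle_except = handle_except
        self.bound = set()
        self.loaded = set()

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Store):
            self.bound.add(node.id)
        elif isinstance(node.ctx, ast.Load):
            self.loaded.add(node.id)

    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        # `except E as X` stores X in node.name (a str), not in an ast.Name.
        if self.handle_except and node.name is not None:
            self.bound.add(node.name)
        self.generic_visit(node)


def free_names(source: str, handle_except: bool) -> set:
    visitor = FreeNames(handle_except)
    visitor.visit(ast.parse(source))
    return visitor.loaded - visitor.bound - set(dir(builtins))


SRC = """
try:
    risky(c)
except ValueError as err:
    log(err)
"""

without_hook = sorted(free_names(SRC, handle_except=False))  # ['c', 'err', 'log', 'risky']
with_hook = sorted(free_names(SRC, handle_except=True))      # ['c', 'log', 'risky']
```

Without the hook, `err` is reported as free, which corresponds to the extra behavior parameter described in the bullet above.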

**Documentation**

- New `pinned_cowns` page — concept and when to use,
  `PinnedCown` / `pump` / `PumpResult` / `set_pump_watchdog`
  / `set_wait_pump_poll` API, coarse-grained pinned-dispatch
  pattern, event-loop integration recipes (pyglet, Tk, asyncio),
  the queue-non-empty-time watchdog contract, free-threaded
  single-pumper rule, and free-threaded support trajectory.
  Linked from the root toctree.
- `api` expanded with the new `PinnedCown` / `pump` /
  `PumpResult` / `set_pump_watchdog` / `set_wait_pump_poll`
  entries.
- New "Talking to main-thread objects" subsection in the root
  `README.md`'s "A taste of BOC" with a 10-line pyglet snippet
  illustrating the coarse-grained pattern; the public-API list
  picks up the five new symbols.
- `examples/README.md` calls out the rewritten `boids.py` and
  the new `examples/benchmark.py --pinned-spinner` flag.

**Tests**

- **`test/test_pinned_pump.py`** — new module covering the
  full `PinnedCown` / `pump` matrix: pure-pinned, mixed
  request sets, off-main construction rejection, locked
  error-string smoke tests, `deadline_ms` / `max_behaviors`
  bounding, body exceptions under default and
  `raise_on_error=True`, `wait()` auto-pump, shutdown drain
  via drop-exceptions, the watchdog warn-only and explicit-raise
  paths, the `QUEUE_NONEMPTY_SINCE` regression for unpinned-only
  workloads, hypothesis fuzz over mixed request sets,
  `PinnedCown`-handle round-trip through closure capture and
  through the noticeboard, `Cown(PinnedCown)` interop, and an
  acquire-failure fault-injection test that proves
  `IN_PUMP_BODY` / `terminator_dec` / `MAIN_PUMP_THREAD`
  cleanup runs on every exit path.
- **`test/test_transpiler.py`** — 192 new lines covering the
  `def b(c, i=i)` loop-snapshot form, `@when` alias decorators,
  and the `except ... as X` regression.
- **`test_main_pump_drain_all_marks_result_cowns` flaky-shutdown
  rewrite** — the original version scheduled eight pinned
  behaviors, called `wait(timeout=0)` to force shutdown, then
  asserted on the result cowns. The `timeout=0` propagated
  through every stage of `Behaviors.stop` (quiescence,
  noticeboard drain) and raised `TimeoutError` from one of
  them under load before the post-`wait` assertions could run.
  The rewritten test calls `_core.main_pump_drain_all` directly
  to exercise the shutdown drain in isolation and asserts every
  drained result cown carries the shutdown `RuntimeError`.
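
The watchdog paths covered by these tests rest on the "queue-non-empty time" rule from the feature notes. A minimal model of that rule follows. `ToyWatchdog` and its methods are hypothetical, and it assumes the timestamp is set on the empty-to-non-empty transition and cleared when the queue drains; the real bookkeeping may differ.

```python
import time
from typing import Callable, Optional


class ToyWatchdog:
    """Hypothetical starvation check keyed on how long the queue has been non-empty."""

    def __init__(self, warn_ms: Optional[float] = 1000,
                 raise_ms: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.warn_ms = warn_ms
        self.raise_ms = raise_ms
        self.clock = clock
        self.depth = 0
        self.nonempty_since: Optional[float] = None

    def on_enqueue(self) -> None:
        if self.depth == 0:
            self.nonempty_since = self.clock()  # empty -> non-empty transition
        self.depth += 1

    def on_dequeue(self) -> None:
        self.depth -= 1
        if self.depth == 0:
            self.nonempty_since = None  # drained: nothing can be starving

    def check(self) -> str:
        if self.nonempty_since is None:
            return "ok"  # unpinned-only programs always land here
        waited_ms = (self.clock() - self.nonempty_since) * 1000.0
        if self.raise_ms is not None and waited_ms >= self.raise_ms:
            raise RuntimeError(f"pinned queue starved for {waited_ms:.0f} ms")
        if self.warn_ms is not None and waited_ms >= self.warn_ms:
            return "warn"
        return "ok"
```

Because the check is keyed on `nonempty_since` rather than on the time of the last pump, a program that never schedules pinned work returns "ok" no matter how long it runs without pumping.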

**Internal**

- **`examples/benchmark.py --pinned-spinner`** — high-rate
  pinned-dispatch overlay that adds one tail-recursing
  `@when(PinnedCown)` driven by `pump(max_behaviors=1)` on the
  main thread at a configurable rate while the existing chain-ring
  workload runs on workers. Used during development to check for
  worker-throughput regression under high-rate pinned dispatch;
  on CPython 3.14 at 4 workers / 10 s / 3 repeats the measured
  delta with the spinner active was −0.38%.
- **Noticeboard read contract tightened** — `noticeboard`
  now explicitly documents that calling `noticeboard` or
  `notice_read` from the main thread *outside* a behavior is
  undefined behavior; the supported main-thread read path is
  `wait(noticeboard=True)`. Seeding the noticeboard with
  `notice_write` from the main thread before scheduling any
  behavior remains supported.
- **`test_matrix.TestVectorMethodsInCown` migrated to the
  `send("assert", ...)` pattern** — the in-cown `Matrix` vector
  tests previously asserted on `result.value` directly from the
  test thread, which violates the cown ownership contract. They now
  ship assertions out of each behavior via `send("assert", ...)`
  and collect on the test thread via a `receive_asserts(count)`
  helper, matching the project's BOC testing convention.
- **CI: ASAN `detect_leaks=1`** — the pinned-pump leak hunt
  cleared the last masking leak; the ASAN job in
  `.github/workflows/pr_gate.yml` now sets
  `ASAN_OPTIONS=detect_leaks=1:halt_on_error=1` so any new
  reachable leak fails the build at the source instead of
  silently accumulating under `detect_leaks=0`.
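
The `send("assert", ...)` convention mentioned above can be approximated with standard-library threads and a queue. This is an analogy only: `send_assert` and the `queue.Queue` transport are stand-ins chosen for illustration, and `receive_asserts(count)` is reimplemented here with an assumed `timeout` parameter rather than copied from the project's test helpers.

```python
import queue
import threading

# Stand-in for the message channel; bocpy's own send/receive is not used here.
assert_queue = queue.Queue()


def send_assert(ok: bool, message: str) -> None:
    """Ship an assertion outcome to the test thread instead of asserting in place."""
    assert_queue.put(("assert", ok, message))


def receive_asserts(count: int, timeout: float = 5.0) -> list:
    """Collect exactly `count` outcomes on the test thread; return failure messages."""
    failures = []
    for _ in range(count):
        _tag, ok, message = assert_queue.get(timeout=timeout)
        if not ok:
            failures.append(message)
    return failures


def worker(x: int) -> None:
    # The check runs where the data lives; only the verdict crosses threads.
    send_assert(x * x == x ** 2, f"square mismatch for {x}")


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

failures = receive_asserts(4)
```

The test thread never touches worker-owned values directly; it only counts verdicts, which is the ownership property the migrated tests are meant to respect.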

Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>

@matajoh matajoh force-pushed the main-pinned-cowns branch from ef6bde7 to 63b3881 June 6, 2026 10:38
@matajoh matajoh merged commit 6b9e1af into microsoft:main Jun 6, 2026
41 checks passed
@matajoh matajoh deleted the main-pinned-cowns branch June 6, 2026 10:55

Development

Successfully merging this pull request may close these issues.

Main-pinned cowns — first-class affinity for non-shareable resources

1 participant