Guard _create_instance_func with _constructing_mutex on shared-global (macOS hot-reload) builds#1991
Closed
reforia wants to merge 1 commit into
Closed
Conversation
…obal builds On macOS hot-reload builds (`_GODOT_CPP_AVOID_THREAD_LOCAL`) the `_constructing_*` state is a shared global guarded by `Wrapped::_constructing_mutex` rather than `thread_local`. `memnew` takes that lock via `_pre_initialize()` and releases it in `_postinitialize()`. Since godotengine#1568 the 4.4+ create path uses manual `_set_construct_info()` + placement-new instead of `memnew`, so it writes the shared globals without taking the lock while `_postinitialize()` still unlocks it: an unguarded write plus an unbalanced unlock on the `recursive_mutex`. The macOS shared-global workaround (godotengine#1594) landed between godotengine#1568's authoring and merge, so the create path silently lost the lock `memnew` had been providing. Under concurrent off-main-thread construction (e.g. threaded resource load) this races: `_constructing_class_binding_callbacks` can be clobbered before the `Wrapped` ctor consumes it, hitting `CRASH_NOW_MSG("...created without binding callbacks...")`, and the stray `unlock()` tears down a critical section held by another thread (downstream instance-binding / TypedArray corruption). It is intermittent (~50%), macOS-only, and only on threaded loads; release/export builds use real `thread_local` and are unaffected. Lock before `_set_construct_info()` on the 4.4+ path and release in the no-`p_notify_postinitialize` branch to keep the recursive mutex balanced, mirroring `_pre_initialize()`/`_postinitialize()`. Non-macOS and release builds compile out both blocks. `_recreate_instance_func` already uses a `lock_guard`. Fixes godotengine#1990. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: reforia <11989463+reforia@users.noreply.github.com>
Collaborator
|
I've attempted a different approach at fixing this issue in PR #1992 It makes sure we don't have an stray |
Collaborator
|
Superseded by #1992 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug
On macOS editor/debug builds with hot reload (
MACOS_ENABLED && HOT_RELOAD_ENABLED, i.e._GODOT_CPP_AVOID_THREAD_LOCAL),ClassDB::_create_instance_funcraces on the shared construction state and performs an unbalanced mutex unlock, causing intermittent (~50%)SIGABRTduring off-main-thread instantiation (e.g.ResourceLoader.load_threaded_request()of a GDExtension resource, or any construction on aWorkerThreadPoolthread):…often followed by downstream instance-binding /
TypedArraytype corruption.Full analysis in #1990.
Root cause
On this build the
_constructing_*state is a shared global guarded byWrapped::_constructing_mutex, notthread_local. The locking contract is:_pre_initialize<T>()(thememnewpath) locks the mutex, then sets the construct info.Wrapped::Wrapped(const StringName &)consumes the globals and nulls them._postinitialize()unlocks the mutex.Since #1568 the 4.4+ create path uses manual
_set_construct_info<T>()+ placement-new instead ofmemnew(intentionally —p_notify_postinitializelets the engine defer post-init, whichmemnewcan't express). But that change dropped the lockmemnewwas providing via_pre_initialize, while_postinitialize()still unlocks. The macOS shared-global workaround (#1594) landed between #1568's authoring and its merge, so the gap went unnoticed.Result, under concurrency:
_set_construct_info<T>()writes the shared globals with no lock; a concurrent construction can clobber_constructing_class_binding_callbacksbefore theWrappedctor consumes it →nullptr→CRASH_NOW_MSG._postinitialize()unlocks arecursive_mutexthis path never locked (UB), tearing down a critical section held by another thread._recreate_instance_funcis unaffected — it already uses an explicitstd::lock_guard.Fix
Mirror
_pre_initialize/_postinitializeon the 4.4+ path: lock before_set_construct_info, and balance the unlock in the deferred (no-p_notify_postinitialize) branch. The pre-4.4 path keepsmemnew(which locks itself). Non-macOS and release builds compile out both#ifdefblocks — no behavior change. Nestedmemnewduring aTconstructor stays safe because the mutex is recursive.Why the symptoms match
load()never crashes → no interleaving, and a strayunlock()on an uncontendedrecursive_mutexis benign.MACOS_ENABLED && HOT_RELOAD_ENABLEDonly; release/export templates use realthread_local.Testing
Builds clean against the bundled test extension on the affected config:
which instantiates
_create_instance_funcon the_GODOT_CPP_AVOID_THREAD_LOCALpath.Fixes #1990.