Previously, every object allocation in rb_gc_impl_new_obj made a per-object FFI call into Rust (mmtk_add_obj_free_candidate), which acquired a mutex on one of the WeakProcessor's candidate vecs, pushed a single element, and released the mutex. That's an FFI crossing plus a mutex lock/unlock on every single allocation. Now, each MMTk_ractor_cache has two local buffers (parallel-freeable and non-parallel-freeable, 128 entries each). On allocation, we just store the pointer into the local buffer. When a buffer fills up, we flush the entire batch in one FFI call using mmtk_add_obj_free_candidates, which does a single mutex acquisition and extend_from_slice for the whole batch. The buffer size of 128 was chosen arbitrarily; it has not been tuned, and we should probably investigate what the optimal size is.
e444c58 to 23c4a9a
shutdown_call_finalizer reads candidates from the Rust-side WeakProcessor, but the main ractor's C-side buffer may not have been flushed yet (ractor_cache_free runs later). Flush all remaining buffers before reading candidates.
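The ordering problem described above can be illustrated with a minimal sketch. Everything here is hypothetical: `ractor_cache_sketch`, `flush_all`, and `read_candidates_at_shutdown` are illustrative names, and a plain counter stands in for the Rust-side WeakProcessor. It only shows why the flush has to happen before the candidates are read.

```c
#include <stddef.h>

#define BUF_SIZE 128

/* Hypothetical stand-in for the Rust-side candidate list: only a count. */
static size_t global_candidates = 0;

/* Hypothetical per-ractor cache with the two local buffers. */
struct ractor_cache_sketch {
    void *parallel[BUF_SIZE];
    size_t parallel_len;
    void *non_parallel[BUF_SIZE];
    size_t non_parallel_len;
};

/* Hand every buffered entry to the global side and empty both buffers. */
static void flush_all(struct ractor_cache_sketch *cache)
{
    global_candidates += cache->parallel_len + cache->non_parallel_len;
    cache->parallel_len = 0;
    cache->non_parallel_len = 0;
}

/* Flush first, then read; otherwise entries still sitting in the
 * C-side buffers would be invisible to the reader. */
static size_t read_candidates_at_shutdown(struct ractor_cache_sketch *main_cache)
{
    flush_all(main_cache);
    return global_candidates;
}
```

Without the `flush_all` call, the reader in this sketch would see zero candidates even though the cache still holds entries.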
Instead of sending all 128 buffered objects to one bucket, round-robin distribute them across all worker buckets so parallel obj_free work stays balanced.
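A rough sketch of round-robin distribution follows. The names (`bucket`, `distribute_round_robin`, `NUM_BUCKETS`, `BUCKET_CAPACITY`) and the fixed-size arrays are assumptions made for illustration; the actual buckets live on the Rust side and are not shown in this PR text.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUCKETS 4        /* hypothetical number of worker buckets */
#define BUCKET_CAPACITY 256  /* hypothetical fixed capacity for the sketch */

struct bucket {
    void *objs[BUCKET_CAPACITY];
    size_t len;
};

static struct bucket buckets[NUM_BUCKETS];
static size_t next_bucket = 0;  /* persists across batches */

/* Spread one flushed batch across all buckets, one object at a time,
 * continuing from wherever the previous batch stopped. */
static void distribute_round_robin(void *const *objs, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        struct bucket *b = &buckets[next_bucket];
        assert(b->len < BUCKET_CAPACITY);
        b->objs[b->len++] = objs[i];
        next_bucket = (next_bucket + 1) % NUM_BUCKETS;
    }
}
```

With 4 buckets, a full batch of 128 objects leaves 32 in each bucket, so no single worker receives the whole batch.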
Previously, every object allocation in rb_gc_impl_new_obj made a per-object FFI call into Rust (mmtk_add_obj_free_candidate), which acquired a mutex on one of the WeakProcessor's candidate vecs, pushed a single element, and released the mutex. That's an FFI crossing plus a mutex lock/unlock on every single allocation.
Now, each MMTk_ractor_cache has two local buffers (parallel-freeable and non-parallel-freeable, 128 entries each). On allocation, we just store the pointer into the local buffer. When a buffer fills up, we flush the entire batch in one FFI call using mmtk_add_obj_free_candidates, which does a single mutex acquisition and flushes the batch into the work buckets. The objects are still distributed in the same way, but now we only take a lock once per buffer flush, rather than once per object.
The buffer size of 128 was chosen arbitrarily; it has not been tuned, and we should probably investigate what the optimal size is.
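The buffering scheme described above might look roughly like the following sketch. The function `fake_add_obj_free_candidates` is a hypothetical stand-in for mmtk_add_obj_free_candidates (whose real signature is not shown here), and `candidate_buffer`, `sink`, and `flush_calls` are illustrative names, not the PR's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define CANDIDATE_BUFFER_SIZE 128  /* the arbitrary size discussed above */

/* Hypothetical stand-in for the Rust-side candidate list. */
static void *sink[1024];
static size_t sink_len = 0;
static size_t flush_calls = 0;  /* each call models one FFI crossing + one lock */

/* Stand-in for the batched FFI call: takes the whole batch at once. */
static void fake_add_obj_free_candidates(void *const *objs, size_t len)
{
    assert(sink_len + len <= sizeof(sink) / sizeof(sink[0]));
    for (size_t i = 0; i < len; i++) {
        sink[sink_len++] = objs[i];
    }
    flush_calls++;
}

/* One local buffer; the description has two of these per cache. */
struct candidate_buffer {
    void *objs[CANDIDATE_BUFFER_SIZE];
    size_t len;
};

/* Send everything buffered so far in a single call, then reset. */
static void candidate_buffer_flush(struct candidate_buffer *buf)
{
    if (buf->len == 0) return;
    fake_add_obj_free_candidates(buf->objs, buf->len);
    buf->len = 0;
}

/* Allocation path: a plain store, flushing only when the buffer is full. */
static void candidate_buffer_push(struct candidate_buffer *buf, void *obj)
{
    buf->objs[buf->len++] = obj;
    if (buf->len == CANDIDATE_BUFFER_SIZE) {
        candidate_buffer_flush(buf);
    }
}
```

In this sketch, pushing 300 objects triggers only two automatic flushes, plus one explicit flush for the 44 left over, instead of 300 individual calls.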