[Fix] propagate init_function errors from all MPI ranks in block allocation mode #1023
jan-janssen wants to merge 3 commits into
…cation mode

In interactive_parallel.py the init branch was only propagating errors from rank 0; failures on non-zero ranks were silently swallowed, leaving those ranks with uninitialised memory. Subsequent function calls on the affected ranks would then receive wrong or missing kwargs.

The fix mirrors the existing function-execution path: after each rank runs call_funct for init, all errors are gathered to rank 0 via MPI.COMM_WORLD.gather before the success/error response is sent back to the scheduler. This also acts as an implicit barrier so the scheduler cannot dispatch the next task until every rank has finished init.

Adds a test (test_internal_memory_mpi) that exercises block allocation with cores=2 and an init_function, a combination that had zero coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
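The gather-then-report pattern described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in interactive_parallel.py: the helper name `init_response`, the response dictionary keys, and the `FakeComm` stand-in are all hypothetical. With mpi4py, the `comm` argument would be `MPI.COMM_WORLD`, whose `gather(obj, root=0)` returns a list on the root rank and `None` on every other rank.

```python
from typing import Any, Callable, Optional


def init_response(
    comm: Any, init_function: Callable[[], dict], memory: dict
) -> Optional[dict]:
    """Run init on this rank, then gather every rank's error on rank 0.

    Returns the response rank 0 would send to the scheduler; other ranks
    return None. `comm` only needs gather(obj, root=0) and Get_rank().
    """
    error = None
    try:
        memory.update(init_function())
    except Exception as exc:  # keep the error so it is gathered, not swallowed
        error = exc
    # Collective call: rank 0 cannot continue until every rank has contributed.
    errors = comm.gather(error, root=0)
    if comm.Get_rank() != 0:
        return None
    failed = [e for e in errors if e is not None]
    if failed:
        return {"error": failed[0]}  # hypothetical response shape
    return {"result": True}


class FakeComm:
    """Single-process stand-in that replays recorded results of other ranks."""

    def __init__(self, other_rank_errors):
        self._others = list(other_rank_errors)

    def Get_rank(self) -> int:
        return 0

    def gather(self, obj, root: int = 0):
        return [obj] + self._others
```

Because rank 0 builds its response only after `gather` returns, a failure on any rank reaches the scheduler, and the response cannot be sent before all ranks have finished init.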
CodeRabbit walkthrough
Changes: MPI Init Error Propagation
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1023      +/-   ##
==========================================
- Coverage   94.24%   93.84%   -0.40%
==========================================
  Files          39       39
  Lines        2119     2128       +9
==========================================
  Hits         1997     1997
- Misses        122      131       +9
…nts)

Moves the init-function handling out of main() into a private helper so the statement count stays within the ruff/pylint PLR0915 limit of 50.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- In interactive_parallel.py, the init_function branch only reported errors from MPI rank 0; failures on non-zero ranks were silently swallowed, leaving those ranks with uninitialised memory. Subsequent function calls on the affected ranks would then receive wrong or missing kwargs injected from memory.
- After each rank runs init, all errors are gathered to rank 0 via MPI.COMM_WORLD.gather before rank 0 sends the success/error response to the scheduler, mirroring how function-execution results are already gathered. This also acts as an implicit barrier so the scheduler cannot dispatch the next task until every rank has finished initialising.
- Adds test_internal_memory_mpi to cover block_allocation + cores=2 + init_function, a combination that had zero test coverage.

Test plan
- test_internal_memory (cores=1) continues to pass
- test_internal_memory_mpi (cores=2) passes and verifies that both MPI ranks receive the memory value set by the init function

🤖 Generated with Claude Code
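To illustrate why an uninitialised memory on a single rank matters, here is a hedged sketch of injecting stored values into a call by parameter name. The helper `call_with_memory` is hypothetical and only approximates the behaviour described in the summary; the real call_funct may work differently. The two "ranks" are simulated in one process rather than run under MPI.

```python
import inspect
from typing import Any, Callable


def call_with_memory(fn: Callable[..., Any], kwargs: dict, memory: dict) -> Any:
    """Call fn, filling parameters missing from kwargs with values from memory."""
    params = inspect.signature(fn).parameters
    injected = {k: v for k, v in memory.items() if k in params and k not in kwargs}
    return fn(**kwargs, **injected)


def add(a: int, b: int) -> int:
    return a + b


# Simulated state: rank 0 ran init successfully, while the init on rank 1
# failed without being reported and left its memory empty.
memory_by_rank = {0: {"b": 5}, 1: {}}

results = {}
for rank, memory in memory_by_rank.items():
    try:
        results[rank] = call_with_memory(add, {"a": 1}, memory)
    except TypeError as exc:  # required argument b was never injected
        results[rank] = exc
```

In this sketch the rank with an empty memory fails only at the later function call, far from the original init failure, which is the kind of delayed error the gather on init is meant to surface earlier.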