[Fix] propagate init_function errors from all MPI ranks in block allocation mode #1023

Closed
jan-janssen wants to merge 3 commits into main from fix/init-function-mpi-parallel-block-allocation

Conversation

@jan-janssen

@jan-janssen jan-janssen commented Jun 19, 2026

Member

Summary

  • In interactive_parallel.py, the init_function branch only reported errors from MPI rank 0; failures on non-zero ranks were silently swallowed, leaving those ranks with uninitialised memory. Subsequent function calls on the affected ranks would then receive wrong or missing kwargs injected from memory.
  • The fix gathers errors from all ranks via MPI.COMM_WORLD.gather before rank 0 sends the success/error response to the scheduler — mirroring how function-execution results are already gathered. This also acts as an implicit barrier so the scheduler cannot dispatch the next task until every rank has finished initialising.
  • Adds test_internal_memory_mpi to cover block_allocation + cores=2 + init_function, a combination that had zero test coverage.
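
The gather-before-reply behaviour described above can be illustrated without MPI. The following is a minimal sketch under stated assumptions, not the code from this PR: threads stand in for MPI ranks, a `queue.Queue` stands in for `MPI.COMM_WORLD.gather`, and the names `run_ranks` and `failing_on_rank_one` are hypothetical.

```python
import queue
import threading


def run_ranks(init_function, size):
    """Simulate `size` ranks running init_function and gather their errors.

    Threads stand in for MPI ranks and the queue stands in for a gather
    to rank 0. This is an illustration, not executorlib's implementation.
    """
    inbox = queue.Queue()

    def worker(rank):
        init_error = None
        try:
            init_function(rank)
        except Exception as error:  # capture the error instead of raising
            init_error = error
        inbox.put((rank, init_error))  # every rank contributes exactly once

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for thread in threads:
        thread.start()

    # The simulated rank 0 blocks here until all ranks have reported, so it
    # cannot send a reply while any rank is still initialising.
    gathered = dict(inbox.get() for _ in range(size))
    for thread in threads:
        thread.join()

    # Return the errors ordered by rank, as a gather to rank 0 would.
    return [gathered[r] for r in range(size)]


def failing_on_rank_one(rank):
    """Hypothetical init function that fails only on rank 1."""
    if rank == 1:
        raise ValueError("init failed on rank 1")


errors_by_rank = run_ranks(failing_on_rank_one, size=2)
```

In this sketch a failure on rank 1 is visible to the collecting side, whereas checking only the rank 0 entry would report success.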

Test plan

  • Existing test test_internal_memory (cores=1) continues to pass
  • New test test_internal_memory_mpi (cores=2) passes and verifies that both MPI ranks receive the memory value set by the init function
  • Full test suite green

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced error handling during initialization to capture errors across distributed execution ranks and provide clearer error reporting.
  • Tests

    • Added test coverage for memory allocation behavior in parallel execution environments.

…cation mode

In interactive_parallel.py the init branch was only propagating errors
from rank 0; failures on non-zero ranks were silently swallowed, leaving
those ranks with uninitialised memory. Subsequent function calls on the
affected ranks would then receive wrong or missing kwargs.
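
To illustrate why a silently failed init matters, here is a minimal sketch of memory-based kwarg injection. It assumes the init function returns a dict and that missing arguments are filled in by parameter name; `inject_memory`, `init_function` and `add_offset` are hypothetical names for illustration, not executorlib's actual code.

```python
import inspect


def inject_memory(fn, kwargs, memory):
    """Fill parameters of fn that are missing from kwargs with memory entries."""
    merged = dict(kwargs)
    if memory is None:
        return merged  # nothing to inject on a rank whose init failed
    for name in inspect.signature(fn).parameters:
        if name not in merged and name in memory:
            merged[name] = memory[name]
    return merged


def init_function():
    return {"offset": 10}


def add_offset(x, offset):
    return x + offset


memory_ok = init_function()  # rank whose init succeeded
memory_missing = None        # rank whose init failed without being reported

ok_kwargs = inject_memory(add_offset, {"x": 1}, memory_ok)
bad_kwargs = inject_memory(add_offset, {"x": 1}, memory_missing)
```

On the rank without memory, `bad_kwargs` lacks `offset`, so calling `add_offset(**bad_kwargs)` raises a `TypeError` far from the original init failure.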

The fix mirrors the existing function-execution path: after each rank
runs call_funct for init, all errors are gathered to rank 0 via
MPI.COMM_WORLD.gather before the success/error response is sent back to
the scheduler. This also acts as an implicit barrier so the scheduler
cannot dispatch the next task until every rank has finished init.

Adds a test (test_internal_memory_mpi) that exercises block allocation
with cores=2 and an init_function – a combination that had zero coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Contributor

Review Change Stack

Warning

Review limit reached

@jan-janssen, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 25 minutes and 10 seconds. Learn how PR review limits work.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 911eeadf-d364-4fb7-bc73-75d21bef6094

📥 Commits

Reviewing files that changed from the base of the PR and between b173e16 and 7f56240.

📒 Files selected for processing (1)
  • src/executorlib/backend/interactive_parallel.py

📝 Walkthrough

The "init" handling in interactive_parallel.py is refactored to catch exceptions into init_error, gather that variable across all MPI ranks, and have rank zero select and forward the first non-None error via ZMQ. A new unit test (test_internal_memory_mpi) validates init_function memory sharing with cores=2 using mpi4py.
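
The selection step in the walkthrough can be sketched as follows. This is a hedged illustration rather than the PR's code: `gather_init_error` is a hypothetical helper that accepts any object offering the mpi4py-style methods `Get_rank()`, `Get_size()` and `gather(obj, root=0)`, and `FakeComm` is a hypothetical single-process stand-in so the example runs without mpi4py.

```python
def gather_init_error(init_error, comm=None):
    """Return the first non-None init error on rank 0, otherwise None.

    Pass comm=None for a single-process run; no gather is performed then.
    """
    if comm is None or comm.Get_size() == 1:
        gathered = [init_error]  # single rank: wrap in a list instead of gathering
        rank = 0
    else:
        gathered = comm.gather(init_error, root=0)  # None on non-root ranks
        rank = comm.Get_rank()
    if rank != 0:
        return None  # only rank 0 reports back
    return next((err for err in gathered if err is not None), None)


class FakeComm:
    """Hypothetical stand-in that replays fixed gather results for one rank."""

    def __init__(self, rank, all_values):
        self._rank = rank
        self._all_values = all_values

    def Get_rank(self):
        return self._rank

    def Get_size(self):
        return len(self._all_values)

    def gather(self, obj, root=0):
        # Like an MPI gather: the root gets the full list, other ranks get None.
        return list(self._all_values) if self._rank == root else None


boom = ValueError("init failed on rank 1")
reported_on_root = gather_init_error(None, FakeComm(0, [None, boom]))
reported_on_rank_one = gather_init_error(boom, FakeComm(1, [None, boom]))
```

With a real communicator, `MPI.COMM_WORLD` from mpi4py could be passed in place of `FakeComm`, since it provides the same three methods.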

Changes

MPI Init Error Propagation

| Layer / File(s) | Summary |
| --- | --- |
| Gather init errors across MPI ranks and report on rank zero: src/executorlib/backend/interactive_parallel.py, tests/unit/standalone/interactive/test_spawner.py | The "init" path now catches exceptions into init_error, gathers it across ranks (or wraps in a list for single-rank runs), and rank zero picks the first non-None error to send via ZMQ and write the error file; the new test_internal_memory_mpi test, gated on mpi4py, verifies init_function with cores=2 returns two matching NumPy arrays. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • pyiron/executorlib#804: Introduced the "fail safe init function" logic in the same "init" handling path of interactive_parallel.py that this PR extends with cross-rank error gathering.

Poem

🐇 Across the MPI ranks I hop,
Collecting errors, none shall drop.
Rank zero picks the first mistake,
And sends it back for the caller's sake.
No init error slips away—
The rabbit gathers them all today! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and specifically describes the main change: propagating init_function errors from all MPI ranks in block allocation mode, which matches the core objective of the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@codecov

codecov Bot commented Jun 19, 2026


Codecov Report

❌ Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.84%. Comparing base (640e440) to head (7f56240).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/executorlib/backend/interactive_parallel.py | 0.00% | 17 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1023      +/-   ##
==========================================
- Coverage   94.24%   93.84%   -0.40%     
==========================================
  Files          39       39              
  Lines        2119     2128       +9     
==========================================
  Hits         1997     1997              
- Misses        122      131       +9     


jan-janssen and others added 2 commits June 19, 2026 07:01
…nts)

Moves the init-function handling out of main() into a private helper so
the statement count stays within the ruff/pylint PLR0915 limit of 50.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jan-janssen jan-janssen marked this pull request as draft June 19, 2026 05:23
@jan-janssen jan-janssen deleted the fix/init-function-mpi-parallel-block-allocation branch June 19, 2026 08:25