Skip to content

fix(ompi): actually call MPI_Finalize in FinalizeImpl#10

Merged
Ziminli merged 1 commit into
InfiniTensor:masterfrom
GordonYang1:fix/ompi-mpi-finalize
May 19, 2026
Merged

fix(ompi): actually call MPI_Finalize in FinalizeImpl#10
Ziminli merged 1 commit into
InfiniTensor:masterfrom
GordonYang1:fix/ompi-mpi-finalize

Conversation

@GordonYang1
Copy link
Copy Markdown
Collaborator

@GordonYang1 GordonYang1 commented May 18, 2026

Summary

This PR fixes an issue in FinalizeImpl<BackendType::kOmpi, ...>::Apply
where the MPI runtime was never actually shut down.

Previously, the implementation called MPI_Finalized(&finalized) to query
whether MPI had already been finalized, but never invoked MPI_Finalize()
itself. As a result:

  • The MPI runtime was left in an initialized state at process exit
  • MPI_Init_thread allocations were not properly released
  • On some OpenMPI builds this produced shutdown warnings or non-zero exit
    status when the process tore down

Changes

  • Complete the Finalize Path
    • In src/ompi/impl/finalize.h, after querying MPI_Finalized, invoke
      MPI_Finalize() when MPI has been initialized but not yet finalized
    • Preserve the existing INFINI_CHECK_MPI error-propagation pattern so
      that a non-MPI_SUCCESS return is reported through the standard
      OpenMPI check macro
    • No behavior change when MPI was already finalized by user code or by a
      nested call — the guard makes the call idempotent

Known Issues & Future Work

  • The base CommInitAll allocates a Communicator via new, but the
    current CommDestroy only clears the inner backend instance and does
    not delete the outer object. That separate lifetime issue is out of
    scope for this PR.

Test Environment

  • 4× NVIDIA GPUs (3× H100 PCIe + 1× A100 SXM)
  • CUDA 11.8, GCC 11, OpenMPI 4.x (conda environment)
  • Build flags: -DWITH_NVIDIA=ON -DWITH_OMPI=ON

Verified that the AllReduce example exits cleanly with no MPI shutdown
warnings; without this fix the same binary leaves MPI in an unfinalized
state at exit.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ec2514ac5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/ompi/impl/finalize.h
Comment on lines +16 to +17
if (!finalized) {
INFINI_CHECK_MPI(MPI_Finalize());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid finalizing MPI when this library did not initialize it

FinalizeImpl<BackendType::kOmpi>::Apply now always calls MPI_Finalize() whenever MPI is not yet finalized, but InitImpl explicitly skips MPI_Init_thread when MPI was already initialized by the host process (src/ompi/impl/init.h, if (!initialized)). In that common embedding scenario (framework/app owns MPI lifecycle), this change tears down global MPI ownership from inside InfiniCCL, so any subsequent MPI call in the host can fail or abort. Finalization should be gated by whether InfiniCCL performed initialization, not just by MPI_Finalized().

Useful? React with 👍 / 👎.

@GordonYang1 GordonYang1 changed the title fix(ompi): call MPI_Finalize during finalize fix(ompi): actually call MPI_Finalize in FinalizeImpl May 18, 2026
@Ziminli Ziminli self-requested a review May 19, 2026 02:20
@Ziminli Ziminli merged commit 75b2e6f into InfiniTensor:master May 19, 2026
@GordonYang1 GordonYang1 deleted the fix/ompi-mpi-finalize branch May 20, 2026 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants