fix: close file descriptor in deepspeed_io_handle_t::wait() to prevent fd leak #8075

Open
MarkCLChang wants to merge 1 commit into deepspeedai:master from MarkCLChang:master
MarkCLChang commented Jun 18, 2026

Overview

This PR addresses a file descriptor leak in deepspeed_io_handle_t::wait() by ensuring the file descriptor is properly closed after the async I/O operation completes.

Changes

  • Added a close() call on the file descriptor at the end of deepspeed_io_handle_t::wait(), so descriptors no longer accumulate across repeated async I/O operations.
  • This avoids file descriptor exhaustion in long-running training jobs that perform frequent checkpoint reads and writes through DeepSpeed's async I/O interface with ZeRO-3 NVMe offload.
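
To illustrate the pattern described above, here is a minimal, self-contained sketch for POSIX systems. It is not the actual DeepSpeed code: `io_handle_sketch`, `submit()`, and `fd_is_open()` are hypothetical names made up for this example, and the real `deepspeed_io_handle_t::wait()` may track its descriptor differently. The sketch only shows the idea of releasing the descriptor once the wait completes and resetting it so a second `wait()` cannot close it twice.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for an async I/O handle; NOT DeepSpeed's actual class.
struct io_handle_sketch {
    int fd = -1;  // descriptor owned by the pending operation, -1 if none

    // Opens the file and remembers the descriptor for the pending operation.
    bool submit(const char* path) {
        fd = ::open(path, O_RDONLY);
        return fd >= 0;
    }

    // Waits for completion (elided in this sketch), then releases the descriptor.
    // Returns the number of operations completed: 0 or 1 here.
    int wait() {
        if (fd < 0) return 0;  // nothing pending
        // ... a real handle would block on its worker threads here ...
        ::close(fd);  // without this call, each submit/wait cycle leaks one fd
        fd = -1;      // guard against closing the same descriptor twice
        return 1;
    }
};

// Helper for checking the sketch: true if `fd` still refers to an open file.
inline bool fd_is_open(int fd) {
    return ::fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

Resetting the member to -1 after `close()` is a design choice in this sketch: it makes `wait()` safe to call again and prevents accidentally closing an unrelated descriptor that the OS later reuses under the same number.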

…t fd leak

Signed-off-by: markcl_chang <markcl_chang@adata.com>