
bug: Batch API result file download fails for large outputs (>200MB) with ConnectionResetError #2959

@xwang049

Description


Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

When using the Batch API to download large result files (specifically when the output .jsonl exceeds ~200-300MB, e.g., 50k embedding rows), the download connection is prematurely closed by the peer.

The error typically manifests as:
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (received X bytes, expected Y bytes)

To Reproduce

  1. Create a Batch job with 50,000 embedding requests (text-embedding-3-small).
  2. Wait for the batch to complete (status: completed).
  3. Attempt to download the result file using the SDK:

```python
# Standard SDK approach that fails
content = client.files.content(batch.output_file_id)

# Streaming variant fails the same way (full snippet below)
with client.files.with_streaming_response.content(batch.output_file_id) as resp:
    for chunk in resp.iter_bytes():
        ...
```

Or using the `requests` library directly with the file URL.

The download will consistently fail after receiving a few hundred megabytes.

Expected behavior
The SDK should handle large file streaming robustly, or provide a built-in chunked download/retry mechanism for massive Batch outputs (1GB+).

Environment
OS: macOS 15.x

Python Version: 3.12.x

Additional context
Workaround: I've confirmed that splitting the 50k requests into smaller batches of 10k (resulting in ~200MB files) allows for stable downloads. This suggests a potential timeout or buffer limitation on the server-side proxy or within the SDK's streaming implementation for payloads exceeding a certain threshold.
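For reference, the splitting workaround can be sketched like this. The `split_requests`/`write_batch_files` helpers and the file naming are illustrative, not part of the SDK:

```python
import json

def split_requests(requests_list, batch_size=10_000):
    """Split a list of Batch API request dicts into chunks of `batch_size`."""
    return [requests_list[i:i + batch_size]
            for i in range(0, len(requests_list), batch_size)]

def write_batch_files(requests_list, prefix="batch_input", batch_size=10_000):
    """Write each chunk to its own .jsonl file; returns the file paths."""
    paths = []
    for n, chunk in enumerate(split_requests(requests_list, batch_size)):
        path = f"{prefix}_{n}.jsonl"
        with open(path, "w") as f:
            for req in chunk:
                f.write(json.dumps(req) + "\n")
        paths.append(path)
    return paths
```

Each resulting file is then submitted as its own batch, keeping every output file under the ~200MB threshold that downloads reliably.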

I am currently implementing a manual chunked-download helper with exponential backoff as a workaround, but I don't think it's optimal.
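A minimal sketch of such a helper, assuming the files endpoint honors HTTP `Range` requests on resume (this is not documented for the OpenAI API and should be verified; `download_with_resume` and its parameters are hypothetical):

```python
import os
import time
import urllib.request

def backoff_delays(max_retries=5, base_delay=1.0):
    """Exponential backoff schedule in seconds: 1, 2, 4, 8, ..."""
    return [base_delay * 2 ** i for i in range(max_retries)]

def range_header(offset):
    """Ask the server for the remainder of the file from byte `offset`."""
    return {"Range": f"bytes={offset}-"} if offset else {}

def download_with_resume(url, auth_headers, dest, max_retries=5):
    """Download `url` to `dest`, resuming after dropped connections."""
    for attempt, delay in enumerate(backoff_delays(max_retries)):
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        req = urllib.request.Request(
            url, headers={**auth_headers, **range_header(offset)}
        )
        try:
            with urllib.request.urlopen(req, timeout=300) as resp:
                # 206 means the server honored Range and we can append;
                # 200 means it sent the full body, so start over.
                mode = "ab" if resp.status == 206 else "wb"
                with open(dest, mode) as f:
                    while chunk := resp.read(1024 * 1024):
                        f.write(chunk)
            return
        except (ConnectionResetError, OSError):
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
```

If the endpoint ignores `Range` and always returns 200, this degrades to a plain retry-from-scratch loop, which is why a built-in SDK mechanism would be preferable.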

To Reproduce

  1. Prepare a batch with ~50,000 rows (near the OpenAI limit)
  2. Submit to Batch API with endpoint /v1/embeddings
  3. Wait for status = completed
  4. Attempt to download the result file via the Files API

Code snippets:

Sample code:

```python
from openai import OpenAI

client = OpenAI()

# Result file is ~1GB when the batch has 50,000 embeddings
result_content = client.files.content(result_file_id)
result_text = result_content.text  # fails silently or raises
```

Streaming fails as well:

```python
with client.files.with_streaming_response.content(result_file_id) as resp:
    with open("result.jsonl", "wb") as f:
        for chunk in resp.iter_bytes(chunk_size=1024 * 1024):
            f.write(chunk)
```

raises:

peer closed connection without sending complete message body
(received 940216419 bytes, expected 1056098116)

The same happens with `requests`:

```python
import requests

response = requests.get(
    f"https://api.openai.com/v1/files/{result_file_id}/content",
    headers={"Authorization": f"Bearer {api_key}"},
    stream=True,
    timeout=(10, 300),
)
```

Same error; the connection drops around 500-1000 MB.

Code snippets

OS

macOS

Python version

3.12.x

Library version

2.24.0

Labels

bug (Something isn't working)