Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [x] This is an issue with the Python library
Describe the bug
When using the Batch API to download large result files (specifically when the output .jsonl exceeds ~200-300MB, e.g., 50k embedding rows), the peer prematurely closes the download connection.

The error typically manifests as:

`httpcore.RemoteProtocolError: peer closed connection without sending complete message body (received X bytes, expected Y bytes)`
To Reproduce
1. Create a Batch job with 50,000 embedding requests (`text-embedding-3-small`).
2. Wait for the batch to complete (status: `completed`).
3. Attempt to download the result file using the SDK:

```python
# Standard SDK approach that fails
content = client.files.content(batch.output_file_id)

# The streaming variant fails the same way:
with client.files.with_streaming_response.content(batch.output_file_id) as resp:
    ...
```

Or use the `requests` library directly with the file content URL.

The download consistently fails after receiving a few hundred megabytes.
Expected behavior
The SDK should handle large file streaming robustly, or provide a built-in chunked download/retry mechanism for large Batch outputs (1GB+).
Environment
OS: macOS 15.x
Python Version: 3.12.x
Additional context
Workaround: I've confirmed that splitting the 50k requests into smaller batches of 10k (resulting in ~200MB output files) allows for stable downloads. This suggests a timeout or buffer limitation, either in a server-side proxy or in the SDK's streaming implementation, for payloads exceeding a certain threshold.
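For reference, the batch-splitting workaround can be sketched like this (the helper name and row limit are my own, not part of the SDK):

```python
# Hypothetical helper: split a large batch-input .jsonl file into
# sub-batch files of at most `max_rows` requests each, so every batch's
# output file stays well under the size where downloads start failing.
def split_jsonl(path: str, max_rows: int = 10_000) -> list[str]:
    parts: list[str] = []
    part_lines: list[str] = []

    def flush() -> None:
        part_path = f"{path}.part{len(parts)}"
        with open(part_path, "w", encoding="utf-8") as dst:
            dst.writelines(part_lines)
        parts.append(part_path)

    with open(path, "r", encoding="utf-8") as src:
        for line in src:
            part_lines.append(line)
            if len(part_lines) == max_rows:
                flush()
                part_lines = []
    if part_lines:
        flush()
    return parts
```

Each part file can then be uploaded and submitted as its own batch.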
I am currently implementing a manual chunked-download helper with exponential backoff to bypass this, but I don't think it's optimal.
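Roughly, the manual helper looks like this. It is only a sketch: it assumes the file-content endpoint honors HTTP `Range` requests (which I have not verified), and the function name is my own:

```python
import os
import time

import requests


def download_with_resume(url: str, headers: dict, dest: str, max_retries: int = 5) -> str:
    """Download `url` to `dest`, resuming from the last received byte on failure.

    Assumes the server honors Range requests; if it ignores them (responds
    200 instead of 206), the file is simply rewritten from the start.
    """
    for attempt in range(max_retries):
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        req_headers = dict(headers)
        if offset:
            req_headers["Range"] = f"bytes={offset}-"
        try:
            with requests.get(url, headers=req_headers, stream=True, timeout=(10, 300)) as resp:
                resp.raise_for_status()
                # 206 Partial Content -> append; anything else -> restart file
                mode = "ab" if resp.status_code == 206 else "wb"
                with open(dest, mode) as f:
                    for chunk in resp.iter_content(chunk_size=1024 * 1024):
                        f.write(chunk)
            return dest
        except requests.exceptions.RequestException:
            time.sleep(2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError(f"download failed after {max_retries} attempts")
```

A native equivalent of this in the SDK would remove the need for such workarounds.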
To Reproduce
- Prepare a batch with ~50,000 rows (near the OpenAI limit)
- Submit to the Batch API with the `/v1/embeddings` endpoint
- Wait for status = `completed`
- Attempt to download the result file via the Files API

Code snippets:
```python
from openai import OpenAI

client = OpenAI()

# Result file is ~1GB when the batch has 50,000 embeddings
result_content = client.files.content(result_file_id)
result_text = result_content.text  # fails silently or raises

# Fails on streaming as well:
with client.files.with_streaming_response.content(result_file_id) as resp:
    with open("result.jsonl", "wb") as f:
        for chunk in resp.iter_bytes(chunk_size=1024 * 1024):
            f.write(chunk)
```
Raises: `peer closed connection without sending complete message body (received 940216419 bytes, expected 1056098116)`
The same happens with `requests` directly:

```python
import requests

response = requests.get(
    f"https://api.openai.com/v1/files/{result_file_id}/content",
    headers={"Authorization": f"Bearer {api_key}"},
    stream=True,
    timeout=(10, 300),
)
```

Same error; the connection drops around 500-1000MB in.
Code snippets
See the snippets under "To Reproduce" above.
OS
macOS
Python version
3.12.x
Library version
2.24.0