RFC: Proactive Server-Side Cancellation via `Request-Timeout-Ms`

### Confirm this is a feature request for the Python library and not the underlying OpenAI API.

- [x] This is a feature request for the Python library

### Describe the feature or improvement you're requesting

Hey, I'm writing on behalf of Baseten.

When a client times out, the server has no idea, and chaining the cancellation through intermediate services has caveats (HTTP/1, for example). The server keeps doing expensive work (inference, token generation) that nobody will ever receive. The only signal today is a TCP disconnect, which is reactive, not proactive. In the future, the same header could also let a server reject a request up front when the work cannot realistically be completed within the given time.

## Proposal
Send a `Request-Timeout-Ms` header on every request so that nginx, Go contexts, and load balancers can cancel in-flight work proactively when the deadline elapses. No client disconnect is needed, and the cancellation chain can act before one happens. Internal services can also convert this into `Request-Deadline-Ms`, an absolute deadline in milliseconds since the Unix epoch, which allows server-side verification in distributed systems.
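
To make the server side concrete, here is a minimal sketch of how a Python service might honor the header. Everything in it is an assumption for illustration: `parse_timeout_ms`, `to_deadline_ms`, and `handle` are hypothetical names, `asyncio.sleep` stands in for inference, and real servers (nginx, Go, etc.) would wire this into their own cancellation mechanisms.

```python
import asyncio
import time
from typing import Mapping, Optional

HEADER = "request-timeout-ms"  # header name proposed in this RFC


def parse_timeout_ms(headers: Mapping[str, str]) -> Optional[int]:
    """Return the client's budget in ms, or None if absent or malformed."""
    raw = {k.lower(): v for k, v in headers.items()}.get(HEADER)
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def to_deadline_ms(timeout_ms: int, now_ms: Optional[int] = None) -> int:
    """Convert a relative budget into an absolute Unix-epoch deadline in ms."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms + timeout_ms


async def handle(headers: Mapping[str, str], work_seconds: float) -> str:
    """Hypothetical handler: run the work, cancel it once the budget elapses."""
    timeout_ms = parse_timeout_ms(headers)
    work = asyncio.sleep(work_seconds, result="completed")  # stand-in for inference
    if timeout_ms is None:
        return await work  # no header: behave exactly as today
    try:
        return await asyncio.wait_for(work, timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        return "cancelled"  # work was cancelled without waiting for a disconnect
```

A missing or malformed header falls back to today's behavior, so the header stays purely opt-in for servers.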

## Why not just use `x-stainless-read-timeout`?
`timeout.read` is a **per-chunk silence threshold**, not a wall-clock budget. It resets on every received chunk, so it's the wrong value to drive server-side cancellation — a healthy long running stream would get killed incorrectly.
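
A small simulation of the difference, assuming a stream is described only by the gap in seconds before each chunk. The helper names are hypothetical and this is not how httpx implements its timeouts, just the two semantics side by side.

```python
from typing import List


def trips_read_timeout(gaps: List[float], read_timeout: float) -> bool:
    """Per-chunk silence threshold: fires only if a single gap is too long."""
    return any(gap > read_timeout for gap in gaps)


def trips_wall_clock(gaps: List[float], budget: float) -> bool:
    """Wall-clock budget: fires once the total elapsed time exceeds the budget."""
    return sum(gaps) > budget


# A healthy stream: 60 chunks, one every 0.5 s, 30 s end to end.
healthy_stream = [0.5] * 60

read_ok = not trips_read_timeout(healthy_stream, read_timeout=20.0)
killed_if_misused = trips_wall_clock(healthy_stream, budget=20.0)
```

With a 20 s read timeout the client is perfectly happy (`read_ok` is true), but a server that treated those same 20 s as a total budget would cancel the stream two thirds of the way through (`killed_if_misused` is true).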

## What's a valid value?
Only a plain `float` timeout (e.g. `OpenAI(timeout=20.0)`) is a true wall-clock budget for end-to-end time. `httpx.Timeout` objects have no equivalent field. We should **not** send the header for those: a wrong value is worse than no header, because the server could cancel work that the client is still waiting for.

## Proposed Implementation
```python
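# _build_headers(), _base_client.py
# Respect a caller-supplied header; otherwise derive it from the effective timeout.
if "request-timeout-ms" not in lower_custom_headers:
    timeout = self.timeout if isinstance(options.timeout, NotGiven) else options.timeout
    # Only a plain number is a wall-clock budget; skip httpx.Timeout objects and None.
    if not isinstance(timeout, Timeout) and timeout is not None:
        headers["request-timeout-ms"] = str(int(timeout * 1000))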
```

## Prior Art
- **gRPC:** [`grpc-timeout`](https://grpc.io/docs/guides/deadlines/) propagates deadlines end to end across all services and is the canonical example of this pattern. Middleware can decrease the remaining timeout along the way.
- **Envoy:** [`x-envoy-upstream-rq-timeout-ms`](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#x-envoy-upstream-rq-timeout-ms) has the exact same semantics and is widely adopted in service meshes. Unfortunately, the name is not vendor-agnostic.
- **Google Maps/Cloud API:** `X-Server-Timeout` is used for deadline propagation, unfortunately in seconds, not milliseconds.
- **Stainless SDKs:** Already send `x-stainless-read-timeout` for observability — this builds on that foundation with correct cancellation semantics.
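
The "middleware can decrease the timeout" idea from the gRPC bullet could look roughly like this for the proposed headers. This is a sketch under assumptions, not gRPC's or Envoy's implementation: `forward_headers` and `hop_overhead_ms` are hypothetical, and it assumes reasonably synchronized clocks between hops.

```python
from typing import Dict, Optional


def forward_headers(
    deadline_ms: int, now_ms: int, hop_overhead_ms: int = 0
) -> Optional[Dict[str, str]]:
    """Headers for the next hop, or None when the budget is already spent."""
    # Shrink the relative budget by the time already used plus a safety margin.
    remaining = deadline_ms - now_ms - hop_overhead_ms
    if remaining <= 0:
        return None  # caller should fail fast instead of forwarding the request
    return {
        "request-timeout-ms": str(remaining),
        # The absolute deadline is forwarded unchanged so any hop can verify it.
        "request-deadline-ms": str(deadline_ms),
    }
```

Each hop forwards a smaller `request-timeout-ms` while `request-deadline-ms` stays fixed, so a downstream service never believes it has more time than the original client allowed.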

It would be great to have a vendor-agnostic name that a range of LLM projects could adopt. The Stainless OpenAI API is IMO the best proxy for that. I believe a header we can rely on would help us save a ton of compute. Please don't make the header name contain `openai` or `stainless`.

### Additional context

-

RFC: Proactive Server-Side Cancellation via `Request-Timeout-Ms` #3277
