Distributed AI inference for geo-aware edge compute.
Run production workloads with lower cost and lower latency — horizontal scale across edge devices, one familiar API surface.
- Lower cost — Inference on idle edge compute instead of always renting centralized GPU capacity.
- Geo-aware routing — Requests land on nearby capacity so latency stays predictable for real users.
- Edge-native models — Nano language models (NLMs) tuned for edge and cloud, not only downsized cloud stacks.
- OpenAI-compatible API —
POST /v1/responsesandPOST /v1/chat/completionswith request shapes you already know. - Cloud fallback — When edge is unavailable, the same API path falls back to cloud without a second integration story.
- Typed SDKs — Official clients on npm (
zerogpu-api) and PyPI (pip install zerogpu-api→import zerogpu), plus Go, Ruby, Java, Rust, C#, PHP, and Swift in the SDK monorepo.
| Base URL | https://api.zerogpu.ai/v1 |
| Primary path | POST /v1/responses |
| Headers | x-api-key, x-project-id, Content-Type: application/json |
| Reference | Responses API |
Set ZEROGPU_API_KEY and ZEROGPU_PROJECT_ID the same way you do in the platform dashboard snippets. Full authentication, models, and error semantics live in docs.zerogpu.ai.
| Repository | What you’ll find there |
|---|---|
| zerogpu/SDK | Official Fern-generated API clients, smoke tests, and publishing workflows for npm/PyPI packages. |
| zerogpu/docs | Documentation source and deep links into docs.zerogpu.ai. |
| zerogpu/zerogpu-router | Router-related components that pair with how traffic and capacity are orchestrated. |
ZeroGPU — inference where your users are, not only where the GPUs are.
