25 Jun 19:29

jvstme

5e01dff

0.20.26 Latest


Server

SSH pool

The server now keeps a pool of reusable SSH connections to instances, enabled by default. Previously, the server opened a fresh SSH connection for every operation against an instance. Reusing pooled connections removes this per-operation overhead and delivers a significant performance boost — runs, dev environments, and services become noticeably more responsive, especially on servers managing many instances.

The SSH pool is on by default and requires no configuration. If needed, you can opt out by setting the DSTACK_SERVER_SSH_POOL_DISABLED environment variable:

DSTACK_SERVER_SSH_POOL_DISABLED=1
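The effect of pooling can be illustrated with a minimal Python sketch. It is not dstack's implementation: `ConnectionPool` and `FakeConnection` are hypothetical names, and a fake connection object stands in for a real SSH session so the example runs without network access.

```python
from typing import Callable, Dict


class FakeConnection:
    """Hypothetical stand-in for an SSH connection to one instance."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.closed = False

    def run(self, command: str) -> str:
        if self.closed:
            raise RuntimeError("connection is closed")
        return f"{self.host}: {command}"

    def close(self) -> None:
        self.closed = True


class ConnectionPool:
    """Keeps one reusable connection per host instead of reconnecting per operation."""

    def __init__(self, connect: Callable[[str], FakeConnection]) -> None:
        self._connect = connect
        self._connections: Dict[str, FakeConnection] = {}
        self.opened = 0  # number of connections actually established

    def get(self, host: str) -> FakeConnection:
        conn = self._connections.get(host)
        if conn is None or conn.closed:
            # Connect only when there is no live pooled connection for this host.
            conn = self._connect(host)
            self._connections[host] = conn
            self.opened += 1
        return conn

    def drop(self, host: str) -> None:
        # Close and forget the connection, e.g. when the instance is terminated.
        conn = self._connections.pop(host, None)
        if conn is not None:
            conn.close()


pool = ConnectionPool(FakeConnection)
for command in ["uptime", "df -h", "nvidia-smi"]:
    pool.get("10.0.1.42").run(command)
print(pool.opened)  # 1: three operations, one connection
```

Without a pool, each of the three operations would pay the connection setup cost separately.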

Faster run listing

The /api/runs/list endpoint has been optimized to load jobs more efficiently. Listing runs — including in the UI and via dstack ps — is now faster, particularly for projects with a large number of runs.

Backends

AWS

Capacity Reservations

dstack now applies the tenancy of an EC2 On-Demand Capacity Reservation when launching instances into it. A Capacity Reservation only accepts instances whose attributes (instance type, platform, Availability Zone, and tenancy) match the reservation, so instances launched into a reservation with dedicated tenancy are now accepted instead of rejected.
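The matching rule can be modeled in a few lines of Python. This is a sketch, not dstack or AWS code: `LaunchAttributes` and `build_request` are hypothetical names, and the assumption that launch requests previously used EC2's default tenancy is labeled in the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchAttributes:
    instance_type: str
    platform: str
    availability_zone: str
    tenancy: str  # "default" or "dedicated"


def reservation_accepts(reservation: LaunchAttributes, request: LaunchAttributes) -> bool:
    # A Capacity Reservation only accepts instances whose attributes all match it.
    return reservation == request


def build_request(reservation: LaunchAttributes, apply_reservation_tenancy: bool) -> LaunchAttributes:
    # Assumption: without the fix, requests used EC2's default tenancy;
    # with it, the reservation's tenancy is copied into the launch request.
    tenancy = reservation.tenancy if apply_reservation_tenancy else "default"
    return LaunchAttributes(
        reservation.instance_type, reservation.platform, reservation.availability_zone, tenancy
    )


reservation = LaunchAttributes("p5.48xlarge", "Linux/UNIX", "us-east-1a", "dedicated")
print(reservation_accepts(reservation, build_request(reservation, False)))  # False
print(reservation_accepts(reservation, build_request(reservation, True)))   # True
```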

What's changed

Enable server SSH pool by default by @r4victor in #3981
Use uv pip install ipykernel for dev environments by @r4victor in #3982
Refactor/shared replica tunnel by @Bihan in #3978
Document the instances run configuration property by @peterschmidt85 in #3989
Fix Azure backend with azure-mgmt-resource 26 by @peterschmidt85 in #3988
Optimize /api/runs/list job loading by @peterschmidt85 in #3986
Apply Capacity Reservation tenancy to AWS instance launch by @james-boydell in #3992

Full changelog: 0.20.25...0.20.26

Contributors

Bihan, r4victor, and 2 other contributors


18 Jun 11:12

r4victor

0.20.25

9c1e460


Runs

Ubuntu 24.04

dstack's base Docker images have been upgraded from Ubuntu 22.04 to Ubuntu 24.04, so runs now execute in Ubuntu 24.04 containers unless image is specified. See the Ubuntu 24.04 LTS release notes for more details.

Note: If your runs depend specifically on Ubuntu 22.04, set image explicitly in the run configuration:

type: task
image: dstackai/base:0.13-base-ubuntu22.04
commands: ...

Instances

Run configurations now support the instances property, which restricts a run to the specified instances:

type: dev-environment
ide: vscode
instances: [{fleet: my-ssh-fleet, instance: 0}]

This can be useful if, for example, a run depends on an Instance Volume that exists on a specific SSH instance.

See the reference for different syntax options supported by instances.

Gateways

Replicas

A gateway can now have multiple replicas for improved availability and scalability:

type: gateway
name: example-gateway

backend: aws
region: eu-west-1

domain: example.com

certificate: null
replicas: 2

To balance requests between gateway replicas, add DNS records for each replica or set up a load balancer outside of dstack.

Note: Automatic HTTPS is not supported for replicated gateways. Use an external load balancer for TLS termination.

Replicated gateways are an experimental feature. See the docs for all the limitations.

Backends

AWS

NVIDIA B200 and B300

dstack now supports the AWS p6-b200 and p6-b300 instance types, with max-throughput EFA networking set up out of the box. p6-b300 is the first instance type natively supported by dstack that comes with NVIDIA Blackwell Ultra B300 GPUs and 6,400 Gbps networking. Give it a try:

$ dstack apply -f b300-fleet.dstack.yml
...
 #  BACKEND  REGION     INSTANCE          RESOURCES                                         SPOT  PRICE
 1  aws      us-east-1  p6-b300.48xlarge  cpu=192 mem=4096GB disk=100GB gpu=B300:268.6GB:8  yes   $33.082
 2  aws      us-west-2  p6-b300.48xlarge  cpu=192 mem=4096GB disk=100GB gpu=B300:268.6GB:8  yes   $34.4876
 3  aws      us-west-2  p6-b300.48xlarge  cpu=192 mem=4096GB disk=100GB gpu=B300:268.6GB:8  no    $142.416
    ...
 Shown 3 of 4 offers, $142.416 max

What's changed

Support AWS p6 instances by @r4victor in #3961
Support targeting specific instances by @peterschmidt85 in #3958
Support Zed in the UI: add to IDE dropdown and fix Open-in-IDE link by @peterschmidt85 in #3963
Separate Docker and VM base image versions by @r4victor in #3966
[Docs]: Tenant isolation guide by @jvstme in #3913
[Nebius]: Update OS image and add new platforms by @jvstme in #3970
Update nvidia drivers installation in VM images by @r4victor in #3967
[Docs]: Revise the SSH proxy section by @peterschmidt85 in #3965
Add Docker Compose for a Postgres-backed server with SSH proxy by @peterschmidt85 in #3964
Support gateways with multiple replicas by @jvstme in #3960
Fix runtime error with grpcio by @Bihan in #3971
Drop special handling of the Sky gateway by @jvstme in #3974
Add script to manage dstack AWS AMIs by @r4victor in #3976
Update docker base image to ubuntu 24.04 by @r4victor in #3972
Bump base image versions to 0.14 by @r4victor in #3977
Improve replicated gateway display in older CLIs by @jvstme in #3975
Replace lsblk fs detection with blkid by @r4victor in #3979

Full changelog: 0.20.24...0.20.25

Contributors

Bihan, r4victor, and 2 other contributors


12 Jun 15:38

peterschmidt85

0.20.25rc1

c1d9fa9

0.20.25rc1 Pre-release


Instances

Run configurations now support the instances property for targeting specific fleet instances.

When instances is set, the run is placed only on matching existing fleet instances. If the specified instances cannot be used, the run fails instead of provisioning new instances.

Target by instance name:

instances:
  - name: my-fleet-0

The short syntax is an instance name string:

instances:
  - my-fleet-0

Target by hostname or IP address:

instances:
  - hostname: 10.0.1.42

Target by fleet and instance number:

instances:
  - fleet: my-fleet
    instance: 0

For fleets from another project, use the <project>/<fleet> reference:

instances:
  - fleet: shared-project/my-fleet
    instance: 0

Multiple instances can be specified:

instances:
  - my-fleet-0
  - my-fleet-1
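The matching semantics of these forms can be sketched in Python. Only the meaning of each form and the fail-instead-of-provision behavior come from the notes above; the `Instance` data model, the `<fleet>-<num>` naming, restricting name matches to the run's project, and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Spec = Union[str, Dict[str, object]]


@dataclass
class Instance:
    project: str
    fleet: str
    num: int
    hostname: str

    @property
    def name(self) -> str:
        # Assumption: names follow the "<fleet>-<num>" pattern used in the examples.
        return f"{self.fleet}-{self.num}"


def matches(spec: Spec, inst: Instance, project: str) -> bool:
    if isinstance(spec, str):  # short syntax: a bare instance name
        spec = {"name": spec}
    if "name" in spec:
        return inst.project == project and inst.name == spec["name"]
    if "hostname" in spec:
        return inst.hostname == spec["hostname"]
    # "<fleet>" refers to the run's project, "<project>/<fleet>" to another project.
    ref_project, _, fleet = str(spec["fleet"]).rpartition("/")
    return (inst.project, inst.fleet, inst.num) == (ref_project or project, fleet, spec["instance"])


def select_instances(specs: List[Spec], candidates: List[Instance], project: str) -> List[Instance]:
    chosen = [i for i in candidates if any(matches(s, i, project) for s in specs)]
    if not chosen:
        # The run fails instead of provisioning new instances.
        raise RuntimeError("no matching fleet instances")
    return chosen


candidates = [
    Instance("main", "my-fleet", 0, "10.0.1.41"),
    Instance("main", "my-fleet", 1, "10.0.1.42"),
    Instance("shared-project", "my-fleet", 0, "10.0.2.10"),
]
print([i.hostname for i in select_instances(["my-fleet-0"], candidates, "main")])
```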

What's changed

Support targeting specific instances by @peterschmidt85 and @fededagos in #3958

Full changelog: 0.20.24...0.20.25rc1

Note

The public documentation will be updated when the release becomes GA.

Contributors

fededagos and peterschmidt85


11 Jun 13:55

peterschmidt85

0.20.24

cd0e93c


Dev environments

Zed

dstack now supports Zed as a dev environment IDE:

type: dev-environment
ide: zed
resources:
  gpu: L4

Once the dev environment is up, the CLI prints a zed:// link that opens the remote project in Zed over SSH. Since Zed doesn't require any plugins, no server pre-installation is needed — the Zed server is installed automatically on first connect.

$ dstack apply
...
Submit a new run? [y/n]: y
 NAME                     BACKEND                  GPU                     PRICE       STATUS      SUBMITTED
 fast-fly-1               aws (us-east-2)          gpu=L4:24GB:1           $0.1838     running     16:36
                                                                           (spot)

fast-fly-1 provisioning completed (running)
pip install ipykernel...

To open in Zed, use link below:

  zed://ssh/fast-fly-1/dstack/run

To connect via SSH, use: `ssh fast-fly-1`

To exit, press Ctrl+C.

Services

Replica groups

The spot_policy and reservation properties can now be specified at the replica group level. This allows distributing replicas across reserved and spot capacity, e.g., running baseline replicas on a reservation while autoscaling overflow replicas on spot instances:

type: service
image: my-image
port: 80

replicas:
  - name: baseline
    reservation: my-reservation
    count: 1

  - name: overflow
    spot_policy: auto
    count: 0..3
    scaling:
      metric: rps
      target: 1

Shepherd Model Gateway

Services using Shepherd Model Gateway now support gRPC communication with both vLLM and SGLang workers. Previously, only the SGLang runtime with the HTTP connection mode was supported.

Below is an example service configuration running vLLM gRPC workers:

type: service
name: prefill-decode

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    image: python:3.12-slim
    commands:
      - pip install smg
      - |
          smg launch \
            --pd-disaggregation \
            --model-path $MODEL_ID \
            --enable-igw \
            --host 0.0.0.0 \
            --port 8000 \
            --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4

  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
          python3 -m vllm.entrypoints.grpc_server \
            --model $MODEL_ID \
            --host 0.0.0.0 \
            --port 8000 \
            --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer"}'
    resources:
      gpu: H200

  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
          python3 -m vllm.entrypoints.grpc_server \
            --model $MODEL_ID \
            --host 0.0.0.0 \
            --port 8000 \
            --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer"}'
    resources:
      gpu: H200

port: 8000

dstack automatically detects each worker's runtime (vLLM or SGLang) and connection mode (HTTP or gRPC) by probing it. With gRPC, the SMG router tokenizes requests once and routes on tokens instead of raw text, reducing duplicate work and making cache_aware routing more effective.

JarvisLabs

The jarvislabs backend now supports offers with RTXPRO6000 GPUs.

Azure

`subnet_ids`

As with vpc_ids, the azure backend now lets you select the specific subnets to attach to dstack VMs via the new subnet_ids property, which maps regions to subnets in the <resource-group>/<vnet>/<subnet> format:

projects:
  - name: main
    backends:
      - type: azure
        subscription_id: ...
        tenant_id: ...
        creds:
          type: default
        regions: [westeurope]
        subnet_ids:
          westeurope: my-resource-group/my-vnet/my-subnet

This is useful when the VNet contains subnets that dstack shouldn't pick automatically, e.g. subnets delegated to other Azure services.
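Each short reference corresponds to a standard Azure Resource Manager subnet ID. The Python sketch below shows that expansion; how dstack resolves the reference internally is not described here, so `subnet_resource_id` and `expand_subnet_ids` are hypothetical helpers, and the all-zero subscription ID is a placeholder.

```python
from typing import Dict


def subnet_resource_id(subscription_id: str, short_ref: str) -> str:
    """Expand "<resource-group>/<vnet>/<subnet>" into a full Azure subnet resource ID."""
    parts = short_ref.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <resource-group>/<vnet>/<subnet>, got {short_ref!r}")
    resource_group, vnet, subnet = parts
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}"
    )


def expand_subnet_ids(subscription_id: str, subnet_ids: Dict[str, str]) -> Dict[str, str]:
    # One subnet per region, mirroring the subnet_ids mapping in the backend config.
    return {region: subnet_resource_id(subscription_id, ref) for region, ref in subnet_ids.items()}


ids = expand_subnet_ids(
    "00000000-0000-0000-0000-000000000000",
    {"westeurope": "my-resource-group/my-vnet/my-subnet"},
)
print(ids["westeurope"])
```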

What's changed

Fix zero scaled services assigned to wrong fleets by @r4victor in #3939
Set runner/shim default compiled versions to latest by @r4victor in #3941
Implement SSH connection pool for runner instances by @r4victor in #3936
[chore]: Move format_backend() to common utils by @jvstme in #3942
Drop non-linux runner builds and local backend by @r4victor in #3944
Support Zed as dev-environment IDE by @r4victor in #3947
Fix dropping ssh connections to non-provisioned terminating instances by @r4victor in #3948
Replica group spot_policy and reservation by @jvstme in #3932
Fix jpd.hostname AssertionError on container stop by @r4victor in #3951
Add NVIDIA Dynamo blog post by @peterschmidt85 in #3949
Support gRPC communication with SMG (Shepherd Model Gateway) workers by @Bihan in #3946
Allow configuring subnet_ids in Azure settings by @jvstme in #3955
[JarvisLabs] Support RTX PRO 6000; update gpuhunt dependency by @peterschmidt85 in #3943

Full changelog: 0.20.23...0.20.24

Contributors

Bihan, r4victor, and 2 other contributors


04 Jun 10:20

jvstme

0.20.23

60bbb7e


This release includes several bug fixes and performance optimizations.

What's Changed

[Internal]: Fix OCI image publishing script by @jvstme in #3915
Update Docker and cloud images to 0.13 by @jvstme in #3916
[shim] Pass proxy variables to the container by @un-def in #3917
Fix image pull progress when reported in seconds by @jvstme in #3921
Skip getting backend offers when instance offers suffice by @r4victor in #3923
Reduce run provisioning pipeline processing latency by @r4victor in #3922
Do not generate RSA key for runner sshd by @r4victor in #3926
Handle repo patch with non-UTF8 sequences by @un-def in #3918
Fix Verda spot offers marked unavailable due to on-demand-only availability check by @IA386 in #3928

New Contributors

@IA386 made their first contribution in #3928

Full Changelog: 0.20.22...0.20.23

Contributors

un-def, r4victor, and 2 other contributors


28 May 10:24

r4victor

0.20.22

e27e573


Accelerators

Tenstorrent

This update adds support for Tenstorrent Blackhole accelerators, including PCIe cards and systems such as LoudBox, QuietBox, and Galaxy. Previously, dstack supported only Tenstorrent Wormhole accelerators. We've also reworked the Tenstorrent example.

Backends

Vast.ai

The vastai backend now supports backend-specific options in run and fleet configurations for advanced offer filtering:

type: dev-environment
backend_options:
- type: vastai
  offer_order: price
  min_reliability: 0.97
  min_score: 250

See the YAML reference for more details on the new backend_options.
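To make the effect of these options concrete, here is a rough Python model of the filtering. It is an assumption-laden sketch, not the vastai backend's code: the `Offer` fields are hypothetical, the thresholds are treated as inclusive minimums, and offer_order is assumed to sort ascending by the named field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    # Hypothetical offer record; field names are illustrative only.
    id: int
    price: float        # $/hour
    reliability: float  # 0.0 .. 1.0
    score: float


def filter_offers(offers: List[Offer], min_reliability: float, min_score: float,
                  offer_order: str = "price") -> List[Offer]:
    # Keep only offers that meet both thresholds (assumed inclusive).
    kept = [o for o in offers if o.reliability >= min_reliability and o.score >= min_score]
    # Assumption: ordering is ascending by the named field.
    return sorted(kept, key=lambda o: getattr(o, offer_order))


offers = [
    Offer(1, 1.20, 0.99, 300),
    Offer(2, 0.90, 0.95, 400),  # dropped: reliability below 0.97
    Offer(3, 1.00, 0.98, 200),  # dropped: score below 250
    Offer(4, 0.80, 0.99, 260),
]
print([o.id for o in filter_offers(offers, min_reliability=0.97, min_score=250)])
```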

Examples

Miles

A new Miles example shows how to use dstack and Miles for reinforcement learning (RL) post-training of a 32B language model with GRPO across a multi-node cluster.

Breaking changes

Dropped support for AWS P3 instances (V100).

What's changed

[Docs]Add AMD Mi300x PD-Disaggregation Example by @Bihan in #3890
Display imported gateways in project settings UI by @jvstme in #3893
[chore]: Refactor get_job_plans() by @jvstme in #3894
Update TT-SMI Docker image build by @peterschmidt85 in #3900
Discover and use the latest AWS Ubuntu 22.04 DLAMI by @jvstme in #3899
Drop AWS P3 support and use DLAMI for all AWS GPU instances by @r4victor in #3903
Fix placement groups missing project attribute by @r4victor in #3905
Improve AMD accelerator example by @peterschmidt85 in #3901
[Docs]Add Miles Example by @Bihan in #3907
Add Vast.ai-specific profile options by @jvstme in #3909
Do not pass minCudaVersion for RunPod clusters by @r4victor in #3911
Add Tenstorrent Blackhole support by @peterschmidt85 in #3895
[Internal]: Drop unused pre-pull parameter in images CI by @jvstme in #3912
Fix Vast.ai offer order in dstack offer --fleet by @jvstme in #3897

Full changelog: 0.20.21...0.20.22

Contributors

Bihan, r4victor, and 2 other contributors


21 May 12:43

un-def

0.20.21

ab4d4c0


Backends

JarvisLabs

This release adds JarvisLabs as a new backend, allowing dstack to provision GPU and CPU VMs on JarvisLabs, including spot GPU instances.

To configure the backend, log in to your JarvisLabs account, create an API key, and add it to ~/.dstack/server/config.yml:

projects:
- name: main
  backends:
    - type: jarvislabs
      creds:
        type: api_key
        api_key: ...

Kubernetes

Multiple clusters

A single kubernetes backend can now manage multiple Kubernetes clusters. Each cluster is selected via a kubeconfig context and becomes its own dstack region:

projects:
- name: main
  backends:
  - type: kubernetes

    kubeconfig:
      filename: ~/.kube/config

    contexts:
    - name: gpu-cluster-a
    - name: gpu-cluster-b

Each context can configure its own proxy_jump.hostname and proxy_jump.port, and the namespace is taken from each kubeconfig context. When creating a dstack volume or gateway, the region field selects which cluster the resource is provisioned in.

The previous single-cluster configuration (without contexts) continues to work but is no longer recommended and may be removed in the future. Refer to the backends docs for the up-to-date configuration and migration guidance.

Object labeling

All dstack-managed Kubernetes resources (jump pods, job pods, gateways, volumes, registry-auth secrets, services) now share a consistent set of labels, making it easier to filter and audit dstack resources with kubectl:

app.kubernetes.io/name=dstack-{ssh-proxy,job,gateway,volume}
app.kubernetes.io/instance
app.kubernetes.io/managed-by=dstack
k8s.dstack.ai/project
k8s.dstack.ai/name (if applicable)
k8s.dstack.ai/user (if applicable)
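Because the labels are consistent, an equality-based selector for `kubectl get ... -l` can be assembled from them. The Python helper below is a hypothetical convenience, not part of dstack; it assumes the k8s.dstack.ai/project value is the dstack project name.

```python
from typing import Dict, Optional

# Components listed for the app.kubernetes.io/name label above.
COMPONENTS = {"ssh-proxy", "job", "gateway", "volume"}


def dstack_selector(project: Optional[str] = None, component: Optional[str] = None) -> str:
    """Build an equality-based label selector for `kubectl get ... -l <selector>`."""
    labels: Dict[str, str] = {"app.kubernetes.io/managed-by": "dstack"}
    if component is not None:
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        labels["app.kubernetes.io/name"] = f"dstack-{component}"
    if project is not None:
        # Assumption: the label value is the dstack project name.
        labels["k8s.dstack.ai/project"] = project
    return ",".join(f"{key}={value}" for key, value in labels.items())


print(dstack_selector(project="main", component="job"))
```

The printed selector can then be passed as `kubectl get pods -l '<selector>'` to list only the job pods of one project.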

Bug fixes

Jobs no longer retry indefinitely when the target fleet is at capacity.
Negative retry.duration values (e.g. -1) are now rejected during configuration parsing instead of silently producing a nonsensical retry spec.

What's changed

Fix Kubernetes backend utils.py typing by @un-def in #3889
[CI] Bump pyright-action by @un-def in #3888
Reject negative retry durations by @pragnyanramtha in #3885
Fix infinite job retry when fleet is at capacity by @jvstme in #3887
Kubernetes: multiple clusters support by @un-def in #3884
Add JarvisLabs backend by @peterschmidt85 in #3875
Kubernetes: standardize object labeling by @un-def in #3891
[Docs] Fix gen_schema_reference.py on Python 3.10 by @un-def in #3883

New contributors

@pragnyanramtha made their first contribution in #3885

Full changelog: 0.20.20...0.20.21

Contributors

un-def, jvstme, and 2 other contributors


15 May 11:45

jvstme

0.20.20

90c00cf


Services

NVIDIA Dynamo

This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.

Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare router: { type: dynamo }, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.

type: service
name: dynamo-pd

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set to enable /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

dstack provisions the router replica, injects DSTACK_ROUTER_INTERNAL_IP into non-router replicas, and lets Dynamo workers connect directly to the router’s etcd and NATS services.

Refer to the Dynamo example for full deployment instructions.
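For workers whose startup logic is written in Python rather than a shell loop, the same etcd/NATS readiness check could look like the sketch below. `wait_for_ports` is a hypothetical helper, not part of dstack or Dynamo. Unlike the shell loop in the configuration, it gives up after a timeout, so a misconfigured router surfaces as an error instead of an endless wait.

```python
import socket
import time
from typing import Iterable


def wait_for_ports(host: str, ports: Iterable[int], timeout: float = 300.0,
                   interval: float = 3.0) -> bool:
    """Return True once every port accepts a TCP connection, False on timeout."""
    port_list = list(ports)
    deadline = time.monotonic() + timeout
    while True:
        try:
            for port in port_list:
                # create_connection raises OSError if nothing is listening yet.
                with socket.create_connection((host, port), timeout=interval):
                    pass
            return True
        except OSError:
            if time.monotonic() + interval > deadline:
                return False
            print(f"waiting for etcd/NATS on {host}...")
            time.sleep(interval)
```

In a worker it might be called as `wait_for_ports(os.environ["DSTACK_ROUTER_INTERNAL_IP"], [2379, 4222])`, using the etcd and NATS ports from the configuration.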

Replica groups

It's now possible to configure the image, docker, python, nvcc, and privileged properties at the replica group level. This enables complex multi-component services like NVIDIA Dynamo, where different replicas require different runtime environments.

Exports

Gateways

Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.

$ dstack export --project main create my-export --gateway shared-gateway --importer team
 NAME       FLEETS  GATEWAYS        IMPORTERS 
 my-export  -       shared-gateway  team

Now, if you list gateways in the team project, you'll see the exported gateway:

$ dstack gateway --project team
 NAME                 BACKEND          HOSTNAME        DOMAIN                 DEFAULT  STATUS  
 main/shared-gateway  aws (eu-west-1)  108.131.126.35  gtw.mycompany.example           running

Additionally, gateway domains now support optional project name interpolation using ${{ run.project_name }}, allowing different projects to use different domains on the same shared gateway.

type: gateway
name: shared-gateway

backend: aws
region: eu-west-1

domain: ${{ run.project_name }}.mycompany.example
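The substitution itself is a simple template replacement. The Python sketch below is illustrative only: `resolve_domain` is a hypothetical helper, and tolerating extra spaces inside the braces is an assumption rather than documented behavior.

```python
import re

# Matches the ${{ run.project_name }} placeholder, tolerating inner spaces (an assumption).
_PROJECT_NAME = re.compile(r"\$\{\{\s*run\.project_name\s*\}\}")


def resolve_domain(template: str, project_name: str) -> str:
    # A function replacement avoids treating backslashes in the name as escapes.
    return _PROJECT_NAME.sub(lambda _match: project_name, template)


print(resolve_domain("${{ run.project_name }}.mycompany.example", "team"))  # team.mycompany.example
```

With this domain, the main and team projects sharing one gateway would use main.mycompany.example and team.mycompany.example respectively.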

Global exports

Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.

$ dstack export create global-export --gateway shared-gateway --global
 NAME           FLEETS  GATEWAYS        IMPORTERS
 global-export  -       shared-gateway  *

AWS

EFA clusters

Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required public_ips: False. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.

Kubernetes

Permissions

dstack now requires the watch permission for pods within the namespace. See Required permissions for up-to-date ClusterRole and Role manifests.

Backend configuration

The namespace property of the kubernetes backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.

Migration guide

If namespace is unset or set to default in both the backend config and the kubeconfig, no action is required — default continues to be used.
If namespace is set to the same value (e.g. ns-a) in both the backend config and the kubeconfig, no action is required.
If namespace is set to ns-a in the backend config but the kubeconfig has a different value (or none), set the namespace to ns-a in your kubeconfig context to prepare for future versions.
It is only safe to remove namespace from the backend config if its value is default.
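The migration rules can be condensed into a small decision function, for example to audit several backend configurations at once. This is a hypothetical helper, not part of dstack. It treats an unset namespace as `default`; the case where only the kubeconfig sets a non-default namespace is not covered by the guide, so the result there is an inference.

```python
from typing import Optional


def namespace_action(backend_ns: Optional[str], kubeconfig_ns: Optional[str]) -> str:
    """Suggest the migration step for one kubernetes backend."""
    backend = backend_ns or "default"  # an unset namespace means "default"
    context = kubeconfig_ns or "default"
    if backend == context:
        # Both unset/default, or both set to the same value.
        return "no action required"
    # The backend config is still the source of truth, so copy its value into the context.
    return f"set namespace {backend} in the kubeconfig context"


def safe_to_remove_from_backend_config(backend_ns: Optional[str]) -> bool:
    # Only a default (or unset) namespace can be dropped from the backend config.
    return (backend_ns or "default") == "default"


print(namespace_action("ns-a", None))  # set namespace ns-a in the kubeconfig context
```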

What's changed

[Services] Allow to specify image, docker, python, nvcc, privileged at replica group level by @Bihan in #3832
[Internal]: Delete some unused classes by @jvstme in #3842
[Internal] Fix pyright failing in CI by @jvstme in #3846
[Internal] Update RunpodApiClient by @un-def in #3847
[Internal] Fix openai SDK failing in tests by @jvstme in #3849
[RunPod] Handle deleting non-existent volume by @r4victor in #3853
[Runpod] Fix broken registry_auth support by @un-def in #3844
[UX] Raise ImportError on Python 3.14 or later by @r4victor in #3855
[Exports] Gateway support by @jvstme in #3845
[Internal] Rename docs/ to mkdocs/, move examples under /docs/, inline source by @peterschmidt85 in #3859
[Kubernetes] Deprecate namespace in backend config by @un-def in #3858
[Gateways] Allow setting imported gateway as project default by @jvstme in #3860
[Internal] Forbid exporting the built-in dstack Sky gateway by @jvstme in #3864
[AWS] Support multi-EFA instances with public IPs by @r4victor in #3865
[Internal] Add server-side validation for fleet configuration subtypes by @un-def in #3848
[Verda] Optimize terminating Verda instances by @jvstme in #3811
[Internal] Introduce GatewayModel.forbid_new_services by @jvstme in #3863
[Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in #3869
[Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in #3866
Fix Pyright errors with requests==2.34.0 by @jvstme in #3873
Add project name interpolation in gateway domains by @jvstme in #3870
[Bugfix] Fix duplicate headers with in-server proxy by @jvstme in #3872
[Docs]: Gateway Exports by @jvstme in #3862
[Kubernetes] Fai...

Contributors

un-def, Bihan, and 3 other contributors


30 Apr 11:01

r4victor

0.20.19

9e23658


Services

RPS window for autoscaling

Services now support a window property in the scaling spec that defines the time window used to calculate RPS. Allowed values are 30s, 1m, and 5m (default is 1m). Previously, the RPS was always calculated using a 1m window.

type: service
image: nginx
port: 80

replicas: 0..1
scaling:
  metric: rps
  # 1 request per second, calculated over a 5-minute window
  target: 1
  window: 5m
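The effect of the window length can be illustrated with a sliding-window counter. This Python sketch assumes RPS is the number of requests received within the window divided by the window length; `RpsWindow` is a hypothetical class, and dstack's actual aggregation may differ.

```python
from collections import deque
from typing import Deque

# Allowed values of scaling.window, in seconds.
WINDOWS = {"30s": 30.0, "1m": 60.0, "5m": 300.0}


class RpsWindow:
    """Requests per second averaged over a sliding time window."""

    def __init__(self, window: str = "1m") -> None:
        if window not in WINDOWS:
            raise ValueError(f"window must be one of {list(WINDOWS)}")
        self.seconds = WINDOWS[window]
        self.timestamps: Deque[float] = deque()

    def record(self, ts: float) -> None:
        # Timestamps are assumed to arrive in non-decreasing order.
        self.timestamps.append(ts)

    def rps(self, now: float) -> float:
        # Evict requests that fell out of the window, then divide by its length.
        while self.timestamps and self.timestamps[0] <= now - self.seconds:
            self.timestamps.popleft()
        return len(self.timestamps) / self.seconds


# A burst of 60 requests during the first 10 seconds, then silence.
one_minute, five_minutes = RpsWindow("1m"), RpsWindow("5m")
for i in range(60):
    one_minute.record(i / 6)
    five_minutes.record(i / 6)
print(one_minute.rps(now=70), five_minutes.rps(now=70))  # 0.0 0.2
```

In this model, a minute after the burst the 1m window already reports no load while the 5m window still reports some, so a longer window smooths out short spikes at the cost of reacting more slowly.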

Kubernetes

`registry_auth`

The kubernetes backend now supports the registry_auth property for pulling Docker images from private registries:

type: service
image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
registry_auth:
  username: $oauthtoken
  password: ${{ secrets.ngc_api_key }}

dstack automatically creates and sets up imagePullSecrets for the pods. This requires new permissions for the Kubernetes role:

rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["create", "delete"]
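For reference, a registry pull secret in Kubernetes is a Secret of type kubernetes.io/dockerconfigjson that pods list under imagePullSecrets. The Python sketch below builds such a manifest using the standard format; the secret name and the `pull_secret` helper are hypothetical, and the exact objects dstack creates are not described here.

```python
import base64
import json
from typing import Any, Dict


def pull_secret(name: str, registry: str, username: str, password: str) -> Dict[str, Any]:
    """Build a kubernetes.io/dockerconfigjson Secret manifest for one registry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name},
        # Values under .data must be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(json.dumps(config).encode()).decode()},
    }


secret = pull_secret("example-registry-auth", "nvcr.io", "$oauthtoken", "<ngc-api-key>")
# A pod then references the secret by name in its spec.
pod_spec_fragment = {"imagePullSecrets": [{"name": secret["metadata"]["name"]}]}
print(pod_spec_fragment)
```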

Read-only volumes

Kubernetes volume configurations now support a new read_only property. When set to true, it enforces readOnly: true in the pod's volumeMounts.

type: volume
backend: kubernetes
name: my-volume
size: 100GB
read_only: true

Server

Faster processing

The server has been optimized to reduce processing latencies. As a result, many operations now take less time: run provisioning is up to 14s faster and run termination is up to 7s faster.

Examples

Documentation and examples have been refreshed, including new Qwen3.6-27B and DeepSeek V4 examples. A new prefill-decode blog post shows how to run SGLang PD disaggregation via Shepherd Model Gateway.

Breaking changes

Python 3.9 support dropped

Running dstack on Python 3.9 is no longer supported, as Python 3.9 reached end-of-life on 2025-10-31. Please upgrade to Python 3.10 or later.

What's Changed

Refresh quickstart and service docs with Qwen3.6-27B by @peterschmidt85 in #3819
Disallow running dstack on Python 3.9 by @jvstme in #3817
Create placeholder instance models by @r4victor in #3821
Add DeepSeek V4 model docs by @peterschmidt85 in #3823
Reduce pipelines processing latencies by @r4victor in #3828
[Docs]: Update scale_up/down_delay descriptions by @jvstme in #3831
Clean up exports on project and fleet deletion by @jvstme in #3827
[shim,runner] Improve logging options by @un-def in #3822
Allow configuring RPS window for service scaling by @jvstme in #3830
Replace sglang_router with smg in PD examples by @Bihan in #3836
Interpolate JobSpec secrets for Compute.run_job() by @un-def in #3834
Kubernetes: configure imagePullSecrets by @un-def in #3835
Kubernetes: add read_only volume property by @un-def in #3838

Full Changelog: 0.20.18...0.20.19

Contributors

un-def, Bihan, and 3 other contributors


23 Apr 14:54

un-def

0.20.18

ad4b638


CLI

For VM-based backends as well as SSH fleets, the CLI now shows Docker image pull progress in the format <extracted>/<downloaded>/<total>.

Offers

This update reduces the time required to fetch backend offers and initialize backends, making both dstack offer and dstack apply faster:

- runpod — 0.66s => 0.03s (22x)
- amddevcloud — 2.26s => 0.85s (2.7x)
- cudo — 2.48s => 1.02s (2.4x)
- verda — 3.27s => 1.74s (1.9x)
- lambda — 3.24s => 1.89s (1.7x)
- vastai — 3.27s => 1.77s (1.8x)
- gcp — 3.74s => 2.54s (1.5x)
- azure — 5.83s => 3.11s (1.9x)
- aws — 6.58s => 3.56s (1.8x)
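Each speedup factor is the old time divided by the new one. The snippet below recomputes the factors from the listed timings as a consistency check; it uses only the numbers above.

```python
# Timings in seconds (before, after), copied from the list above.
timings = {
    "runpod": (0.66, 0.03),
    "amddevcloud": (2.26, 0.85),
    "cudo": (2.48, 1.02),
    "verda": (3.27, 1.74),
    "lambda": (3.24, 1.89),
    "vastai": (3.27, 1.77),
    "gcp": (3.74, 2.54),
    "azure": (5.83, 3.11),
    "aws": (6.58, 3.56),
}

# Speedup factor = time before / time after, rounded to one decimal place.
speedups = {backend: round(before / after, 1) for backend, (before, after) in timings.items()}
print(speedups)
```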

Secrets

The Manager project role can now manage secrets if the allow_managers_manage_secrets property is enabled in the server’s default_permissions config:

default_permissions:
  allow_managers_manage_secrets: true

Previously, only the Admin role was allowed to manage secrets.

GPUs

This update adds support for GeForce RTX 2, 3, 4, and 5 series GPUs, which were previously not detected properly in either backend or SSH fleets.

GCP

The gcp backend now requires the compute.projects.get permission. Make sure this permission is granted to any custom IAM roles used by dstack.

What's changed

Optimize GCP offers by @r4victor in #3793
Optimize InstanceOffer construction by @r4victor in #3794
Speed up GCP validate_credentials by @r4victor in #3795
Support secrets management by Manager role by @r4victor in #3801
Fix update_default_project() crash on server without TTY by @un-def in #3797
Kubernetes: fix is_hard_taint check by @un-def in #3803
Fix deleting idle instance from fleet with runs by @jvstme in #3807
[Docs] Update examples by @peterschmidt85 in #3798
Display image pull progress in CLI by @jvstme in #3805
[Docs] Add an inline kubeconfig example to the kubernetes backend documentation by @peterschmidt85 in #3813
Avoid Verda instance termination warnings by @jvstme in #3810
[Internal] Improve warning message in ServerConfigManager.apply_config() by @un-def in #3804
Add missing join to volumes query in JobSubmittedWorker by @un-def in #3816
Add CLI deprecation warnings about gateway routers by @jvstme in #3814
Bump gpuhunt, add support for all GeForce RTX 2..5 series by @un-def in #3818
Add misssing compute.projects.get GCP permission by @un-def in #3820

Full changelog: 0.20.17...0.20.18

Contributors

un-def, r4victor, and 2 other contributors
