Releases: dstackai/dstack
0.20.26
Server
SSH pool
The server now keeps a pool of reusable SSH connections to instances, enabled by default. Previously, the server opened a fresh SSH connection for every operation against an instance. Reusing pooled connections removes this per-operation overhead and delivers a significant performance boost — runs, dev environments, and services become noticeably more responsive, especially on servers managing many instances.
The SSH pool is on by default and requires no configuration. If needed, you can opt out by setting the DSTACK_SERVER_SSH_POOL_DISABLED environment variable:
DSTACK_SERVER_SSH_POOL_DISABLED=1
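As a rough illustration of why pooling helps, the sketch below keeps one connection per host and reuses it across operations. It is not dstack's implementation: ConnectionPool, its connect factory, and the string that stands in for a connection are all hypothetical.

```python
from typing import Callable, Dict, Generic, TypeVar

C = TypeVar("C")


class ConnectionPool(Generic[C]):
    """Keeps one open connection per host and hands it out on request."""

    def __init__(self, connect: Callable[[str], C]) -> None:
        self._connect = connect  # hypothetical factory that opens a connection
        self._connections: Dict[str, C] = {}
        self.opened = 0  # number of real connections opened so far

    def get(self, host: str) -> C:
        # Reuse the pooled connection if present; otherwise open and remember it.
        conn = self._connections.get(host)
        if conn is None:
            conn = self._connect(host)
            self._connections[host] = conn
            self.opened += 1
        return conn


pool = ConnectionPool(connect=lambda host: f"connection-to-{host}")
for _ in range(5):
    pool.get("10.0.1.42")
print(pool.opened)
```

Five operations against the same host open a single connection here, whereas a connection-per-operation approach would open five.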
Faster run listing
The /api/runs/list endpoint has been optimized to load jobs more efficiently. Listing runs — including in the UI and via dstack ps — is now faster, particularly for projects with a large number of runs.
Backends
AWS
Capacity Reservations
dstack now applies the tenancy of an EC2 On-Demand Capacity Reservation when launching instances into it. Because a Capacity Reservation only accepts instances whose attributes — instance type, platform, Availability Zone, and tenancy — match the reservation, this ensures instances with a dedicated tenancy reservation launch correctly instead of being rejected.
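The matching rule can be sketched as a plain attribute comparison. This is an illustrative model only, not AWS or dstack code: LaunchAttributes and mismatched_attributes are hypothetical names, and the attribute values are examples.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass(frozen=True)
class LaunchAttributes:
    instance_type: str
    platform: str
    availability_zone: str
    tenancy: str


def mismatched_attributes(reservation: LaunchAttributes, launch: LaunchAttributes) -> List[str]:
    """Return the names of attributes where the launch request differs from the reservation."""
    return [
        f.name
        for f in fields(LaunchAttributes)
        if getattr(reservation, f.name) != getattr(launch, f.name)
    ]


reservation = LaunchAttributes("p5.48xlarge", "Linux/UNIX", "us-east-1a", "dedicated")
request = LaunchAttributes("p5.48xlarge", "Linux/UNIX", "us-east-1a", "default")

before = mismatched_attributes(reservation, request)
# Copy the reservation's tenancy onto the request, as the release note describes.
after = mismatched_attributes(reservation, replace(request, tenancy=reservation.tenancy))
print(before, after)
```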
What's changed
- Enable server SSH pool by default by @r4victor in #3981
- Use uv pip install ipykernel for dev environments by @r4victor in #3982
- Refactor/shared replica tunnel by @Bihan in #3978
- Document the instances run configuration property by @peterschmidt85 in #3989
- Fix Azure backend with azure-mgmt-resource 26 by @peterschmidt85 in #3988
- Optimize /api/runs/list job loading by @peterschmidt85 in #3986
- Apply Capacity Reservation tenancy to AWS instance launch by @james-boydell in #3992
Full changelog: 0.20.25...0.20.26
0.20.25
Runs
Ubuntu 24.04
dstack's base Docker images have been upgraded from Ubuntu 22.04 to Ubuntu 24.04. Runs now execute in Ubuntu 24.04 containers unless image is specified. See the Ubuntu 24.04 LTS release notes for more details.
Note: If your runs hard-depend on the previous Ubuntu version, specify image in the run configuration explicitly:
type: task
image: dstackai/base:0.13-base-ubuntu22.04
commands: ...
Instances
Run configurations now support the instances property, which restricts provisioning to the specified instances:
type: dev-environment
ide: vscode
instances: [{fleet: my-ssh-fleet, instance: 0}]
This can be useful if, for example, a run depends on an Instance Volume that exists on a specific SSH instance.
See the reference for different syntax options supported by instances.
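The syntax options (listed in the 0.20.25rc1 notes further down) can be thought of as different spellings of the same target. The sketch below normalizes them into explicit dictionaries. It is an assumption-laden illustration, not dstack's parser: normalize_target and the "project" key in its output are hypothetical.

```python
from typing import Any, Dict, Union


def normalize_target(target: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Normalize one entry of `instances` into an explicit dictionary."""
    if isinstance(target, str):
        # Short syntax: a bare string is an instance name.
        return {"name": target}
    keys = set(target)
    if keys in ({"name"}, {"hostname"}):
        return dict(target)
    if keys == {"fleet", "instance"}:
        # "<project>/<fleet>" refers to a fleet from another project.
        project, _, fleet = str(target["fleet"]).rpartition("/")
        result: Dict[str, Any] = {"fleet": fleet, "instance": target["instance"]}
        if project:
            result["project"] = project
        return result
    raise ValueError(f"unsupported instance target: {target!r}")


print(normalize_target("my-fleet-0"))
print(normalize_target({"fleet": "shared-project/my-fleet", "instance": 0}))
```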
Gateways
Replicas
A gateway can now have multiple replicas for improved availability and scalability:
type: gateway
name: example-gateway
backend: aws
region: eu-west-1
domain: example.com
certificate: null
replicas: 2
To balance requests between gateway replicas, add DNS records for each replica or set up a load balancer outside of dstack.
Note: Automatic HTTPS is not supported for replicated gateways. Use an external load balancer for TLS termination.
Replicated gateways are an experimental feature. See the docs for all the limitations.
Backends
AWS
NVIDIA B200 and B300
dstack now supports AWS p6-b200 and p6-b300 instance types, with max-throughput EFA networking setup out-of-the-box. p6-b300 is the first instance type natively supported by dstack that comes with NVIDIA Blackwell Ultra B300 GPUs and 6,400 Gbps networking. Give it a try:
✗ dstack apply -f b300-fleet.dstack.yml
...
# BACKEND REGION INSTANCE RESOURCES SPOT PRICE
1 aws us-east-1 p6-b300.48xlarge cpu=192 mem=4096GB disk=100GB gpu=B300:268.6GB:8 yes $33.082
2 aws us-west-2 p6-b300.48xlarge cpu=192 mem=4096GB disk=100GB gpu=B300:268.6GB:8 yes $34.4876
3 aws us-west-2 p6-b300.48xlarge cpu=192 mem=4096GB disk=100GB gpu=B300:268.6GB:8 no $142.416
...
Shown 3 of 4 offers, $142.416 max
What's changed
- Support AWS p6 instances by @r4victor in #3961
- Support targeting specific instances by @peterschmidt85 in #3958
- Support Zed in the UI: add to IDE dropdown and fix Open-in-IDE link by @peterschmidt85 in #3963
- Separate Docker and VM base image versions by @r4victor in #3966
- [Docs]: Tenant isolation guide by @jvstme in #3913
- [Nebius]: Update OS image and add new platforms by @jvstme in #3970
- Update nvidia drivers installation in VM images by @r4victor in #3967
- [Docs]: Revise the SSH proxy section by @peterschmidt85 in #3965
- Add Docker Compose for a Postgres-backed server with SSH proxy by @peterschmidt85 in #3964
- Support gateways with multiple replicas by @jvstme in #3960
- Fix runtime error with grpcio by @Bihan in #3971
- Drop special handling of the Sky gateway by @jvstme in #3974
- Add script to manage dstack AWS AMIs by @r4victor in #3976
- Update docker base image to ubuntu 24.04 by @r4victor in #3972
- Bump base image versions to 0.14 by @r4victor in #3977
- Improve replicated gateway display in older CLIs by @jvstme in #3975
- Replace lsblk fs detection with blkid by @r4victor in #3979
Full changelog: 0.20.24...0.20.25
0.20.25rc1
Instances
Run configurations now support the instances property for targeting specific fleet instances.
When instances is set, the run is placed only on matching existing fleet instances. If the specified instances cannot be used, the run fails instead of provisioning new instances.
Target by instance name:
instances:
  - name: my-fleet-0
The short syntax is an instance name string:
instances:
  - my-fleet-0
Target by hostname or IP address:
instances:
  - hostname: 10.0.1.42
Target by fleet and instance number:
instances:
  - fleet: my-fleet
    instance: 0
For fleets from another project, use the <project>/<fleet> reference:
instances:
  - fleet: shared-project/my-fleet
    instance: 0
Multiple instances can be specified:
instances:
  - my-fleet-0
  - my-fleet-1
What's changed
- Support targeting specific instances by @peterschmidt85 and @fededagos in #3958
Full changelog: 0.20.24...0.20.25rc1
Note
The public documentation will be updated when the release becomes GA.
0.20.24
Dev environments
Zed
dstack now supports Zed as a dev environment IDE:
type: dev-environment
ide: zed
resources:
  gpu: L4
Once the dev environment is up, the CLI prints a zed:// link that opens the remote project in Zed over SSH. Since Zed doesn't require any plugins, no server pre-installation is needed: the Zed server is installed automatically on first connect.
✗ dstack apply
...
Submit a new run? [y/n]: y
NAME BACKEND GPU PRICE STATUS SUBMITTED
fast-fly-1 aws (us-east-2) gpu=L4:24GB:1 $0.1838 running 16:36
(spot)
fast-fly-1 provisioning completed (running)
pip install ipykernel...
To open in Zed, use link below:
zed://ssh/fast-fly-1/dstack/run
To connect via SSH, use: `ssh fast-fly-1`
To exit, press Ctrl+C.
Services
Replica groups
The spot_policy and reservation properties can now be specified at the replica group level. This allows distributing replicas across reserved and spot capacity, e.g., running baseline replicas on a reservation while autoscaling overflow replicas on spot instances:
type: service
image: my-image
port: 80
replicas:
- name: baseline
reservation: my-reservation
count: 1
- name: overflow
spot_policy: auto
count: 0..3
scaling:
metric: rps
target: 1
Shepherd Model Gateway
Services using Shepherd Model Gateway now support gRPC communication with both vLLM and SGLang workers. Previously, only the SGLang runtime with the HTTP connection mode was supported.
Below is an example service configuration running vLLM gRPC workers:
type: service
name: prefill-decode
env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8
replicas:
  - count: 1
    image: python:3.12-slim
    commands:
      - pip install smg
      - |
        smg launch \
          --pd-disaggregation \
          --model-path $MODEL_ID \
          --enable-igw \
          --host 0.0.0.0 \
          --port 8000 \
          --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4
  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
        python3 -m vllm.entrypoints.grpc_server \
          --model $MODEL_ID \
          --host 0.0.0.0 \
          --port 8000 \
          --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer"}'
    resources:
      gpu: H200
  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
        python3 -m vllm.entrypoints.grpc_server \
          --model $MODEL_ID \
          --host 0.0.0.0 \
          --port 8000 \
          --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer"}'
    resources:
      gpu: H200
port: 8000
dstack automatically detects each worker's runtime (vLLM or SGLang) and connection mode (HTTP or gRPC) by probing it. With gRPC, the SMG router tokenizes requests once and routes on tokens instead of raw text, reducing duplicate work and making cache_aware routing more effective.
JarvisLabs
The jarvislabs backend now supports offers with RTXPRO6000 GPUs.
Azure
subnet_ids
Similarly to vpc_ids, the azure backend now allows selecting specific subnets to be attached to dstack VMs via the new subnet_ids property, mapping regions to subnets in the <resource-group>/<vnet>/<subnet> format:
projects:
  - name: main
    backends:
      - type: azure
        subscription_id: ...
        tenant_id: ...
        creds:
          type: default
        regions: [westeurope]
        subnet_ids:
          westeurope: my-resource-group/my-vnet/my-subnet
This is useful when the VNet contains subnets that dstack shouldn't pick automatically, e.g. subnets delegated to other Azure services.
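To make the <resource-group>/<vnet>/<subnet> format concrete, here is a minimal sketch that splits such a reference into its three parts. SubnetRef and parse_subnet_ref are hypothetical helpers written for illustration; dstack's own validation may differ.

```python
from typing import NamedTuple


class SubnetRef(NamedTuple):
    resource_group: str
    vnet: str
    subnet: str


def parse_subnet_ref(value: str) -> SubnetRef:
    """Split "<resource-group>/<vnet>/<subnet>" into named parts."""
    parts = value.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <resource-group>/<vnet>/<subnet>, got {value!r}")
    return SubnetRef(*parts)


ref = parse_subnet_ref("my-resource-group/my-vnet/my-subnet")
print(ref.resource_group, ref.vnet, ref.subnet)
```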
What's changed
- Fix zero scaled services assigned to wrong fleets by @r4victor in #3939
- Set runner/shim default compiled versions to latest by @r4victor in #3941
- Implement SSH connection pool for runner instances by @r4victor in #3936
- [chore]: Move format_backend() to common utils by @jvstme in #3942
- Drop non-linux runner builds and local backend by @r4victor in #3944
- Support Zed as dev-environment IDE by @r4victor in #3947
- Fix dropping ssh connections to non-provisioned terminating instances by @r4victor in #3948
- Replica group spot_policy and reservation by @jvstme in #3932
- Fix jpd.hostname AssertionError on container stop by @r4victor in #3951
- Add NVIDIA Dynamo blog post by @peterschmidt85 in #3949
- Support gRPC communication with SMG (Shepherd Model Gateway) workers by @Bihan in #3946
- Allow configuring subnet_ids in Azure settings by @jvstme in #3955
- [JarvisLabs] Support RTX PRO 6000; update gpuhunt dependency by @peterschmidt85 in #3943
Full changelog: 0.20.23...0.20.24
0.20.23
This release includes several bug fixes and performance optimizations.
What's Changed
- [Internal]: Fix OCI image publishing script by @jvstme in #3915
- Update Docker and cloud images to 0.13 by @jvstme in #3916
- [shim] Pass proxy variables to the container by @un-def in #3917
- Fix image pull progress when reported in seconds by @jvstme in #3921
- Skip getting backend offers when instance offers suffice by @r4victor in #3923
- Reduce run provisioning pipeline processing latency by @r4victor in #3922
- Do not generate RSA key for runner sshd by @r4victor in #3926
- Handle repo patch with non-UTF8 sequences by @un-def in #3918
- Fix Verda spot offers marked unavailable due to on-demand-only availability check by @IA386 in #3928
New Contributors
Full Changelog: 0.20.22...0.20.23
0.20.22
Accelerators
Tenstorrent
The update adds support for Tenstorrent Blackhole accelerators, including PCIe cards and systems such as LoudBox, QuietBox, and Galaxy. Previously dstack supported only Tenstorrent Wormhole accelerators. Also, we've reworked the Tenstorrent example.
Backends
Vast.ai
The vastai backend gets new backend-specific options in run and fleet configurations for advanced offer filtering:
type: dev-environment
backend_options:
  - type: vastai
    offer_order: price
    min_reliability: 0.97
    min_score: 250
See the YAML reference for more details on the new backend_options.
Examples
Miles
A new Miles example shows how to use dstack and Miles for reinforcement learning (RL) post-training of a 32B language model with GRPO across a multi-node cluster.
Breaking changes
- Dropped support for AWS P3 instances (V100).
What's changed
- [Docs] Add AMD Mi300x PD-Disaggregation Example by @Bihan in #3890
- Display imported gateways in project settings UI by @jvstme in #3893
- [chore]: Refactor get_job_plans() by @jvstme in #3894
- Update TT-SMI Docker image build by @peterschmidt85 in #3900
- Discover and use the latest AWS Ubuntu 22.04 DLAMI by @jvstme in #3899
- Drop AWS P3 support and use DLAMI for all AWS GPU instances by @r4victor in #3903
- Fix placement groups missing project attribute by @r4victor in #3905
- Improve AMD accelerator example by @peterschmidt85 in #3901
- [Docs] Add Miles Example by @Bihan in #3907
- Add Vast.ai-specific profile options by @jvstme in #3909
- Do not pass minCudaVersion for RunPod clusters by @r4victor in #3911
- Add Tenstorrent Blackhole support by @peterschmidt85 in #3895
- [Internal]: Drop unused pre-pull parameter in images CI by @jvstme in #3912
- Fix Vast.ai offer order in dstack offer --fleet by @jvstme in #3897
Full changelog: 0.20.21...0.20.22
0.20.21
Backends
JarvisLabs
This release adds JarvisLabs as a new backend, allowing dstack to provision GPU and CPU VMs on JarvisLabs, including spot GPU instances.
To configure the backend, log into your JarvisLabs account, create an API key, and add it to ~/.dstack/server/config.yml:
projects:
  - name: main
    backends:
      - type: jarvislabs
        creds:
          type: api_key
          api_key: ...
Kubernetes
Multiple clusters
A single kubernetes backend can now manage multiple Kubernetes clusters. Each cluster is selected via a kubeconfig context and becomes its own dstack region:
projects:
  - name: main
    backends:
      - type: kubernetes
        kubeconfig:
          filename: ~/.kube/config
        contexts:
          - name: gpu-cluster-a
          - name: gpu-cluster-b
Each context can configure its own proxy_jump.hostname and proxy_jump.port, and the namespace is taken from each kubeconfig context. When creating a dstack volume or gateway, the region field selects which cluster the resource is provisioned in.
The previous single-cluster configuration (without contexts) continues to work but is no longer recommended and may be removed in the future. Refer to the backends docs for the up-to-date configuration and migration guidance.
Object labeling
All dstack-managed Kubernetes resources (jump pods, job pods, gateways, volumes, registry-auth secrets, services) now share a consistent set of labels, making it easier to filter and audit dstack resources with kubectl:
- app.kubernetes.io/name=dstack-{ssh-proxy,job,gateway,volume}
- app.kubernetes.io/instance
- app.kubernetes.io/managed-by=dstack
- k8s.dstack.ai/project
- k8s.dstack.ai/name (if applicable)
- k8s.dstack.ai/user (if applicable)
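Labels like these are typically used through an equality-based label selector, such as the value passed to kubectl's -l flag. The sketch below builds one from a dictionary. label_selector is a hypothetical helper, and the project value main is only an example.

```python
from typing import Dict


def label_selector(labels: Dict[str, str]) -> str:
    """Join key=value pairs with commas, the equality-based selector syntax."""
    return ",".join(f"{key}={value}" for key, value in labels.items())


selector = label_selector(
    {
        "app.kubernetes.io/managed-by": "dstack",
        "k8s.dstack.ai/project": "main",  # example project name
    }
)
print(selector)
```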
Bug fixes
- Jobs no longer retry indefinitely when the target fleet is at capacity.
- Negative retry.duration values (e.g. -1) are now rejected during configuration parsing instead of silently producing an invalid retry spec.
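The second fix amounts to validating the duration at parse time. A minimal sketch of that idea follows. The unit suffixes and the parse_retry_duration name are assumptions made for illustration and do not reflect dstack's actual duration grammar or code.

```python
import re
from typing import Union

# Assumed unit suffixes for this sketch; dstack's real duration grammar may differ.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_retry_duration(value: Union[int, str]) -> int:
    """Convert a duration to seconds, rejecting negative values."""
    if isinstance(value, int):
        seconds = value
    else:
        match = re.fullmatch(r"(-?\d+)([smhd]?)", value.strip())
        if match is None:
            raise ValueError(f"invalid duration: {value!r}")
        seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2) or "s"]
    if seconds < 0:
        # Fail early instead of carrying a negative duration into the retry spec.
        raise ValueError(f"duration must not be negative: {value!r}")
    return seconds


print(parse_retry_duration("1h"))
```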
What's changed
- Fix Kubernetes backend utils.py typing by @un-def in #3889
- [CI] Bump pyright-action by @un-def in #3888
- Reject negative retry durations by @pragnyanramtha in #3885
- Fix infinite job retry when fleet is at capacity by @jvstme in #3887
- Kubernetes: multiple clusters support by @un-def in #3884
- Add JarvisLabs backend by @peterschmidt85 in #3875
- Kubernetes: standardize object labeling by @un-def in #3891
- [Docs] Fix gen_schema_reference.py on Python 3.10 by @un-def in #3883
New contributors
- @pragnyanramtha made their first contribution in #3885
Full changelog: 0.20.20...0.20.21
0.20.20
Services
NVIDIA Dynamo
This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.
Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare router: { type: dynamo }, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.
type: service
name: dynamo-pd
env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8
replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo
  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set to enable /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200
  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200
port: 8000
model: zai-org/GLM-4.5-Air-FP8
# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s
dstack provisions the router replica, injects DSTACK_ROUTER_INTERNAL_IP into non-router replicas, and lets Dynamo workers connect directly to the router's etcd and NATS services.
Refer to the Dynamo example for full deployment instructions.
Replica groups
It's now possible to configure the image, docker, python, nvcc, and privileged properties at the replica group level. This enables complex multi-component services like NVIDIA Dynamo, where different replicas require different runtime environments.
Exports
Gateways
Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.
$ dstack export --project main create my-export --gateway shared-gateway --importer team
NAME FLEETS GATEWAYS IMPORTERS
my-export - shared-gateway team
Now, if you list gateways in the team project, you'll see the exported gateway:
$ dstack gateway --project team
NAME BACKEND HOSTNAME DOMAIN DEFAULT STATUS
main/shared-gateway aws (eu-west-1) 108.131.126.35 gtw.mycompany.example running
Additionally, gateway domains now support optional project name interpolation using ${{ run.project_name }}, allowing different projects to use different domains on the same shared gateway.
type: gateway
name: shared-gateway
backend: aws
region: eu-west-1
domain: ${{ run.project_name }}.mycompany.example
Global exports
Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.
$ dstack export create global-export --gateway shared-gateway --global
NAME FLEETS GATEWAYS IMPORTERS
global-export - shared-gateway *
AWS
EFA clusters
Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required public_ips: False. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.
Kubernetes
Permissions
dstack now requires the watch permission for pods within the namespace. See Required permissions for up-to-date ClusterRole and Role manifests.
Backend configuration
The namespace property of the kubernetes backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.
Migration guide
- If namespace is unset or set to default in both the backend config and the kubeconfig, no action is required; default continues to be used.
- If namespace is set to the same value (e.g. ns-a) in both the backend config and the kubeconfig, no action is required.
- If namespace is set to ns-a in the backend config but the kubeconfig has a different value (or none), set the namespace to ns-a in your kubeconfig context to prepare for future versions.
- It is only safe to remove namespace from the backend config if its value is default.
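The rules above can be condensed into a small decision helper. This is a sketch of one reading of the guide, not dstack code: both function names are hypothetical, and treating an unset namespace as default on either side is an assumption.

```python
from typing import Optional


def migration_action(backend_ns: Optional[str], kubeconfig_ns: Optional[str]) -> str:
    """Suggest what to do so future versions keep using the same namespace."""
    backend = backend_ns or "default"  # assumption: unset means "default"
    kube = kubeconfig_ns or "default"
    if backend == kube:
        return "no action required"
    return f"set the kubeconfig context namespace to {backend}"


def safe_to_remove_from_backend_config(backend_ns: Optional[str]) -> bool:
    """Removing `namespace` from the backend config is only safe if it is default."""
    return (backend_ns or "default") == "default"


print(migration_action(None, None))
print(migration_action("ns-a", "ns-a"))
print(migration_action("ns-a", None))
```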
What's changed
- [Services] Allow to specify image, docker, python, nvcc, privileged at replica group level by @Bihan in #3832
- [Internal]: Delete some unused classes by @jvstme in #3842
- [Internal] Fix pyright failing in CI by @jvstme in #3846
- [Internal] Update RunpodApiClient by @un-def in #3847
- [Internal] Fix openai SDK failing in tests by @jvstme in #3849
- [RunPod] Handle deleting non-existent volume by @r4victor in #3853
- [Runpod] Fix broken registry_auth support by @un-def in #3844
- [UX] Raise ImportError on Python 3.14 or later by @r4victor in #3855
- [Exports] Gateway support by @jvstme in #3845
- [Internal] Rename docs/ to mkdocs/, move examples under /docs/, inline source by @peterschmidt85 in #3859
- [Kubernetes] Deprecate namespace in backend config by @un-def in #3858
- [Gateways] Allow setting imported gateway as project default by @jvstme in #3860
- [Internal] Forbid exporting the built-in dstack Sky gateway by @jvstme in #3864
- [AWS] Support multi-EFA instances with public IPs by @r4victor in #3865
- [Internal] Add server-side validation for fleet configuration subtypes by @un-def in #3848
- [Verda] Optimize terminating Verda instances by @jvstme in #3811
- [Internal] Introduce GatewayModel.forbid_new_services by @jvstme in #3863
- [Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in #3869
- [Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in #3866
- Fix Pyright errors with requests==2.34.0 by @jvstme in #3873
- Add project name interpolation in gateway domains by @jvstme in #3870
- [Bugfix] Fix duplicate headers with in-server proxy by @jvstme in #3872
- [Docs]: Gateway Exports by @jvstme in #3862
- [Kubernetes] Fai...
0.20.19
Services
RPS window for autoscaling
Services now support a window property in the scaling spec that defines the time window used to calculate RPS. Allowed values are 30s, 1m, and 5m (default is 1m). Previously, the RPS was always calculated using a 1m window.
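To see how the window changes the measured value, the sketch below computes RPS as the number of requests inside the window divided by the window length. This is an assumed, simplified calculation for illustration; requests_per_second is a hypothetical function and dstack's actual metric collection may work differently.

```python
from typing import Sequence

# The allowed window values from the release note, in seconds.
WINDOW_SECONDS = {"30s": 30, "1m": 60, "5m": 300}


def requests_per_second(timestamps: Sequence[float], now: float, window: str = "1m") -> float:
    """Average request rate over the trailing window ending at `now`."""
    seconds = WINDOW_SECONDS[window]
    recent = [t for t in timestamps if now - seconds < t <= now]
    return len(recent) / seconds


now = 1000.0
# 60 requests, all within the last 30 seconds.
burst = [now - 0.5 * i for i in range(60)]
for window in ("30s", "1m", "5m"):
    print(window, requests_per_second(burst, now, window))
```

Under this model, the same burst reads as 2 RPS over a 30s window but only 0.2 RPS over 5m, so a longer window smooths out short spikes and reacts more slowly.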
type: service
image: nginx
port: 80
replicas: 0..1
scaling:
  metric: rps
  # 1 request per second, calculated over a 5-minute window
  target: 1
  window: 5m
Kubernetes
registry_auth
The kubernetes backend now supports the registry_auth property for pulling Docker images from private registries:
type: service
image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
registry_auth:
  username: $oauthtoken
  password: ${{ secrets.ngc_api_key }}
dstack automatically creates and sets up imagePullSecrets for the pods. This requires new permissions for the Kubernetes role:
rules:
resources: ["secrets"]
verbs: ["create", "delete"]
Read-only volumes
Kubernetes volume configurations now support a new read_only property. When set to true, it enforces readOnly: true in the pod's volumeMounts.
type: volume
backend: kubernetes
name: my-volume
size: 100GB
read_only: true
Server
Faster processing
The server has been optimized to reduce processing latencies. As a result, many operations now take less time: run provisioning is up to 14s faster and run termination is up to 7s faster.
Examples
Documentation and examples have been refreshed, including new Qwen3.6-27B and DeepSeek V4 examples. A new prefill-decode blog post shows how to run SGLang PD disaggregation via Shepherd Model Gateway.
Breaking changes
Python 3.9 support dropped
Running dstack on Python 3.9 is no longer supported, as Python 3.9 reached end-of-life on 2025-10-31. Please upgrade to Python 3.10 or later.
What's Changed
- Refresh quickstart and service docs with Qwen3.6-27B by @peterschmidt85 in #3819
- Disallow running dstack on Python 3.9 by @jvstme in #3817
- Create placeholder instance models by @r4victor in #3821
- Add DeepSeek V4 model docs by @peterschmidt85 in #3823
- Reduce pipelines processing latencies by @r4victor in #3828
- [Docs]: Update scale_up/down_delay descriptions by @jvstme in #3831
- Clean up exports on project and fleet deletion by @jvstme in #3827
- [shim,runner] Improve logging options by @un-def in #3822
- Allow configuring RPS window for service scaling by @jvstme in #3830
- Replace sglang_router with smg in PD examples by @Bihan in #3836
- Interpolate JobSpec secrets for Compute.run_job() by @un-def in #3834
- Kubernetes: configure imagePullSecrets by @un-def in #3835
- Kubernetes: add read_only volume property by @un-def in #3838
Full Changelog: 0.20.18...0.20.19
0.20.18
CLI
For VM-based backends as well as SSH fleets, the CLI now shows Docker image pull progress in the format <extracted>/<downloaded>/<total>.
Offers
This update reduces the time required to fetch backend offers and initialize backends, making both dstack offer and dstack apply faster:
- runpod — 0.66s => 0.03s (22x)
- amddevcloud — 2.26s => 0.85s (2.7x)
- cudo — 2.48s => 1.02s (2.4x)
- verda — 3.27s => 1.74s (1.9x)
- lambda — 3.24s => 1.89s (1.7x)
- vastai — 3.27s => 1.77s (1.8x)
- gcp — 3.74s => 2.54s (1.5x)
- azure — 5.83s => 3.11s (1.9x)
- aws — 6.58s => 3.56s (1.8x)
Secrets
The Manager project role can now manage secrets if the allow_managers_manage_secrets property is enabled in the server’s default_permissions config:
default_permissions:
  allow_managers_manage_secrets: true
Previously, only the Admin role was allowed to manage secrets.
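The resulting rule can be sketched as a small permission check. can_manage_secrets and the lowercase role strings are hypothetical, and the flag defaulting to off is an assumption based on the previous Admin-only behavior.

```python
def can_manage_secrets(role: str, allow_managers_manage_secrets: bool = False) -> bool:
    """Admins can always manage secrets; managers only when the flag is enabled."""
    if role == "admin":
        return True
    if role == "manager":
        return allow_managers_manage_secrets
    return False


print(can_manage_secrets("manager"))
print(can_manage_secrets("manager", allow_managers_manage_secrets=True))
```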
GPUs
This update adds support for GeForce RTX 2, 3, 4, and 5 series GPUs, which were previously not detected properly across both backend and SSH fleets.
GCP
The gcp backend now requires the compute.projects.get permission. Make sure this permission is granted to any custom IAM roles used by dstack.
What's changed
- Optimize GCP offers by @r4victor in #3793
- Optimize InstanceOffer construction by @r4victor in #3794
- Speed up GCP validate_credentials by @r4victor in #3795
- Support secrets management by Manager role by @r4victor in #3801
- Fix update_default_project() crash on server without TTY by @un-def in #3797
- Kubernetes: fix is_hard_taint check by @un-def in #3803
- Fix deleting idle instance from fleet with runs by @jvstme in #3807
- [Docs] Update examples by @peterschmidt85 in #3798
- Display image pull progress in CLI by @jvstme in #3805
- [Docs] Add an inline kubeconfig example to the kubernetes backend documentation by @peterschmidt85 in #3813
- Avoid Verda instance termination warnings by @jvstme in #3810
- [Internal] Improve warning message in ServerConfigManager.apply_config() by @un-def in #3804
- Add missing join to volumes query in JobSubmittedWorker by @un-def in #3816
- Add CLI deprecation warnings about gateway routers by @jvstme in #3814
- Bump gpuhunt, add support for all GeForce RTX 2..5 series by @un-def in #3818
- Add missing compute.projects.get GCP permission by @un-def in #3820
Full changelog: 0.20.17...0.20.18