From 7c81f0eb9c2215d92dbfcf52639ba6e7fb1fd73c Mon Sep 17 00:00:00 2001 From: LSDan Date: Thu, 12 Mar 2026 15:25:20 +0000 Subject: [PATCH 1/2] docs: add self-host tutorial series from dstack-info Bring over 31 tutorial markdown files covering the full self-hosting workflow: TDX setup, host configuration, dstack installation, gateway deployment, blockchain integration, first app deployment, KMS CVM deployment, and troubleshooting guides. Co-Authored-By: Claude Opus 4.6 --- docs/tutorials/attestation-verification.md | 848 ++++++++++++++++++ docs/tutorials/blockchain-setup.md | 292 ++++++ docs/tutorials/clone-build-dstack-vmm.md | 157 ++++ docs/tutorials/contract-deployment.md | 256 ++++++ docs/tutorials/dns-configuration.md | 320 +++++++ docs/tutorials/docker-setup.md | 156 ++++ docs/tutorials/gateway-build-configuration.md | 648 +++++++++++++ docs/tutorials/gateway-service-setup.md | 517 +++++++++++ docs/tutorials/gramine-key-provider.md | 342 +++++++ docs/tutorials/guest-image-setup.md | 337 +++++++ docs/tutorials/haproxy-setup.md | 452 ++++++++++ docs/tutorials/hello-world-app.md | 477 ++++++++++ docs/tutorials/kms-build-configuration.md | 705 +++++++++++++++ docs/tutorials/kms-cvm-deployment.md | 533 +++++++++++ docs/tutorials/local-docker-registry.md | 205 +++++ docs/tutorials/management-interface-setup.md | 156 ++++ docs/tutorials/rust-toolchain-installation.md | 124 +++ docs/tutorials/ssl-certificate-setup.md | 283 ++++++ .../tutorials/system-baseline-dependencies.md | 118 +++ docs/tutorials/tdx-bios-configuration.md | 145 +++ docs/tutorials/tdx-hardware-verification.md | 128 +++ docs/tutorials/tdx-sgx-verification.md | 263 ++++++ docs/tutorials/tdx-software-installation.md | 272 ++++++ .../troubleshooting-dstack-installation.md | 373 ++++++++ .../troubleshooting-first-application.md | 144 +++ .../troubleshooting-gateway-deployment.md | 309 +++++++ docs/tutorials/troubleshooting-host-setup.md | 279 ++++++ .../troubleshooting-kms-deployment.md | 388 ++++++++ 
.../troubleshooting-prerequisites.md | 475 ++++++++++ docs/tutorials/vmm-configuration.md | 320 +++++++ docs/tutorials/vmm-service-setup.md | 188 ++++ 31 files changed, 10210 insertions(+) create mode 100644 docs/tutorials/attestation-verification.md create mode 100644 docs/tutorials/blockchain-setup.md create mode 100644 docs/tutorials/clone-build-dstack-vmm.md create mode 100644 docs/tutorials/contract-deployment.md create mode 100644 docs/tutorials/dns-configuration.md create mode 100644 docs/tutorials/docker-setup.md create mode 100644 docs/tutorials/gateway-build-configuration.md create mode 100644 docs/tutorials/gateway-service-setup.md create mode 100644 docs/tutorials/gramine-key-provider.md create mode 100644 docs/tutorials/guest-image-setup.md create mode 100644 docs/tutorials/haproxy-setup.md create mode 100644 docs/tutorials/hello-world-app.md create mode 100644 docs/tutorials/kms-build-configuration.md create mode 100644 docs/tutorials/kms-cvm-deployment.md create mode 100644 docs/tutorials/local-docker-registry.md create mode 100644 docs/tutorials/management-interface-setup.md create mode 100644 docs/tutorials/rust-toolchain-installation.md create mode 100644 docs/tutorials/ssl-certificate-setup.md create mode 100644 docs/tutorials/system-baseline-dependencies.md create mode 100644 docs/tutorials/tdx-bios-configuration.md create mode 100644 docs/tutorials/tdx-hardware-verification.md create mode 100644 docs/tutorials/tdx-sgx-verification.md create mode 100644 docs/tutorials/tdx-software-installation.md create mode 100644 docs/tutorials/troubleshooting-dstack-installation.md create mode 100644 docs/tutorials/troubleshooting-first-application.md create mode 100644 docs/tutorials/troubleshooting-gateway-deployment.md create mode 100644 docs/tutorials/troubleshooting-host-setup.md create mode 100644 docs/tutorials/troubleshooting-kms-deployment.md create mode 100644 docs/tutorials/troubleshooting-prerequisites.md create mode 100644 
docs/tutorials/vmm-configuration.md create mode 100644 docs/tutorials/vmm-service-setup.md diff --git a/docs/tutorials/attestation-verification.md b/docs/tutorials/attestation-verification.md new file mode 100644 index 00000000..37df949f --- /dev/null +++ b/docs/tutorials/attestation-verification.md @@ -0,0 +1,848 @@ +--- +title: "Attestation Verification" +description: "Verify TDX attestation to prove your application runs in a genuine secure environment" +section: "First Application" +stepNumber: 2 +totalSteps: 2 +lastUpdated: 2026-03-09 +prerequisites: + - hello-world-app +tags: + - dstack + - tdx + - attestation + - ra-tls + - verification + - security +difficulty: "advanced" +estimatedTime: "45 minutes" +--- + +# Attestation Verification + +This tutorial guides you through verifying TDX attestation for your deployed applications. Attestation is the cryptographic proof that your application is genuinely running inside a TDX-protected Confidential Virtual Machine with the expected software stack. + +## What You'll Learn + +- **Retrieving attestation data** - Get measurements and RA-TLS certificates from running CVMs +- **Measurement verification** - Understand and verify MRTD and RTMR values +- **RA-TLS certificates** - Examine X.509 certificates with embedded TDX quotes +- **End-to-end verification** - Complete attestation workflow + +## Why Attestation Matters + +Attestation provides cryptographic proof of three critical properties: + +| Property | What It Proves | +|----------|----------------| +| **Authenticity** | The CVM is running on genuine Intel TDX hardware | +| **Integrity** | The firmware, kernel, and OS haven't been modified | +| **Isolation** | Your application's memory is encrypted and isolated | + +Without attestation, you're trusting the infrastructure provider. With attestation, you have mathematical proof that the security guarantees are being enforced by hardware. 
+ +## Understanding TDX Measurements + +TDX uses several measurement registers to track the boot process: + +``` +┌─────────────────────────────────────────────────────────────┐ +│ TDX Measurement Registers │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ MRTD (Measurement Register TD) │ +│ └── Measures: Virtual firmware (OVMF) │ +│ Computed by: TDX module (hardware) │ +│ Fixed for: Same OVMF binary │ +│ │ +│ RTMR0 (Runtime Measurement Register 0) │ +│ └── Measures: CPU/memory configuration │ +│ Computed by: OVMF during boot │ +│ Varies with: VM specifications (vCPUs, RAM) │ +│ │ +│ RTMR1 (Runtime Measurement Register 1) │ +│ └── Measures: Linux kernel │ +│ Computed by: OVMF when loading kernel │ +│ Fixed for: Same kernel binary (bzImage) │ +│ │ +│ RTMR2 (Runtime Measurement Register 2) │ +│ └── Measures: Kernel cmdline + initramfs │ +│ Computed by: OVMF │ +│ Fixed for: Same image metadata │ +│ │ +│ RTMR3 (Runtime Measurement Register 3) │ +│ └── Measures: Application configuration │ +│ Computed by: Tappd at runtime │ +│ Varies with: Docker compose, app ID, etc. │ +│ │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Understanding RA-TLS + +dstack uses **Remote Attestation TLS (RA-TLS)** to bind TDX attestation to standard TLS certificates. When a CVM boots, tappd generates an X.509 certificate (`app_cert`) that embeds the TDX quote directly in certificate extensions: + +``` +┌─────────────────────────────────────────────────────────────┐ +│ RA-TLS Certificate │ +├─────────────────────────────────────────────────────────────┤ +│ Standard X.509 fields (subject, issuer, validity, etc.) 
│ +│ │ +│ Custom Extensions: │ +│ ├── OID 1.3.6.1.4.1.62397.1.1 → TDX Quote (binary) │ +│ ├── OID 1.3.6.1.4.1.62397.1.2 → Event Log │ +│ ├── OID 1.3.6.1.4.1.62397.1.3 → App ID / Compose Hash │ +│ └── OID 1.3.6.1.4.1.62397.1.4 → Custom Claims │ +│ │ +│ The TDX quote is signed by Intel TDX hardware and binds │ +│ the certificate's public key to the CVM measurements. │ +└─────────────────────────────────────────────────────────────┘ +``` + +In production, the **application inside the CVM** serves this `app_cert` via TLS. External verifiers connect to the app, receive the RA-TLS certificate, extract the TDX quote from the X.509 extensions, and verify it independently — no host access needed. + +For this tutorial, since our hello-world app (nginx:alpine) doesn't serve RA-TLS directly, we'll use the VMM's `/guest/Info` proxy API to retrieve the attestation data. The concepts are identical to what you'd implement in a production RA-TLS verifier. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [Hello World Application](/tutorial/hello-world-app) +- A running CVM instance +- `jq` and `openssl` installed on the host + +Verify you have a running CVM: + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +## Step 1: Retrieve Attestation Data + +The VMM provides a `/guest/Info` endpoint that proxies into the CVM and retrieves attestation data including measurements and the RA-TLS certificate. 
+ +### Via VMM Guest Proxy + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +# Get the VM UUID for hello-world +VM_UUID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json 2>/dev/null \ + | jq -r '.[] | select(.name=="hello-world") | .id') +echo "VM UUID: $VM_UUID" + +# Retrieve attestation data via guest proxy +curl -s -u "admin:$DSTACK_VMM_AUTH_PASSWORD" \ + -X POST http://127.0.0.1:9080/guest/Info \ + -H "Content-Type: application/json" \ + -d "{\"id\": \"$VM_UUID\"}" | jq '{ + instance_id: .instance_id, + app_id: .app_id, + tcb_info: (.tcb_info | fromjson | {mrtd, rtmr0, rtmr1, rtmr2, rtmr3}) + }' +``` + +You should see output like: + +```json +{ + "instance_id": "hello-world-abc123", + "app_id": "hello-world", + "tcb_info": { + "mrtd": "a3f1b2c4d5e6...", + "rtmr0": "11223344aabb...", + "rtmr1": "55667788ccdd...", + "rtmr2": "99aabbccddee...", + "rtmr3": "ddeeff001122..." + } +} +``` + +> **RA-TLS in production:** In a real deployment, your application would serve the `app_cert` via TLS directly. External verifiers would connect to your app's HTTPS endpoint, receive the RA-TLS certificate, and extract the TDX quote from the X.509 extension at OID `1.3.6.1.4.1.62397.1.1`. No VMM access is needed — the app proves its own integrity. + +### From Inside the CVM + +Applications running inside the CVM can request raw TDX quotes directly via the tappd Unix socket: + +```bash +# This would be run inside a container in the CVM +curl -X POST --unix-socket /var/run/tappd.sock \ + -d '{"report_data": "0x48656c6c6f"}' \ + http://localhost/prpc/Tappd.RawQuote?json +``` + +The `report_data` field is optional user-provided data (up to 64 bytes, hex-encoded) that gets included in the quote. Applications use this for challenge-response attestation — a verifier sends a random nonce, the app includes it in the quote, proving the quote is fresh. 
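
The challenge-response flow described above can be sketched from the verifier's side. This is a minimal illustration rather than dstack's own verifier: the helper names (`make_nonce`, `build_request_body`, `report_data_is_fresh`) are hypothetical, and it assumes the verifier has already extracted the 64-byte `report_data` field from a quote whose signature it has checked. A full 64-byte nonce is used so the sketch does not depend on how shorter values are padded.

```python
import hmac
import json
import secrets

REPORT_DATA_LEN = 64  # report_data holds up to 64 bytes


def make_nonce() -> bytes:
    """Generate a fresh random challenge that fills report_data exactly."""
    return secrets.token_bytes(REPORT_DATA_LEN)


def build_request_body(nonce: bytes) -> str:
    """Build the JSON body for the quote request (hex with a 0x prefix)."""
    return json.dumps({"report_data": "0x" + nonce.hex()})


def report_data_is_fresh(report_data: bytes, nonce: bytes) -> bool:
    """Constant-time check that the quote carries the nonce we issued."""
    return hmac.compare_digest(report_data, nonce)
```

In this sketch the verifier calls `make_nonce()` and sends the result to the application, the application posts `build_request_body(nonce)` to tappd, and the verifier accepts the returned quote only if `report_data_is_fresh(...)` is true for the nonce it issued in that session.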
+ +## Step 2: Understand the Response + +The `/guest/Info` response contains several key fields. Let's examine the full structure: + +```bash +# Save the full response for examination +RESPONSE=$(curl -s -u "admin:$DSTACK_VMM_AUTH_PASSWORD" \ + -X POST http://127.0.0.1:9080/guest/Info \ + -H "Content-Type: application/json" \ + -d "{\"id\": \"$VM_UUID\"}") + +# Show top-level keys +echo "$RESPONSE" | jq 'keys' +``` + +The response includes: + +| Field | Description | +|-------|-------------| +| `instance_id` | Unique identifier for this CVM instance | +| `app_id` | Application identifier (from compose config) | +| `version` | dstack version running in the CVM | +| `app_cert` | RA-TLS certificate (PEM-encoded X.509 with TDX quote in extensions) | +| `tcb_info` | JSON string containing all measurements and the event log | + +### TCB Info Structure + +The `tcb_info` field is a JSON string that must be parsed separately. It contains the core attestation data: + +```bash +echo "$RESPONSE" | jq -r '.tcb_info' | jq . +``` + +| Field | Description | +|-------|-------------| +| `mrtd` | Virtual firmware (OVMF) measurement — set by TDX hardware | +| `rtmr0` | VM configuration measurement (vCPUs, RAM) — set by OVMF | +| `rtmr1` | Kernel measurement — set by OVMF when loading bzImage | +| `rtmr2` | Cmdline/initrd measurement — set by OVMF | +| `rtmr3` | Application runtime measurement — set by tappd | +| `compose_hash` | SHA-256 of the docker compose configuration | +| `os_image_hash` | SHA-256 of the guest OS image | +| `event_log` | Array of detailed events for RTMR3 replay verification | + +## Step 3: Calculate Expected Measurements + +To verify attestation, you need to independently calculate what the measurements **should** be from the guest OS image files. The `dstack-mr` tool does this, but it requires a runtime dependency that must be built first. + +### Get image metadata + +```bash +cat /var/lib/dstack/images/dstack-0.5.7/metadata.json | jq . 
+``` + +### Build dstack-acpi-tables (required dependency) + +`dstack-mr` internally runs a tool called `dstack-acpi-tables` to generate ACPI tables for RTMR0 calculation. This is a custom-patched QEMU binary compiled with `-DDUMP_ACPI_TABLES`. You need to build it once: + +```bash +# Install QEMU build dependencies +sudo apt-get update +sudo apt-get install -y git libslirp-dev python3-pip ninja-build \ + pkg-config libglib2.0-dev build-essential flex bison + +# Clone the custom QEMU fork +cd ~/dstack +git clone https://github.com/kvinwang/qemu-tdx.git --depth 1 \ + --branch dstack-qemu-9.2.1 --single-branch + +# Configure with ACPI table dumping enabled +cd qemu-tdx +export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct) +export CFLAGS="-DDUMP_ACPI_TABLES -Wno-builtin-macro-redefined -D__DATE__=\"\" -D__TIME__=\"\" -D__TIMESTAMP__=\"\"" +export LDFLAGS="-Wl,--build-id=none" +mkdir build && cd build +../configure --target-list=x86_64-softmmu --disable-werror + +# Build (this takes several minutes) +ninja + +# Install the binary +strip qemu-system-x86_64 +sudo install -m 755 qemu-system-x86_64 /usr/local/bin/dstack-acpi-tables + +# Install required QEMU data files +sudo install -d /usr/local/share/qemu +sudo install -m 644 ../pc-bios/efi-virtio.rom /usr/local/share/qemu/ +sudo install -m 644 ../pc-bios/kvmvapic.bin /usr/local/share/qemu/ +sudo install -m 644 ../pc-bios/linuxboot_dma.bin /usr/local/share/qemu/ + +# Clean up source (optional) +cd ~/dstack +rm -rf qemu-tdx +``` + +### Build the measurement calculator + +```bash +cd ~/dstack +cargo build --release -p dstack-mr-cli +``` + +This produces `./target/release/dstack-mr`. + +### Calculate expected MRs + +The tool uses a `measure` subcommand. 
The metadata path is a positional argument, and it reads the actual OVMF, kernel, and initrd files from the same directory: + +```bash +./target/release/dstack-mr measure \ + --cpu 2 \ + --memory 2G \ + /var/lib/dstack/images/dstack-0.5.7/metadata.json +``` + +Expected output: + +``` +Machine measurements: +MRTD: a1b2c3d4e5f6789... +RTMR0: 112233445566... +RTMR1: 55667788990011... +RTMR2: 99aabbccddee... +``` + +For JSON output (useful in scripts), add `--json`: + +```bash +./target/release/dstack-mr measure --json \ + --cpu 2 --memory 2G \ + /var/lib/dstack/images/dstack-0.5.7/metadata.json +``` + +> **Note:** RTMR3 is not included — it depends on application configuration and can only be verified via event log replay (see Step 6). + +## Step 4: Verify the RA-TLS Certificate + +The `app_cert` in the `/guest/Info` response is an RA-TLS certificate — a standard X.509 certificate with TDX attestation data embedded in custom extensions. + +### Extract and examine the certificate + +```bash +# Extract the app_cert +echo "$RESPONSE" | jq -r '.app_cert' > /tmp/app_cert.pem + +# View the certificate structure +openssl x509 -in /tmp/app_cert.pem -text -noout +``` + +In the output, look for the **X509v3 extensions** section. You'll see custom extensions under the dstack OID arc (`1.3.6.1.4.1.62397.1.*`): + +### Extension OIDs + +| OID | Content | Description | +|-----|---------|-------------| +| `1.3.6.1.4.1.62397.1.1` | TDX Quote | Binary TDX quote signed by Intel hardware. Contains all measurement registers and binds the cert's public key to the measurements. 
| +| `1.3.6.1.4.1.62397.1.2` | Event Log | Detailed event log for RTMR3 replay verification | +| `1.3.6.1.4.1.62397.1.3` | App ID / Compose Hash | Application identity and configuration hash | +| `1.3.6.1.4.1.62397.1.4` | Custom Claims | Optional application-defined claims | + +### Verify the certificate chain + +The app_cert is signed by the dstack App CA, which is in turn signed by the dstack KMS CA: + +``` +app_cert → Dstack App CA → Dstack KMS CA +``` + +The KMS CA is established during KMS deployment (Phase 4). The chain proves that this certificate was issued by a KMS that verified the CVM's TDX measurements before issuing the cert. + +```bash +# Show issuer information +openssl x509 -in /tmp/app_cert.pem -issuer -noout +``` + +### Why RA-TLS works + +The TDX quote embedded at OID `1.3.6.1.4.1.62397.1.1` was generated by Intel TDX hardware during CVM boot. It contains: + +1. **All measurement registers** (MRTD, RTMR0-3) — proving what software is running +2. **A hash of the certificate's public key** in the `report_data` field — binding the cert to the hardware attestation +3. **Intel's hardware signature** — proving the quote came from genuine TDX hardware + +This means: if you trust the certificate (verified via the chain), you trust the measurements, which means you know exactly what code is running inside the CVM. + +## Step 5: Compare Measurements + +Compare the CVM's actual measurements against your expected values: + +```bash +#!/bin/bash +# verify-measurements.sh + +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +# Get VM UUID +VM_UUID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json 2>/dev/null \ + | jq -r '.[] | select(.name=="hello-world") | .id') + +if [ -z "$VM_UUID" ] || [ "$VM_UUID" = "null" ]; then + echo "Error: hello-world CVM not found. Is it running?" 
+ exit 1 +fi + +# Fetch attestation data +RESPONSE=$(curl -s -u "admin:$DSTACK_VMM_AUTH_PASSWORD" \ + -X POST http://127.0.0.1:9080/guest/Info \ + -H "Content-Type: application/json" \ + -d "{\"id\": \"$VM_UUID\"}") + +# Parse tcb_info (it's a JSON string inside JSON) +TCB_INFO=$(echo "$RESPONSE" | jq -r '.tcb_info') + +# Extract measurements +MRTD=$(echo "$TCB_INFO" | jq -r '.mrtd') +RTMR0=$(echo "$TCB_INFO" | jq -r '.rtmr0') +RTMR1=$(echo "$TCB_INFO" | jq -r '.rtmr1') +RTMR2=$(echo "$TCB_INFO" | jq -r '.rtmr2') +RTMR3=$(echo "$TCB_INFO" | jq -r '.rtmr3') + +echo "Actual Measurements from CVM:" +echo " MRTD: $MRTD" +echo " RTMR0: $RTMR0" +echo " RTMR1: $RTMR1" +echo " RTMR2: $RTMR2" +echo " RTMR3: $RTMR3" +echo "" + +# Expected values — replace these with your dstack-mr output +EXPECTED_MRTD="" +EXPECTED_RTMR0="" +EXPECTED_RTMR1="" +EXPECTED_RTMR2="" + +echo "Measurement Verification Results:" +echo "==================================" + +if [ "$EXPECTED_MRTD" = "" ]; then + echo "Warning: Using placeholder expected values." + echo "Run dstack-mr first, then update the EXPECTED_* variables." 
+ exit 0 +fi + +if [ "$MRTD" = "$EXPECTED_MRTD" ]; then + echo " MRTD - MATCH - Firmware verified" +else + echo " MRTD - MISMATCH" + echo " Expected: $EXPECTED_MRTD" + echo " Got: $MRTD" +fi + +if [ "$RTMR0" = "$EXPECTED_RTMR0" ]; then + echo " RTMR0 - MATCH - VM config verified" +else + echo " RTMR0 - MISMATCH" + echo " Expected: $EXPECTED_RTMR0" + echo " Got: $RTMR0" +fi + +if [ "$RTMR1" = "$EXPECTED_RTMR1" ]; then + echo " RTMR1 - MATCH - Kernel verified" +else + echo " RTMR1 - MISMATCH" + echo " Expected: $EXPECTED_RTMR1" + echo " Got: $RTMR1" +fi + +if [ "$RTMR2" = "$EXPECTED_RTMR2" ]; then + echo " RTMR2 - MATCH - Initrd verified" +else + echo " RTMR2 - MISMATCH" + echo " Expected: $EXPECTED_RTMR2" + echo " Got: $RTMR2" +fi + +echo "" +echo "RTMR3 requires event log replay (see Step 6)" +``` + +## Step 6: Verify RTMR3 via Event Log + +RTMR3 contains runtime measurements that can't be pre-calculated — they depend on the application configuration, instance ID, and other runtime values. Instead, verify by examining the event log. 
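
Replaying the log means recomputing the register from its events. A TDX runtime measurement register starts as 48 zero bytes, and each extend operation replaces it with SHA-384(old value || event digest). The sketch below applies that rule to the `event_log` array from `tcb_info`. The function name `replay_rtmr` is hypothetical, and it assumes each log entry carries an integer `imr` field and a hex-encoded SHA-384 `digest` field.

```python
import hashlib

RTMR_LEN = 48  # SHA-384 output size in bytes


def replay_rtmr(event_log, imr=3):
    """Recompute one RTMR by replaying the extend operations in the log."""
    rtmr = bytes(RTMR_LEN)  # registers start as all zeros
    for event in event_log:
        if event["imr"] != imr:
            continue  # event belongs to a different register
        digest = bytes.fromhex(event["digest"])
        if len(digest) != RTMR_LEN:
            raise ValueError("expected a 48-byte SHA-384 digest")
        # Extend: new value = SHA-384(old value || digest)
        rtmr = hashlib.sha384(rtmr + digest).digest()
    return rtmr.hex()
```

If `replay_rtmr(tcb_info["event_log"])` equals `tcb_info["rtmr3"]`, the log accounts for the reported register value. This only shows that the log is consistent with the register; it does not by itself confirm that each digest corresponds to its payload.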
+ +### View the event log + +```bash +# Extract and display the event log from tcb_info +TCB_INFO=$(echo "$RESPONSE" | jq -r '.tcb_info') +echo "$TCB_INFO" | jq '.event_log' +``` + +Each event in the log has these fields: + +| Field | Description | +|-------|-------------| +| `imr` | Which measurement register was extended (3 = RTMR3) | +| `event_type` | Type of event | +| `digest` | SHA-384 hash that was extended into the register | +| `event` | Human-readable event name | +| `event_payload` | Hex-encoded payload data | + +### Decode and verify known events + +The event log records everything that was measured into RTMR3 during boot: + +```bash +# Display events in human-readable format +echo "$TCB_INFO" | jq -r '.event_log[] | "\(.event): \(.event_payload)"' | while read line; do + EVENT_NAME=$(echo "$line" | cut -d: -f1) + PAYLOAD_HEX=$(echo "$line" | cut -d: -f2- | tr -d ' ') + + # Decode hex payload to text (where applicable) + DECODED=$(echo "$PAYLOAD_HEX" | xxd -r -p 2>/dev/null || echo "(binary)") + + echo " $EVENT_NAME: $DECODED" +done +``` + +### Expected RTMR3 events + +These are the standard events you'll see in the log: + +| Event | Description | What to verify | +|-------|-------------|----------------| +| `system-preparing` | System initialization marker | Always present | +| `app-id` | Application identifier | Should match your app name | +| `compose-hash` | SHA-256 of docker compose config | Should match `tcb_info.compose_hash` | +| `instance-id` | Unique instance identifier | Should match `instance_id` from response | +| `boot-mr-done` | Boot measurements complete | Marker event | +| `mr-kms` | KMS identity measurement | KMS public key hash | +| `os-image-hash` | Guest OS image hash | Should match `tcb_info.os_image_hash` | +| `key-provider` | Key provider type | e.g., `kms` | +| `storage-fs` | Storage filesystem type | Storage configuration | +| `system-ready` | System ready marker | Always present at end | + +### Verify specific event values + 
+```bash +# Extract compose-hash from event log and compare to tcb_info +EVENT_COMPOSE_HASH=$(echo "$TCB_INFO" | jq -r '.event_log[] | select(.event=="compose-hash") | .event_payload' | xxd -r -p 2>/dev/null) +TCB_COMPOSE_HASH=$(echo "$TCB_INFO" | jq -r '.compose_hash') + +echo "Event log compose-hash: $EVENT_COMPOSE_HASH" +echo "TCB info compose_hash: $TCB_COMPOSE_HASH" + +# Extract os-image-hash from event log +EVENT_OS_HASH=$(echo "$TCB_INFO" | jq -r '.event_log[] | select(.event=="os-image-hash") | .event_payload' | xxd -r -p 2>/dev/null) +TCB_OS_HASH=$(echo "$TCB_INFO" | jq -r '.os_image_hash') + +echo "Event log os-image-hash: $EVENT_OS_HASH" +echo "TCB info os_image_hash: $TCB_OS_HASH" +``` + +## Step 7: Full Verification Script + +Here's a complete end-to-end verification script: + +```bash +#!/bin/bash +# full-attestation-verify.sh +# +# Complete attestation verification for a dstack CVM. +# Retrieves measurements, examines the RA-TLS certificate, +# and verifies the event log. + +INSTANCE_NAME="${1:-hello-world}" +IMAGE_VERSION="${2:-dstack-0.5.7}" + +echo "=========================================" +echo "dstack Attestation Verification" +echo "=========================================" +echo "Instance: $INSTANCE_NAME" +echo "Image: $IMAGE_VERSION" +echo "" + +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +# --- Step 1: Get VM UUID --- +echo "Step 1: Locating CVM..." +VM_UUID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json 2>/dev/null \ + | jq -r ".[] | select(.name==\"$INSTANCE_NAME\") | .id") + +if [ -z "$VM_UUID" ] || [ "$VM_UUID" = "null" ]; then + echo " FAIL - CVM '$INSTANCE_NAME' not found. Is it running?" + echo " Run: ./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm" + exit 1 +fi +echo " OK - VM UUID: $VM_UUID" + +# --- Step 2: Fetch attestation data --- +echo "" +echo "Step 2: Fetching attestation data via /guest/Info..." 
+RESPONSE=$(curl -s -u "admin:$DSTACK_VMM_AUTH_PASSWORD" \ + -X POST http://127.0.0.1:9080/guest/Info \ + -H "Content-Type: application/json" \ + -d "{\"id\": \"$VM_UUID\"}") + +if [ -z "$RESPONSE" ] || [ "$(echo "$RESPONSE" | jq -r '.tcb_info // empty')" = "" ]; then + echo " FAIL - No attestation data returned. CVM may still be booting." + exit 1 +fi + +INSTANCE_ID=$(echo "$RESPONSE" | jq -r '.instance_id') +APP_ID=$(echo "$RESPONSE" | jq -r '.app_id') +echo " OK - Instance: $INSTANCE_ID" +echo " OK - App ID: $APP_ID" + +# --- Step 3: Extract measurements --- +echo "" +echo "Step 3: Extracting measurements from tcb_info..." +TCB_INFO=$(echo "$RESPONSE" | jq -r '.tcb_info') + +MRTD=$(echo "$TCB_INFO" | jq -r '.mrtd') +RTMR0=$(echo "$TCB_INFO" | jq -r '.rtmr0') +RTMR1=$(echo "$TCB_INFO" | jq -r '.rtmr1') +RTMR2=$(echo "$TCB_INFO" | jq -r '.rtmr2') +RTMR3=$(echo "$TCB_INFO" | jq -r '.rtmr3') + +echo " MRTD: ${MRTD:0:32}..." +echo " RTMR0: ${RTMR0:0:32}..." +echo " RTMR1: ${RTMR1:0:32}..." +echo " RTMR2: ${RTMR2:0:32}..." +echo " RTMR3: ${RTMR3:0:32}..." + +# --- Step 4: Save and examine RA-TLS certificate --- +echo "" +echo "Step 4: Examining RA-TLS certificate..." 
+APP_CERT=$(echo "$RESPONSE" | jq -r '.app_cert') + +if [ -n "$APP_CERT" ] && [ "$APP_CERT" != "null" ]; then + echo "$APP_CERT" > /tmp/app_cert.pem + + # Show certificate subject and issuer + SUBJECT=$(openssl x509 -in /tmp/app_cert.pem -subject -noout 2>/dev/null) + ISSUER=$(openssl x509 -in /tmp/app_cert.pem -issuer -noout 2>/dev/null) + echo " $SUBJECT" + echo " $ISSUER" + + # Check for RA-TLS extensions + CERT_TEXT=$(openssl x509 -in /tmp/app_cert.pem -text -noout 2>/dev/null) + if echo "$CERT_TEXT" | grep -q "1.3.6.1.4.1.62397"; then + echo " OK - RA-TLS extensions found (OID 1.3.6.1.4.1.62397.1.*)" + echo " .1.1 = TDX Quote | .1.2 = Event Log" + echo " .1.3 = App ID | .1.4 = Custom Claims" + else + echo " WARN - RA-TLS extensions not found in certificate" + fi + + echo " OK - Certificate saved to /tmp/app_cert.pem" +else + echo " WARN - No app_cert in response" +fi + +# --- Step 5: Compare measurements (if dstack-mr available) --- +echo "" +echo "Step 5: Measurement comparison..." +DSTACK_MR="$HOME/dstack/target/release/dstack-mr" +METADATA="/var/lib/dstack/images/$IMAGE_VERSION/metadata.json" + +if [ -x "$DSTACK_MR" ] && [ -f "$METADATA" ]; then + echo " Calculating expected measurements with dstack-mr..." + EXPECTED=$($DSTACK_MR measure --cpu 2 --memory 2G "$METADATA" 2>/dev/null) + + EXPECTED_MRTD=$(echo "$EXPECTED" | grep "MRTD:" | awk '{print $2}') + EXPECTED_RTMR0=$(echo "$EXPECTED" | grep "RTMR0:" | awk '{print $2}') + EXPECTED_RTMR1=$(echo "$EXPECTED" | grep "RTMR1:" | awk '{print $2}') + EXPECTED_RTMR2=$(echo "$EXPECTED" | grep "RTMR2:" | awk '{print $2}') + + for REG in MRTD RTMR0 RTMR1 RTMR2; do + ACTUAL_VAR="${REG}" + EXPECTED_VAR="EXPECTED_${REG}" + ACTUAL="${!ACTUAL_VAR}" + EXPECTED_VAL="${!EXPECTED_VAR}" + + if [ -n "$EXPECTED_VAL" ] && [ "$ACTUAL" = "$EXPECTED_VAL" ]; then + echo " $REG - MATCH" + elif [ -n "$EXPECTED_VAL" ]; then + echo " $REG - MISMATCH" + echo " Expected: ${EXPECTED_VAL:0:32}..." + echo " Got: ${ACTUAL:0:32}..." 
+ fi + done +else + echo " SKIP - dstack-mr not built or metadata not found" + echo " To enable: cd ~/dstack && cargo build --release -p dstack-mr-cli" +fi + +# --- Step 6: Display event log --- +echo "" +echo "Step 6: Event log (RTMR3 events)..." +EVENT_COUNT=$(echo "$TCB_INFO" | jq '.event_log | length') +echo " $EVENT_COUNT events recorded:" + +echo "$TCB_INFO" | jq -r '.event_log[] | .event' | while read EVENT_NAME; do + echo " - $EVENT_NAME" +done + +# Verify compose-hash consistency +COMPOSE_HASH=$(echo "$TCB_INFO" | jq -r '.compose_hash // empty') +OS_IMAGE_HASH=$(echo "$TCB_INFO" | jq -r '.os_image_hash // empty') + +if [ -n "$COMPOSE_HASH" ]; then + echo "" + echo " Compose hash: ${COMPOSE_HASH:0:32}..." +fi +if [ -n "$OS_IMAGE_HASH" ]; then + echo " OS image hash: ${OS_IMAGE_HASH:0:32}..." +fi + +# --- Summary --- +echo "" +echo "=========================================" +echo "Verification Summary" +echo "=========================================" +echo " CVM instance: $INSTANCE_NAME ($INSTANCE_ID)" +echo " App ID: $APP_ID" +echo " Measurements: All 5 registers retrieved (MRTD, RTMR0-3)" +if [ -n "$APP_CERT" ] && [ "$APP_CERT" != "null" ]; then + echo " RA-TLS cert: Present (saved to /tmp/app_cert.pem)" +fi +echo " Event log: $EVENT_COUNT events" +echo "" +echo " Next steps:" +echo " - Compare MRTD/RTMR0-2 with dstack-mr output" +echo " - Verify RTMR3 by reviewing event log entries" +echo " - In production, verify app_cert chain and TDX quote signature" +echo "=========================================" +``` + +Make it executable and run: + +```bash +chmod +x full-attestation-verify.sh +./full-attestation-verify.sh hello-world dstack-0.5.7 +``` + +## Best Practices for Production + +### 1. Use RA-TLS for application-level attestation + +In production, don't rely on host API access. 
Instead, have your application serve the RA-TLS certificate via TLS: + +```python +# Pseudo-code for an RA-TLS verifier +def verify_cvm_app(hostname, port): + # Connect and get the server's certificate + cert = ssl_connect_and_get_cert(hostname, port) + + # Extract TDX quote from X.509 extension + quote = extract_extension(cert, oid="1.3.6.1.4.1.62397.1.1") + + # Verify quote signature (Intel hardware attestation) + if not verify_tdx_quote(quote): + raise SecurityError("TDX quote verification failed") + + # Extract measurements from quote + measurements = parse_quote_measurements(quote) + + # Compare against expected values + if measurements.mrtd != expected_mrtd: + raise SecurityError("Firmware measurement mismatch") + + return True # CVM is genuine and running expected software +``` + +### 2. Include report_data for freshness + +When applications call tappd internally to generate quotes, include fresh random data to prevent replay attacks: + +```bash +# Inside the CVM — application generates a fresh quote with a nonce +NONCE=$(openssl rand -hex 32) +curl -X POST --unix-socket /var/run/tappd.sock \ + -d "{\"report_data\": \"0x$NONCE\"}" \ + http://localhost/prpc/Tappd.RawQuote?json +``` + +The verifier sends the nonce, the app includes it in the quote, and the verifier checks it matches — proving the quote was generated just now, not replayed. + +### 3. Verify the complete measurement chain + +Don't just check one register — verify the complete chain: + +``` +MRTD → RTMR0 → RTMR1 → RTMR2 → RTMR3 + │ │ │ │ │ + v v v v v +OVMF VM Config Kernel Initrd App +``` + +Each register builds on the previous one. A compromised kernel (RTMR1) could fake application measurements (RTMR3), so always verify from the firmware up. + +### 4. 
Keep expected measurements updated + +When you update guest OS images, recalculate expected measurements: + +```bash +# After updating to new image version +dstack-mr measure --cpu 2 --memory 2G \ + /var/lib/dstack/images/dstack-0.5.7/metadata.json +``` + +### 5. Use reproducible builds + +For highest assurance, build images from source: + +```bash +git clone https://github.com/Dstack-TEE/meta-dstack.git +cd meta-dstack/repro-build +./repro-build.sh -n # Reproducible build +``` + +This ensures you know exactly what code is in the image, and anyone can independently verify the measurements match. + +## Troubleshooting + +For detailed solutions, see the [First Application Troubleshooting Guide](/tutorial/troubleshooting-first-application#attestation-verification-issues): + +- [Attestation data retrieval fails](/tutorial/troubleshooting-first-application#attestation-data-retrieval-fails) +- [Measurements don't match](/tutorial/troubleshooting-first-application#measurements-dont-match) +- [RA-TLS certificate issues](/tutorial/troubleshooting-first-application#ra-tls-certificate-issues) + +## Verification Checklist + +Before proceeding, verify you have: + +- [ ] Successfully retrieved attestation data via `/guest/Info` +- [ ] Extracted all measurement registers (MRTD, RTMR0-3) +- [ ] Examined the RA-TLS certificate and its extensions +- [ ] Understood the event log contents +- [ ] Know how to calculate expected measurements with `dstack-mr` +- [ ] Automated verification with the full script + +## Phase 5 Complete! + +Congratulations! You have completed Phase 5 (First Application Deployment): + +1. **Guest OS Image Setup** - Downloaded and configured guest images +2. **Hello World Application** - Deployed your first CVM application +3. 
**Attestation Verification** - Proved your app runs in a secure environment
+
+## What You've Accomplished
+
+Your dstack deployment now includes:
+
+- TDX-enabled host with hardware security
+- VMM service managing confidential VMs (CVMs)
+- KMS service providing key management
+- Gateway service routing traffic
+- A running Hello World application
+- Cryptographic proof of security via attestation
+
+## Next Steps
+
+With the foundation complete, you're ready to explore:
+
+- **Phase 6:** Deploy more complex applications from dstack-examples
+- **Advanced attestation:** ConfigID and RTMR3-based verification
+- **Custom domains:** Access apps via your own domain
+- **SSH access:** Connect directly to CVMs
+- **Port forwarding:** Expose additional services
+
+## Additional Resources
+
+- [Intel TDX Documentation](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html)
+- [DCAP Attestation Guide](https://download.01.org/intel-sgx/latest/dcap-latest/linux/docs/)
+- [dstack Attestation Source](https://github.com/Dstack-TEE/dstack/tree/main/attestation)
+- [Reproducible Builds for meta-dstack](https://github.com/Dstack-TEE/meta-dstack/tree/main/repro-build)
diff --git a/docs/tutorials/blockchain-setup.md b/docs/tutorials/blockchain-setup.md
new file mode 100644
index 00000000..f39ecb38
--- /dev/null
+++ b/docs/tutorials/blockchain-setup.md
@@ -0,0 +1,292 @@
+---
+title: "Blockchain Wallet Setup"
+description: "Set up an Ethereum wallet and fund it with testnet ETH for dstack deployment"
+section: "Prerequisites"
+stepNumber: 5
+totalSteps: 7
+lastUpdated: 2026-01-09
+prerequisites: []
+tags:
+  - blockchain
+  - ethereum
+  - wallet
+  - testnet
+  - sepolia
+difficulty: intermediate
+estimatedTime: 15-20 minutes
+---
+
+# Blockchain Wallet Setup
+
+dstack's Key Management Service (KMS) relies on smart contracts on the Ethereum blockchain to manage its settings and authorize applications.
For tutorial purposes, we'll use the **Sepolia testnet**, which allows you to deploy and test without spending real ETH.
+
+## What You'll Need
+
+- **Ethereum wallet** with a private key
+- **Testnet ETH** (~0.1 ETH minimum for deployment)
+
+## Why Sepolia?
+
+Sepolia is one of Ethereum's official testnets:
+
+- Free testnet ETH from faucets
+- Similar to mainnet but without real value
+- Perfect for development and testing
+- Widely supported by tools and services
+
+---
+
+## Option 1: Command-Line Wallet (Recommended)
+
+If you have [Foundry](https://book.getfoundry.sh/getting-started/installation) installed, you can create a wallet using the `cast` command.
+
+### Step 1.1: Check if Foundry is Installed
+
+```bash
+cast --version
+```
+
+If not installed, see: https://book.getfoundry.sh/getting-started/installation
+
+### Step 1.2: Generate New Wallet
+
+```bash
+cast wallet new
+```
+
+**Example output:**
+
+```
+Successfully created new keypair.
+Address: 0x91Ba69FCD13D2876FD06907a2880BDBC93C336aF
+Private key: 0xd76e8d3059484d5d9167c4e10cfeea2a4efa655875112e693e18fb4ab890b98a
+```
+
+⚠️ **Save these immediately:**
+
+- **Address:** Your public wallet address (safe to share)
+- **Private Key:** SECRET - never share or commit to git
+
+### Step 1.3: Store Wallet Credentials Securely
+
+Create secure files to store your wallet address and private key:
+
+```bash
+# Create secure directory
+mkdir -p ~/.dstack/secrets
+chmod 700 ~/.dstack/secrets
+
+# Store wallet address (replace with your address)
+echo "0xYOUR_ADDRESS_HERE" > ~/.dstack/secrets/sepolia-address
+chmod 600 ~/.dstack/secrets/sepolia-address
+
+# Store private key (replace with your key)
+echo "0xYOUR_PRIVATE_KEY_HERE" > ~/.dstack/secrets/sepolia-private-key
+chmod 600 ~/.dstack/secrets/sepolia-private-key
+```
+
+⚠️ **IMPORTANT:** `~/.dstack/secrets/` sits outside your repositories, so git does not track it. Git also does not expand `~` in ignore patterns. If you ever copy a key file into a git working tree, ignore it there first (run from the repository root):
+
+```bash
+echo "sepolia-private-key" >> .gitignore
+```
+
+### Step 1.4: Check Wallet Balance
+
+```bash
+#
Quick check using public Sepolia RPC +cast balance "$(cat ~/.dstack/secrets/sepolia-address)" --rpc-url https://ethereum-sepolia-rpc.publicnode.com +``` + +Expected output for new wallet: `0` (zero) + +--- + +## Option 2: MetaMask Wallet + +If you prefer a browser-based wallet, MetaMask is the most popular choice. + +### Step 2.1: Install MetaMask + +1. Visit: https://metamask.io/ +2. Install browser extension (Chrome, Firefox, Brave, Edge) +3. Create new wallet or import existing one +4. **Save your seed phrase securely** (12 or 24 words) + +### Step 2.2: Add Sepolia Network + +1. Open MetaMask +2. Click network dropdown (top center) +3. Click "Add Network" or "Add a network manually" +4. Enter Sepolia network details: + +``` +Network Name: Sepolia +RPC URL: https://ethereum-sepolia-rpc.publicnode.com +Chain ID: 11155111 +Currency Symbol: ETH +Block Explorer: https://sepolia.etherscan.io +``` + +5. Click "Save" + +### Step 2.3: Get Your Wallet Address + +1. Open MetaMask +2. Click on account name (top center) +3. Address shown below name (starts with `0x...`) +4. Click to copy + +### Step 2.4: Export Private Key (for dstack CLI) + +⚠️ **Only do this if you need the private key for programmatic access** + +1. Open MetaMask +2. Click three dots (top right) → Account Details +3. Click "Export Private Key" +4. Enter MetaMask password +5. Click to reveal and copy private key +6. Store securely as shown in Step 1.3 + +--- + +## Step 2: Get Testnet ETH + +You need testnet ETH to deploy the KMS smart contract. + +### PoW Faucet (Recommended) + +**Best option for new wallets** - no requirements: + +1. Visit: https://sepolia-faucet.pk910.de/ +2. Enter your wallet address +3. Click "Start Mining" +4. Wait 10-30 minutes while mining runs in your browser +5. 
Claim your testnet ETH (typically 0.05-0.1 ETH per session)
+
+✅ **Why this faucet?**
+
+- No mainnet ETH balance required
+- No account signup needed
+- No MetaMask required
+- Works for brand new wallets
+- Just needs patience for mining
+
+### MetaMask Faucet
+
+If you're using MetaMask:
+
+- URL: https://docs.metamask.io/developer-tools/faucet
+- ❌ **Requires:** MetaMask extension installed
+
+### More Faucet Options
+
+For a comprehensive list of Sepolia faucets with their specific requirements, see:
+**https://faucetlink.to/sepolia**
+
+This page lists all available faucets and their requirements (mainnet ETH balance, account signup, etc.).
+
+### Verify You Received ETH
+
+**Command Line:**
+
+```bash
+# Quick check using public Sepolia RPC
+cast balance "$(cat ~/.dstack/secrets/sepolia-address)" --rpc-url https://ethereum-sepolia-rpc.publicnode.com
+```
+
+Expected: Non-zero value (e.g., `50000000000000000` = 0.05 ETH, `100000000000000000` = 0.1 ETH in wei)
+
+**MetaMask:**
+
+- Switch to Sepolia network
+- Check balance shown in extension
+
+**Block Explorer:**
+
+```bash
+# Open in browser (macOS; use xdg-open on Linux)
+open "https://sepolia.etherscan.io/address/$(cat ~/.dstack/secrets/sepolia-address)"
+
+# Or visit this URL manually, substituting your address:
+# https://sepolia.etherscan.io/address/YOUR_ADDRESS
+```
+
+---
+
+## Step 3: Verify Your Secrets
+
+Check that all required secrets are stored:
+
+```bash
+# List your secrets
+ls -la ~/.dstack/secrets/
+```
+
+You should have:
+- `sepolia-address` - Your wallet address
+- `sepolia-private-key` - Your wallet private key
+
+**Test your configuration:**
+
+```bash
+echo "Wallet: $(cat ~/.dstack/secrets/sepolia-address)"
+echo "Balance: $(cast balance "$(cat ~/.dstack/secrets/sepolia-address)" --rpc-url https://ethereum-sepolia-rpc.publicnode.com)"
+```
+
+---
+
+## Verification Checklist
+
+Before proceeding to KMS deployment, verify:
+
+- ✅ Wallet created and address saved to `~/.dstack/secrets/sepolia-address`
+- ✅ Private key stored securely in
`~/.dstack/secrets/sepolia-private-key` +- ✅ Wallet has ≥0.1 testnet ETH +- ✅ Can query balance via cast + +--- + +## Troubleshooting + +For detailed solutions, see the [Prerequisites Troubleshooting Guide](/tutorial/troubleshooting-prerequisites#blockchain-wallet-setup-issues): + +- [Faucet not sending ETH](/tutorial/troubleshooting-prerequisites#problem-faucet-not-sending-eth) +- [RPC endpoint timing out](/tutorial/troubleshooting-prerequisites#problem-rpc-endpoint-timing-out) +- ["Connection refused" error](/tutorial/troubleshooting-prerequisites#problem-connection-refused-error) +- [Can't see balance in cast](/tutorial/troubleshooting-prerequisites#problem-cant-see-balance-in-cast) + +--- + +## Security Best Practices + +### DO: + +✅ **Follow these practices:** + +- Store private keys in encrypted files with restricted permissions (chmod 600) +- Use environment variables for sensitive data +- Keep separate wallets for testnet and mainnet +- Back up your wallet securely (encrypted USB, password manager) +- Use hardware wallet for mainnet production deployments + +### DON'T: + +❌ **Avoid these mistakes:** + +- Commit private keys to git repositories +- Share private keys via email, chat, or screenshots +- Use testnet wallet for mainnet (always use separate wallets) +- Store private keys in plain text on cloud storage +- Reuse private keys across projects + +--- + +## Next Steps + +Once your wallet is set up and funded, you can proceed to: + +1. **Host Setup:** [TDX Hardware Verification](/tutorial/tdx-hardware-verification) - Begin configuring your TDX-capable server +2. **Skip Ahead:** If you already have a TDX-enabled server, you'll use this wallet in the KMS deployment phase + +Your blockchain wallet is ready for dstack KMS deployment! 
🎉 diff --git a/docs/tutorials/clone-build-dstack-vmm.md b/docs/tutorials/clone-build-dstack-vmm.md new file mode 100644 index 00000000..1e731e8e --- /dev/null +++ b/docs/tutorials/clone-build-dstack-vmm.md @@ -0,0 +1,157 @@ +--- +title: "Clone & Build dstack-vmm" +description: "Clone the dstack repository and build the Virtual Machine Monitor (VMM) component" +section: "dstack Installation" +stepNumber: 3 +totalSteps: 8 +lastUpdated: 2025-12-07 +prerequisites: + - rust-toolchain-installation +tags: + - dstack + - vmm + - cargo + - build + - compilation +difficulty: "intermediate" +estimatedTime: "20 minutes" +--- + +# Clone & Build dstack-vmm + +This tutorial guides you through cloning the dstack repository and building the Virtual Machine Monitor (VMM) component. The VMM is the core component that manages TEE virtual machines on your host system. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [Rust Toolchain Installation](/tutorial/rust-toolchain-installation) +- SSH access to your TDX-enabled server +- At least 2GB free disk space + + +## What Gets Built + +| Binary | Purpose | +|--------|---------| +| `dstack-vmm` | Virtual Machine Monitor - manages TDX-protected VMs | +| `dstack-supervisor` | Process supervisor - manages processes within VMs | + +Both binaries are installed to `/usr/local/bin/` for system-wide access. + +--- + +## Manual Build + +If you prefer to build manually, follow these steps. + +### Step 1: Connect to Your Server + +```bash +ssh ubuntu@YOUR_SERVER_IP +``` + +All build commands should be run as the `ubuntu` user. Only the final installation step requires `sudo`. 
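Before building, it can help to confirm the prerequisites listed above on the server. The sketch below is one way to do that: it checks that `cargo` is on the `PATH` and that the home directory has at least the 2 GB of free space this tutorial asks for. The function names are illustrative and not part of dstack.

```bash
#!/usr/bin/env bash
# Preflight sketch. The 2 GB threshold comes from the prerequisites above;
# enough_space, avail_kb and preflight are hypothetical helper names.

# Succeeds when the available space (in KiB) covers the required gigabytes.
enough_space() {
  local avail_kb=$1 required_gb=$2
  [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Prints the available space in KiB on the filesystem holding the given path.
avail_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Checks for cargo and for 2 GB of free space under $HOME.
preflight() {
  command -v cargo >/dev/null 2>&1 || { echo "cargo not found on PATH"; return 1; }
  enough_space "$(avail_kb "$HOME")" 2 || { echo "less than 2 GB free in $HOME"; return 1; }
  echo "preflight ok"
}
```

Run `preflight` after sourcing the snippet. A non-zero exit status means one of the prerequisites still needs attention.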
+
+### Step 2: Verify dstack Repository
+
+The dstack repository should already be cloned and checked out at v0.5.7 from [Gramine Key Provider](/tutorial/gramine-key-provider):
+
+```bash
+cd ~/dstack
+git describe --tags
+# Should show v0.5.7
+```
+
+### Step 3: Build dstack-vmm
+
+```bash
+cd ~/dstack/vmm
+cargo build --release
+```
+
+### Step 4: Build dstack-supervisor
+
+```bash
+cd ~/dstack
+cargo build --release -p supervisor
+```
+
+### Step 5: Install Binaries
+
+```bash
+# Install VMM
+sudo cp ~/dstack/target/release/dstack-vmm /usr/local/bin/dstack-vmm
+sudo chmod 755 /usr/local/bin/dstack-vmm
+
+# Install supervisor
+sudo cp ~/dstack/target/release/supervisor /usr/local/bin/dstack-supervisor
+sudo chmod 755 /usr/local/bin/dstack-supervisor
+```
+
+### Step 6: Verify Installation
+
+```bash
+which dstack-vmm
+dstack-vmm --version
+
+which dstack-supervisor
+ls -la /usr/local/bin/dstack-supervisor
+```
+
+---
+
+## Build Options
+
+### Specify a Different Version
+
+```bash
+# Check out a specific version
+git checkout v0.5.4
+
+# Or use main branch for latest development
+git checkout main
+git pull
+```
+
+### Clean Build
+
+To rebuild from scratch:
+
+```bash
+cargo clean
+cargo build --release
+```
+
+### Debug Build
+
+For development, with debug symbols and faster compile times:
+
+```bash
+cargo build
+# Binary at ~/dstack/target/debug/dstack-vmm
+```
+
+---
+
+## Troubleshooting
+
+For detailed solutions, see the [dstack Installation Troubleshooting Guide](/tutorial/troubleshooting-dstack-installation#clone--build-dstack-vmm-issues):
+
+- [Network timeout downloading crates](/tutorial/troubleshooting-dstack-installation#network-timeout-downloading-crates)
+- [Linker errors](/tutorial/troubleshooting-dstack-installation#linker-errors)
+- [Permission denied on install](/tutorial/troubleshooting-dstack-installation#permission-denied-on-install)
+- [Build cache issues](/tutorial/troubleshooting-dstack-installation#build-cache-issues)
+
+---
+
+## Next Steps
+
+With
dstack-vmm built, proceed to: + +- [VMM Configuration](/tutorial/vmm-configuration) - Configure the VMM for production + +## Additional Resources + +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) +- [Cargo Documentation](https://doc.rust-lang.org/cargo/) diff --git a/docs/tutorials/contract-deployment.md b/docs/tutorials/contract-deployment.md new file mode 100644 index 00000000..3c944f14 --- /dev/null +++ b/docs/tutorials/contract-deployment.md @@ -0,0 +1,256 @@ +--- +title: "Contract Deployment" +description: "Deploy dstack KMS smart contracts to Sepolia testnet from your local machine" +section: "KMS Deployment" +stepNumber: 1 +totalSteps: 3 +lastUpdated: 2026-01-09 +prerequisites: + - blockchain-setup +tags: + - dstack + - kms + - ethereum + - sepolia + - hardhat + - deployment +difficulty: "intermediate" +estimatedTime: "15 minutes" +--- + +# Contract Deployment + +This tutorial deploys the dstack KMS smart contracts to the Sepolia testnet. Contracts are deployed from your **local machine** - your private key never leaves your computer. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [Blockchain Wallet Setup](/tutorial/blockchain-setup) with: + - Wallet private key stored in `~/.dstack/secrets/sepolia-private-key` + - Sepolia testnet ETH (~0.01 ETH recommended) +- dstack repository cloned locally at v0.5.7: `git clone -b v0.5.7 https://github.com/Dstack-TEE/dstack ~/dstack` +## What Gets Deployed + +The deployment creates two smart contracts on Sepolia: + +| Contract | Purpose | +|----------|---------| +| **DstackKms Proxy** | Main entry point - manages KMS settings and app authorization | +| **DstackApp Implementation** | Logic template for application contracts | + +These contracts use the UUPS (Universal Upgradeable Proxy Standard) pattern for future upgrades. 
+ +--- + +## Deployment + +> **Important: Run these steps on your LOCAL machine, not on the TDX server.** Contract deployment requires your Ethereum private key. By running locally, your private key never touches the server. You need a clone of the dstack repo on your local machine: `git clone -b v0.5.7 https://github.com/Dstack-TEE/dstack ~/dstack` + +### Step 1: Clone Repository and Navigate to auth-eth + +On your **local machine**, clone the dstack repository (if you haven't already) and check out v0.5.7: + +```bash +git clone https://github.com/Dstack-TEE/dstack.git ~/dstack 2>/dev/null || true +cd ~/dstack +git checkout v0.5.7 +cd kms/auth-eth +``` + +### Step 2: Install Node.js and Dependencies + +Install nvm (Node Version Manager), then use it to install the correct Node.js version: + +```bash +# Install nvm +curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash + +# Load nvm into current shell +export NVM_DIR="$HOME/.nvm" +[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" + +# Install and use Node.js 18 (LTS) +nvm install 18 +nvm use 18 + +# Verify versions +node --version # Should show v18.x.x +npm --version # Should show 9.x.x or 10.x.x +``` + +Then install the project dependencies: + +```bash +npm install +``` + +### Step 3: Load Credentials + +Load your wallet private key and set the RPC URL: + +```bash +# Load wallet private key +export PRIVATE_KEY=$(cat ~/.dstack/secrets/sepolia-private-key) + +# Set RPC URL for Sepolia testnet +export RPC_URL="https://ethereum-sepolia-rpc.publicnode.com" +``` + +Verify the private key loaded correctly: + +```bash +echo "Private key loaded: ${PRIVATE_KEY:0:6}...${PRIVATE_KEY: -4}" +``` + +### Step 4: Check Wallet Balance + +```bash +# Check balance using cast +cast balance "$(cat ~/.dstack/secrets/sepolia-address)" --rpc-url $RPC_URL +``` + +You need at least 0.01 ETH (shown in wei: `10000000000000000`). 
If insufficient, get free Sepolia ETH from: +- [PoW Faucet](https://sepolia-faucet.pk910.de/) (no requirements) +- [Faucet List](https://faucetlink.to/sepolia) (more options) + +### Step 5: Compile Contracts + +```bash +npx hardhat compile +``` + +Expected output (compiler version and file count may vary): + +``` +Downloading compiler 0.8.22 +Generating typings for: 19 artifacts in dir: typechain-types for target: ethers-v6 +Successfully generated 72 typings! +Compiled 19 Solidity files successfully (evm target: paris). +``` + +This generates the contract artifacts (ABI and bytecode) needed for deployment. + +### Step 6: Deploy Contracts + +```bash +npx hardhat kms:deploy --with-app-impl --network custom +``` + +Expected output: + +``` +Deploying with account: 0xYourAddress +Account balance: 0.123456789 ETH +Step 1: Deploying DstackApp implementation... +DstackApp implementation deployed to: 0x... +Step 2: Deploying DstackKms... +DstackKms Proxy deployed to: 0x... +Complete KMS setup deployed successfully! +``` + +### Step 7: Save Contract Addresses + +Save the deployed addresses for use in later tutorials: + +```bash +# Replace with your actual addresses from the output above +KMS_ADDRESS="0xYourKmsProxyAddress" +APP_ADDRESS="0xYourAppImplAddress" + +# Save to secrets directory +echo "$KMS_ADDRESS" > ~/.dstack/secrets/kms-contract-address +echo "$APP_ADDRESS" > ~/.dstack/secrets/app-contract-address + +echo "Addresses saved to ~/.dstack/secrets/" +``` + +### Step 8: Verify Deployment + +Check the contract exists on-chain: + +```bash +KMS_ADDRESS=$(cat ~/.dstack/secrets/kms-contract-address) +cast code "$KMS_ADDRESS" --rpc-url https://ethereum-sepolia-rpc.publicnode.com | head -c 20 +``` + +If the contract is deployed, this returns bytecode (starting with `0x`). If it shows just `0x`, the contract was not found. 
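The bytecode check described above can be wrapped in a small helper so a script can branch on the result. This is a minimal sketch: `has_bytecode` is a hypothetical name, and it only inspects the string that `cast code` prints, treating a bare `0x` or an empty string as "no contract".

```bash
#!/usr/bin/env bash
# has_bytecode is an illustrative helper, not part of Foundry or dstack.
# It succeeds when the argument is "0x" followed by at least one character.
has_bytecode() {
  case "$1" in
    0x?*) return 0 ;;   # non-empty bytecode
    *)    return 1 ;;   # empty string or bare "0x"
  esac
}

# Example use with the address saved earlier (requires network access):
#   code=$(cast code "$(cat ~/.dstack/secrets/kms-contract-address)" \
#     --rpc-url https://ethereum-sepolia-rpc.publicnode.com)
#   has_bytecode "$code" && echo "contract found" || echo "no contract at this address"
```

Checking only for a non-empty result does not prove the bytecode is the DstackKms proxy; it only rules out a wrong or undeployed address.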
+ +View on Etherscan: +```bash +echo "https://sepolia.etherscan.io/address/$KMS_ADDRESS" +``` + +--- + +## Understanding the Contracts + +### UUPS Proxy Pattern + +The contracts use UUPS (Universal Upgradeable Proxy Standard): + +``` +Client Request + │ + ▼ +┌─────────────┐ +│ KMS Proxy │ ← Stores state, immutable address +│ (0x...) │ +└─────┬───────┘ + │ delegatecall + ▼ +┌─────────────┐ +│ KMS Logic │ ← Contains code, can be upgraded +│ (impl) │ +└─────────────┘ +``` + +This allows upgrading contract logic without changing addresses or losing state. + +### Contract Functions + +The DstackKms contract provides: + +| Function | Purpose | +|----------|---------| +| `isAppAllowed(appId)` | Check if an app is authorized | +| `registerApp(appId)` | Register a new application | +| `gatewayAppId()` | Get the gateway app identifier | + +--- + +## Troubleshooting + +For detailed solutions, see the [KMS Deployment Troubleshooting Guide](/tutorial/troubleshooting-kms-deployment#contract-deployment-issues): + +- [Artifact not found](/tutorial/troubleshooting-kms-deployment#artifact-not-found) +- [Insufficient funds](/tutorial/troubleshooting-kms-deployment#insufficient-funds) +- [Transaction underpriced](/tutorial/troubleshooting-kms-deployment#transaction-underpriced) +- [Nonce too low](/tutorial/troubleshooting-kms-deployment#nonce-too-low) +- [Connection failed](/tutorial/troubleshooting-kms-deployment#connection-failed) + +--- + +## Cost Estimation + +| Operation | Gas Used | Cost at 2 gwei | +|-----------|----------|----------------| +| DstackApp implementation | ~1,100,000 | ~0.0022 ETH | +| DstackKms proxy | ~210,000 | ~0.0004 ETH | +| **Total** | ~1,300,000 | ~0.0026 ETH | + +Sepolia testnet ETH is free from faucets. 
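The figures in the table follow from cost = gas used × gas price, with 1 ETH = 10^9 gwei. The sketch below reproduces them, rounded to four decimals as in the table. The helper name `gas_cost_eth` is illustrative, and the gas amounts are the approximate values from the table, not measured results.

```bash
#!/usr/bin/env bash
# gas_cost_eth GAS GWEI: prints the cost in ETH, rounded to 4 decimals.
# Illustrative helper; 1 ETH = 1e9 gwei.
gas_cost_eth() {
  awk -v gas="$1" -v gwei="$2" 'BEGIN { printf "%.4f\n", gas * gwei / 1e9 }'
}

gas_cost_eth 1100000 2   # DstackApp implementation: 0.0022
gas_cost_eth 210000 2    # DstackKms proxy:          0.0004
gas_cost_eth 1310000 2   # Total:                    0.0026
```

To estimate the cost at a different gas price, change the second argument.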
+ +--- + +## Next Steps + +With contracts deployed, you're ready to build and configure the KMS: + +- [KMS Build & Configuration](/tutorial/kms-build-configuration) - Build and configure the dstack Key Management Service + +## Additional Resources + +- [Sepolia Etherscan](https://sepolia.etherscan.io/) +- [Hardhat Deployment Guide](https://hardhat.org/hardhat-runner/docs/guides/deploying) +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) diff --git a/docs/tutorials/dns-configuration.md b/docs/tutorials/dns-configuration.md new file mode 100644 index 00000000..33992534 --- /dev/null +++ b/docs/tutorials/dns-configuration.md @@ -0,0 +1,320 @@ +--- +title: "DNS Configuration" +description: "Configure Cloudflare DNS with wildcard domain support for dstack gateway deployment" +section: "Prerequisites" +stepNumber: 1 +totalSteps: 7 +lastUpdated: 2025-11-01 + +tags: + - dns + - cloudflare + - prerequisites +difficulty: beginner +estimatedTime: "30 minutes" +--- + +# DNS Configuration + +In this tutorial, you'll configure DNS for your dstack deployment using Cloudflare. The dstack gateway requires a wildcard domain to automatically provision subdomains for deployed applications with TLS certificates. + +## Why Cloudflare? + +The dstack gateway is designed to work with Cloudflare's DNS API for automatic TLS certificate provisioning. 
While you can use other DNS providers, Cloudflare integration provides:
+
+- **Automatic TLS**: Gateway provisions Let's Encrypt certificates via DNS-01 challenge
+- **Free tier**: No cost for DNS and CDN services
+- **Fast propagation**: DNS changes typically propagate within minutes
+- **API access**: Programmatic DNS management for automation
+
+## Prerequisites
+
+Before starting, ensure you have:
+
+- A registered domain name (example: `yourdomain.com`)
+- Access to your domain registrar's DNS settings
+- A Cloudflare account (sign up at https://cloudflare.com if needed)
+
+## Step 1: Add Domain to Cloudflare
+
+### 1.1 Log into Cloudflare Dashboard
+
+Visit https://dash.cloudflare.com and log into your account.
+
+### 1.2 Add Your Domain
+
+1. Click **"+ Add"** in the top right navigation
+2. Click **"Connect a domain"** in the submenu
+3. Enter your domain name (e.g., `yourdomain.com`) and fill out the rest of the form according to your preferences
+4. Click **"Continue"**
+5. Select the **Free** plan (unless you need paid features)
+
+### 1.3 Update Nameservers at Your Registrar
+
+Cloudflare will display two nameservers (e.g., `aden.ns.cloudflare.com` and `olga.ns.cloudflare.com`) and instructions for updating your domain. In summary, the steps are:
+
+1. Log into your DNS provider (most likely your registrar)
+2. Make sure DNSSEC is off
+3. Replace your current nameservers with Cloudflare nameservers
+4. Use the **"Check nameservers now"** button to confirm completion
+
+**Note:** Nameserver changes can take 24-48 hours to fully propagate, but often complete within a few hours.
+
+## Step 2: Configure DNS Records
+
+Once your domain is active on Cloudflare, configure the DNS records for dstack.
+
+### 2.1 Add A Record for Host
+
+Create an A record pointing your subdomain to the dstack host server:
+
+1. In Cloudflare dashboard, click on your domain
+2. Navigate to **DNS** → **Records**
+3. Click **"Add record"**
+4.
Configure: + - **Type**: A + - **Name**: `dstack` (or your preferred subdomain) + - **IPv4 address**: Your server IP (e.g., `173.231.234.133`) + - **Proxy status**: DNS only (gray cloud) - **Important!** + - **TTL**: Auto +5. Click **"Save"** + +**Why DNS only?** Cloudflare's proxy (orange cloud) would route traffic through their CDN, breaking TDX attestation. Use **gray cloud (DNS only)** to direct traffic straight to your server. + +### 2.2 Add A Record for Docker Registry + +Create an A record for the local Docker registry: + +1. Click **"Add record"** +2. Configure: + - **Type**: A + - **Name**: `registry` + - **IPv4 address**: Same server IP as above + - **Proxy status**: DNS only (gray cloud) + - **TTL**: Auto +3. Click **"Save"** + +This creates `registry.yourdomain.com` which is used by the local Docker registry for SSL certificates. + +### 2.3 Add Wildcard DNS Record + +Create a wildcard A record for application subdomains: + +1. Click **"Add record"** again +2. Configure: + - **Type**: A + - **Name**: `*.dstack` (wildcard under your subdomain) + - **IPv4 address**: Same server IP as above + - **Proxy status**: DNS only (gray cloud) + - **TTL**: Auto +3. Click **"Save"** + +This allows the gateway to automatically provision subdomains like: +- `app1.dstack.yourdomain.com` +- `app2.dstack.yourdomain.com` +- `custom-name.dstack.yourdomain.com` + +### 2.4 Add CAA Records (Optional but Recommended) + +CAA records restrict which Certificate Authorities can issue certificates for your domain: + +1. Click **"Add record"** +2. Configure: + - **Type**: CAA + - **Name**: `@` (for root domain, or use `dstack` for subdomain only) + - **Flags**: `0` + - **Tag**: Select **"Only allow specific hostnames"** from dropdown + - **CA domain name**: `letsencrypt.org` + - **TTL**: Auto +3. Click **"Save"** + +Repeat for wildcard subdomain: +1. Click **"Add record"** +2. 
Configure: + - **Type**: CAA + - **Name**: `*.dstack` + - **Flags**: `0` + - **Tag**: Select **"Only allow specific hostnames"** from dropdown + - **CA domain name**: `letsencrypt.org` + - **TTL**: Auto +3. Click **"Save"** + +**Note:** The "Only allow specific hostnames" tag option corresponds to the `issue` tag in CAA record syntax. This ensures only Let's Encrypt can issue certificates for your domain, improving security. + +## Step 3: Generate Cloudflare API Token + +The dstack gateway needs API access to manage DNS records for TLS certificate provisioning. + +### 3.1 Create API Token + +1. In Cloudflare dashboard, click your profile icon (top right) +2. Select **"My Profile"** +3. Navigate to **API Tokens** tab +4. Click **"Create Token"** +5. Use the **"Edit zone DNS"** template +6. Configure: + - **Permissions**: + - Zone → DNS → Edit + - **Zone Resources**: + - Include → Specific zone → Select your domain + - **TTL**: Not set (token doesn't expire, or set expiration if preferred) +7. Click **"Continue to summary"** +8. Review permissions +9. Click **"Create Token"** + +### 3.2 Save API Token Securely + +**IMPORTANT:** Copy the API token immediately and save it securely. You'll need this for gateway configuration. + +The token will look like: `abcdef123456789_example_token_xyz` + +**Store this token securely** - you won't be able to see it again in Cloudflare dashboard. Consider using: +- Password manager +- Encrypted file +- Secret management system (if deploying in production) + +### 3.3 Test API Token + +Verify the token works with a simple API test: + +```bash +# Replace TOKEN with your actual API token +# Replace ZONE_ID with your Cloudflare zone ID (found in domain Overview) +curl -X GET "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records" \ + -H "Authorization: Bearer YOUR_TOKEN" \ + -H "Content-Type: application/json" +``` + +Expected response: JSON with `"success": true` and list of your DNS records. 
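To turn that manual inspection into a pass/fail check, the response can be piped through a small filter. This is a sketch under stated assumptions: `cf_success` is a hypothetical helper, `CF_API_TOKEN` and `CF_ZONE_ID` are illustrative variable names, and the match is a plain text search for the `"success": true` field mentioned above, not a full JSON parse.

```bash
#!/usr/bin/env bash
# cf_success reads an API response on stdin and succeeds when it contains
# "success": true (with or without spaces around the colon).
# Illustrative helper; a JSON-aware tool would be more robust.
cf_success() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Example use (requires network access and your own token and zone ID):
#   curl -s "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/dns_records" \
#     -H "Authorization: Bearer $CF_API_TOKEN" \
#     -H "Content-Type: application/json" | cf_success \
#     && echo "token works" || echo "token check failed"
```

Keeping the token in an environment variable instead of pasting it into the command also keeps it out of your shell history.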
+
+**Note:** You can find the Zone ID on the right side of your domain's Overview page, under the API section. You may need to scroll to find it.
+
+## Step 4: Test DNS Resolution
+
+Verify your DNS configuration is working correctly.
+
+### 4.1 Test Base Domain
+
+```bash
+# Replace with your actual subdomain
+dig dstack.yourdomain.com
+
+# Should return your server IP in the ANSWER section
+# Example output:
+# dstack.yourdomain.com. 300 IN A 173.231.234.133
+```
+
+### 4.2 Test Registry Domain
+
+```bash
+dig registry.yourdomain.com
+
+# Should return your server IP
+```
+
+### 4.3 Test Wildcard Domain
+
+```bash
+# Test a random subdomain under wildcard
+dig test.dstack.yourdomain.com
+dig app.dstack.yourdomain.com
+dig anything.dstack.yourdomain.com
+
+# All should return your server IP
+```
+
+### 4.4 Verify from Multiple Locations
+
+DNS propagation can vary by location. Test from different DNS resolvers:
+
+```bash
+# Google DNS
+dig @8.8.8.8 dstack.yourdomain.com
+
+# Cloudflare DNS
+dig @1.1.1.1 dstack.yourdomain.com
+
+# Your local DNS (no @)
+dig dstack.yourdomain.com
+```
+
+All should return your server IP.
+
+## Step 5: Personalize Tutorial Commands
+
+The tutorials throughout this site use `yourdomain.com` as a placeholder domain. Now that your DNS is configured, you can replace all placeholders at once to avoid copy-paste errors.
+ +### Set Your Domains + +```bash +# Set your actual domains +export BASE_DOMAIN="yourdomain.com" # Your registered domain +export REGISTRY_DOMAIN="registry.${BASE_DOMAIN}" # Docker registry subdomain +export GATEWAY_DOMAIN="dstack.${BASE_DOMAIN}" # Gateway base domain (from *.dstack record) +export KMS_DOMAIN="kms.${GATEWAY_DOMAIN}" # KMS domain +``` + +### Replace in Tutorials + +```bash +cd ~/dstack-info + +# Replace all placeholders (most specific patterns first) +find src/content/tutorials -name "*.md" -exec sed -i \ + -e "s|registry\.yourdomain\.com|${REGISTRY_DOMAIN}|g" \ + -e "s|vmm\.dstack\.yourdomain\.com|vmm.${GATEWAY_DOMAIN}|g" \ + -e "s|kms\.yourdomain\.com|${KMS_DOMAIN}|g" \ + -e "s|dstack\.yourdomain\.com|${GATEWAY_DOMAIN}|g" \ + -e "s|yourdomain\.com|${BASE_DOMAIN}|g" \ + {} + +``` + +### Verify Replacements + +```bash +# Should return no results (or only this tutorial explaining the placeholder) +grep -r "yourdomain" src/content/tutorials/ | grep -v "dns-configuration.md" +``` + +> **Note:** These changes are local to your copy of the tutorials. Don't commit them to git — they're specific to your deployment. If you pull updates later, re-run the sed commands. 
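Because `sed -i` rewrites files in place, it can be worth previewing the substitutions on a sample line first. The sketch below applies the same expressions, in the same order, to standard input. `example.org` is a stand-in domain used only for illustration, and `personalize` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Dry run of the placeholder substitutions on stdin instead of files.
# example.org is a stand-in; set BASE_DOMAIN to your own domain.
BASE_DOMAIN="example.org"
REGISTRY_DOMAIN="registry.${BASE_DOMAIN}"
GATEWAY_DOMAIN="dstack.${BASE_DOMAIN}"
KMS_DOMAIN="kms.${GATEWAY_DOMAIN}"

# Same expressions and order as the in-place command above, without -i.
personalize() {
  sed \
    -e "s|registry\.yourdomain\.com|${REGISTRY_DOMAIN}|g" \
    -e "s|vmm\.dstack\.yourdomain\.com|vmm.${GATEWAY_DOMAIN}|g" \
    -e "s|kms\.yourdomain\.com|${KMS_DOMAIN}|g" \
    -e "s|dstack\.yourdomain\.com|${GATEWAY_DOMAIN}|g" \
    -e "s|yourdomain\.com|${BASE_DOMAIN}|g"
}

echo "kms.yourdomain.com app.dstack.yourdomain.com yourdomain.com" | personalize
# prints: kms.dstack.example.org app.dstack.example.org example.org
```

The order matters: if the bare `yourdomain.com` rule ran first, `kms.yourdomain.com` would become `kms.example.org` and the KMS rule would no longer match.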
+ +## Step 6: DNS Record Summary + +After completion, you should have these DNS records in Cloudflare: + +| Type | Name | Value | Proxy Status | +|------|------|-------|--------------| +| A | `dstack` | Your server IP | DNS only (gray) | +| A | `registry` | Your server IP | DNS only (gray) | +| A | `*.dstack` | Your server IP | DNS only (gray) | +| CAA | `dstack` | `letsencrypt.org` | N/A | +| CAA | `*.dstack` | `letsencrypt.org` | N/A | + +## Troubleshooting + +For detailed solutions, see the [Prerequisites Troubleshooting Guide](/tutorial/troubleshooting-prerequisites#dns-configuration-issues): + +- [DNS Not Resolving](/tutorial/troubleshooting-prerequisites#dns-not-resolving) +- [Wildcard Not Working](/tutorial/troubleshooting-prerequisites#wildcard-not-working) +- [API Token Permission Denied](/tutorial/troubleshooting-prerequisites#api-token-permission-denied) +- [Propagation Taking Too Long](/tutorial/troubleshooting-prerequisites#propagation-taking-too-long) + +## Next Steps + +With DNS configured, you're ready to proceed to blockchain setup: + +- **Next Tutorial:** [Blockchain Wallet Setup](/tutorial/blockchain-setup) + +After completing all prerequisites (DNS + Blockchain), you'll configure the dstack gateway to use: +- Your domain for TLS certificate provisioning +- Your Cloudflare API token for DNS management +- Your blockchain wallet for KMS interactions + +--- + +**Important Notes:** + +- Keep your Cloudflare API token secure - treat it like a password +- Use DNS only (gray cloud) for dstack records to preserve TDX attestation +- Wildcard DNS enables automatic subdomain provisioning for applications +- CAA records improve security by restricting certificate issuance diff --git a/docs/tutorials/docker-setup.md b/docs/tutorials/docker-setup.md new file mode 100644 index 00000000..e7b7056c --- /dev/null +++ b/docs/tutorials/docker-setup.md @@ -0,0 +1,156 @@ +--- +title: "Docker Setup" +description: "Install Docker Engine for dstack services" +section: 
"Prerequisites" +stepNumber: 3 +totalSteps: 7 +lastUpdated: 2026-01-09 +prerequisites: + - ssl-certificate-setup +tags: + - docker + - containers + - prerequisites +difficulty: beginner +estimatedTime: "10 minutes" +--- + +# Docker Setup + +This tutorial guides you through installing Docker Engine on your TDX server. Docker is required for the Gramine Key Provider and Local Docker Registry. + +## What You'll Install + +| Component | Purpose | +|-----------|---------| +| **docker-ce** | Docker Engine (Community Edition) | +| **docker-ce-cli** | Docker command-line interface | +| **containerd.io** | Container runtime | +| **docker-buildx-plugin** | Extended build capabilities | +| **docker-compose-plugin** | Multi-container orchestration | + +## Prerequisites + +Before starting, ensure you have: + +- Completed [SSL Certificate Setup](/tutorial/ssl-certificate-setup) +- SSH access to your TDX server +- sudo privileges + + +## Manual Installation + +### Step 1: Check if Docker is Already Installed + +```bash +docker --version +``` + +If Docker is already installed, you can skip to [Verification](#verification). + +### Step 2: Install Prerequisites + +```bash +sudo apt update +sudo apt install -y ca-certificates curl gnupg +``` + +### Step 3: Add Docker GPG Key + +```bash +sudo install -m 0755 -d /etc/apt/keyrings +curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo tee /etc/apt/keyrings/docker.asc > /dev/null +sudo chmod a+r /etc/apt/keyrings/docker.asc +``` + +### Step 4: Add Docker Repository + +```bash +echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. 
/etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null +``` + +### Step 5: Install Docker Packages + +```bash +sudo apt update +sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin +``` + +### Step 6: Start Docker Service + +```bash +sudo systemctl start docker +sudo systemctl enable docker +``` + +### Step 7: Add User to Docker Group + +This allows running Docker commands without sudo: + +```bash +sudo usermod -aG docker $USER +``` + +**Important:** Log out and back in for the group membership to take effect, or run: + +```bash +newgrp docker +``` + +--- + +## Verification + +### Check Docker is Running + +```bash +docker info +``` + +You should see detailed information about the Docker installation. + +### Check Docker Version + +```bash +docker --version +``` + +Expected output: +``` +Docker version 27.x.x, build xxxxxxx +``` + +### Test Docker + +```bash +docker run hello-world +``` + +This downloads and runs a test image. You should see: +``` +Hello from Docker! +This message shows that your installation appears to be working correctly. 
+``` + +--- + +## Troubleshooting + +For detailed solutions, see the [Prerequisites Troubleshooting Guide](/tutorial/troubleshooting-prerequisites#docker-setup-issues): + +- [Permission Denied](/tutorial/troubleshooting-prerequisites#permission-denied) +- [Docker Service Not Starting](/tutorial/troubleshooting-prerequisites#docker-service-not-starting) +- [Repository Not Found](/tutorial/troubleshooting-prerequisites#repository-not-found) + +--- + +## Next Steps + +With Docker installed, proceed to: + +- [Gramine Key Provider](/tutorial/gramine-key-provider) - Deploy SGX-based key provider + +## Additional Resources + +- [Docker Documentation](https://docs.docker.com/) +- [Docker Engine Installation](https://docs.docker.com/engine/install/ubuntu/) diff --git a/docs/tutorials/gateway-build-configuration.md b/docs/tutorials/gateway-build-configuration.md new file mode 100644 index 00000000..0d24e815 --- /dev/null +++ b/docs/tutorials/gateway-build-configuration.md @@ -0,0 +1,648 @@ +--- +title: "Gateway CVM Preparation" +description: "Prepare gateway for CVM deployment: docker-compose, environment configuration, and app registration" +section: "Gateway Deployment" +stepNumber: 1 +totalSteps: 2 +lastUpdated: 2026-02-21 +prerequisites: + - kms-cvm-deployment +tags: + - dstack + - gateway + - cvm + - docker-compose + - configuration +difficulty: "advanced" +estimatedTime: "25 minutes" +--- + +# Gateway CVM Preparation + +This tutorial guides you through preparing the dstack gateway for deployment as a Confidential Virtual Machine (CVM). The gateway acts as a reverse proxy that forwards TLS connections to application CVMs via WireGuard tunnels, with TDX attestation providing cryptographic proof of integrity. + +Unlike a traditional host-based deployment, the gateway runs inside a CVM where its configuration is auto-generated, WireGuard keys are managed automatically, and TLS certificates are provisioned via the admin API. + +## Why Deploy Gateway in a CVM? 
+ +| Benefit | Description | +|---------|-------------| +| **TDX Attestation** | Cryptographic proof that the gateway is running genuine, untampered code | +| **Memory Encryption** | WireGuard keys and TLS certificates protected by TDX hardware encryption | +| **WireGuard Isolation** | WireGuard runs inside the CVM, not exposed on the host | +| **Auto-Configuration** | `gateway.toml` and WireGuard keys generated automatically by the container entrypoint | + +## Prerequisites + +Before starting, ensure you have: + +- Completed [KMS CVM Deployment](/tutorial/kms-cvm-deployment) — KMS must be running and reachable +- dstack VMM running (`systemctl status dstack-vmm`) +- Cloudflare API token (from [DNS Configuration](/tutorial/dns-configuration/#step-3-generate-cloudflare-api-token)) +- **On your local machine:** Foundry toolchain installed (`cast` command — [install guide](https://book.getfoundry.sh/getting-started/installation)), wallet private key at `~/.dstack/secrets/sepolia-private-key`, KMS contract address at `~/.dstack/secrets/kms-contract-address` +- Python cryptography libraries for `vmm-cli.py`: + ```bash + sudo apt install -y python3-pip + pip3 install --break-system-packages cryptography eth-keys eth-utils "eth-hash[pycryptodome]" + ``` + + +## What Gets Prepared + +| Artifact | Purpose | +|----------|---------| +| **Gateway Docker image** | Locally-built image pushed to local registry (v0.5.7 not on Docker Hub) | +| **docker-compose.yaml** | Container definition with gateway image and environment variables | +| **.env** | Host-side environment variables for deployment | +| **.app_env** | CVM-side environment variables passed into the container | +| **app-compose.json** | VMM deployment manifest generated by `vmm-cli.py compose` | +| **On-chain registration** | Gateway app registered on the KMS smart contract | + +--- + +## Manual Preparation + +### Step 1: Verify Prerequisites + +Confirm KMS is running and reachable: + +```bash +curl -sk 
https://localhost:9100/prpc/KMS.GetMeta | jq '{chain_id, kms_contract_address}' +``` + +Expected output shows your KMS contract details: + +```json +{ + "chain_id": 11155111, + "kms_contract_address": "0xYOUR_KMS_CONTRACT_ADDRESS" +} +``` + +Confirm VMM is running: + +```bash +systemctl status dstack-vmm --no-pager +``` + +Verify VMM port mapping allows UDP (needed for WireGuard). Check your `/etc/dstack/vmm.toml`: + +```bash +grep -A5 'port_mapping' /etc/dstack/vmm.toml +``` + +The `range` must include a UDP entry. If it only has TCP, add UDP: + +```bash +sudo sed -i '/{ protocol = "tcp", from = 1, to = 20000 },/a\ { protocol = "udp", from = 1, to = 20000 },' /etc/dstack/vmm.toml +sudo systemctl restart dstack-vmm +``` + +> **Why UDP?** The gateway uses WireGuard (UDP port 51820 inside the CVM, mapped to host port 9202) for secure tunnels to application CVMs. Without UDP port mapping enabled, the VMM will reject the deployment. + +### Step 2: Create Deployment Directory + +```bash +mkdir -p ~/gateway-deploy +``` + +### Step 3: Build Gateway Docker Image + +The `dstacktee/dstack-gateway:0.5.7` image isn't published on Docker Hub, so we build it locally from the dstack source you cloned in [Build dstack from Source](/tutorial/clone-build-dstack-vmm). This follows the same pattern as the [KMS image build](/tutorial/kms-build-configuration/#step-7-create-docker-image-for-cvm-deployment). + +#### Build the gateway binary + +The [Build dstack from Source](/tutorial/clone-build-dstack-vmm) tutorial builds `dstack-vmm` and `supervisor`, but not the gateway. Build it now: + +```bash +cd ~/dstack +cargo build --release -p dstack-gateway +``` + +Verify the binary was built: + +```bash +ls -lh ~/dstack/target/release/dstack-gateway +``` + +Expected output (typically 15-25MB): +``` +-rwxrwxr-x 1 ubuntu ubuntu 20M ... 
/home/ubuntu/dstack/target/release/dstack-gateway +``` + +#### Create Dockerfile + +```bash +cat > ~/gateway-deploy/Dockerfile << 'EOF' +FROM ubuntu:24.04 + +RUN apt-get update && \ + apt-get install -y --no-install-recommends \ + wireguard-tools \ + iproute2 \ + jq \ + ca-certificates \ + && rm -rf /var/lib/apt/lists/* + +COPY dstack-gateway /usr/local/bin/dstack-gateway +RUN chmod 755 /usr/local/bin/dstack-gateway + +WORKDIR /app +COPY entrypoint.sh /app/entrypoint.sh +RUN chmod +x /app/entrypoint.sh + +ENTRYPOINT ["/app/entrypoint.sh"] +CMD ["dstack-gateway", "-c", "/data/gateway/gateway.toml"] +EOF +``` + +#### Copy build artifacts + +Copy the gateway binary and entrypoint script into the build context: + +```bash +cp ~/dstack/target/release/dstack-gateway ~/gateway-deploy/ +cp ~/dstack/gateway/dstack-app/builder/entrypoint.sh ~/gateway-deploy/ +``` + +#### Build Docker image + +```bash +cd ~/gateway-deploy +docker build -t dstack-gateway:latest . +``` + +#### Verify image + +```bash +docker images dstack-gateway +``` + +Expected output: +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +dstack-gateway latest abc123def456 10 seconds ago ~150MB +``` + +#### Push to local registry + +Tag and push to your local Docker registry so CVMs can pull it during boot: + +```bash +docker tag dstack-gateway:latest localhost:5000/dstack-gateway:latest +docker tag dstack-gateway:latest localhost:5000/dstack-gateway:fixed + +docker push localhost:5000/dstack-gateway:latest +docker push localhost:5000/dstack-gateway:fixed +``` + +Verify the image is in the registry (via HAProxy): + +```bash +curl -sk https://registry.yourdomain.com/v2/dstack-gateway/tags/list +``` + +Expected output: +```json +{"name":"dstack-gateway","tags":["fixed","latest"]} +``` + +### Step 4: Register Gateway App On-Chain + +> **Important: Run this step on your LOCAL machine.** On-chain transactions require your wallet private key, which stays on your local machine (never on the server). 
You need [Foundry](https://book.getfoundry.sh/getting-started/installation) installed locally (`curl -L https://foundry.paradigm.xyz | bash && foundryup`). + +The gateway needs an on-chain app identity so KMS can issue it TLS certificates via attestation. + +#### Load wallet credentials + +On your **local machine**: + +```bash +export PRIVATE_KEY=$(cat ~/.dstack/secrets/sepolia-private-key) +export ETH_RPC_URL="https://ethereum-sepolia-rpc.publicnode.com" +export KMS_CONTRACT_ADDR=$(cat ~/.dstack/secrets/kms-contract-address) +``` + +#### Deploy and register the gateway app + +This creates a new `DstackApp` contract and registers it with the KMS contract in a single transaction: + +```bash +MY_ADDR=$(cast wallet address --private-key $PRIVATE_KEY) +ZERO=0x0000000000000000000000000000000000000000000000000000000000000000 + +GATEWAY_APP_ID=$(cast send "$KMS_CONTRACT_ADDR" "deployAndRegisterApp(address,bool,bool,bytes32,bytes32)" "$MY_ADDR" false true "$ZERO" "$ZERO" --rpc-url "$ETH_RPC_URL" --private-key "$PRIVATE_KEY" --json | jq -r '.logs[-1].topics[1]' | sed 's/0x000000000000000000000000/0x/') + +echo "Gateway App ID: $GATEWAY_APP_ID" +``` + +> **What this does:** `deployAndRegisterApp` deploys a new `DstackApp` proxy contract and registers it with the KMS. The parameters are: `initialOwner` (your wallet), `disableUpgrades` (false), `allowAnyDevice` (true — allows any TDX device to run this app), `initialDeviceId` (zero — not device-locked), and `initialComposeHash` (zero — we'll add it after generating the compose). +> +> **Log parsing:** The transaction emits multiple events. The app proxy address is in the **last** log entry (`AppDeployedViaFactory`), not the first (which is the `Upgraded` event from the implementation contract). + +#### Verify the app was created correctly + +Confirm the contract owner matches your wallet: + +```bash +cast call "$GATEWAY_APP_ID" "owner()(address)" --rpc-url "$ETH_RPC_URL" +``` + +This should return your wallet address. 
If it returns `0x000...000`, the log parsing failed — check the troubleshooting section. + +#### Set the gateway app ID on the KMS contract + +Tell the KMS contract which app is the gateway: + +```bash +cast send "$KMS_CONTRACT_ADDR" "setGatewayAppId(string)" "$GATEWAY_APP_ID" --rpc-url "$ETH_RPC_URL" --private-key "$PRIVATE_KEY" +``` + +#### Verify registration + +```bash +cast call "$KMS_CONTRACT_ADDR" "gatewayAppId()(string)" --rpc-url "$ETH_RPC_URL" +``` + +Should return your gateway app ID. + +#### Whitelist the OS image + +The KMS contract maintains an allowlist of OS image hashes. Each dstack guest image includes a `digest.txt` file containing its SHA256 hash. The VMM passes this hash to KMS during attestation, and KMS rejects any hash that isn't whitelisted. + +Get the image digest from your server and whitelist it: + +```bash +OS_IMAGE_HASH=$(ssh ubuntu@YOUR_SERVER_IP 'cat /var/lib/dstack/images/dstack-0.5.7/digest.txt') +echo "OS image hash: 0x$OS_IMAGE_HASH" +``` + +```bash +cast send "$KMS_CONTRACT_ADDR" "addOsImageHash(bytes32)" "0x$OS_IMAGE_HASH" --rpc-url "$ETH_RPC_URL" --private-key "$PRIVATE_KEY" +``` + +Verify it was added: + +```bash +cast call "$KMS_CONTRACT_ADDR" "allowedOsImages(bytes32)(bool)" "0x$OS_IMAGE_HASH" --rpc-url "$ETH_RPC_URL" +``` + +Expected output: `true` + +> **Note:** This step is required for any CVM that needs KMS attestation. Each dstack release has a different digest — if you upgrade images, you must whitelist the new hash. If you already whitelisted this hash during KMS setup, you can skip this step. 
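Because `addOsImageHash` takes a `bytes32`, the value read from `digest.txt` should be exactly 64 hexadecimal characters. A small guard along these lines can catch an empty or malformed value (for example, after a failed `ssh` command) before it is sent on-chain. This is only a sketch: `is_bytes32_hex` is a hypothetical helper, not part of Foundry or dstack.

```bash
# Hypothetical helper: succeed only if $1 is exactly 64 hex characters
# (the hex form of a 32-byte value, without the 0x prefix).
is_bytes32_hex() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;   # reject any non-hex character
  esac
  [ "${#1}" -eq 64 ]              # reject wrong lengths, including empty
}

# Usage with the variable set earlier in this step
if is_bytes32_hex "${OS_IMAGE_HASH:-}"; then
  echo "OS image hash looks well-formed: 0x$OS_IMAGE_HASH"
else
  echo "OS image hash is empty or malformed; do not send the transaction" >&2
fi
```

The same check could be applied to `COMPOSE_HASH` before calling `addComposeHash`, since that function also takes a `bytes32`.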
+ +#### Save the app ID and copy to server + +Save the gateway app ID locally and copy it to the server (needed for later steps): + +```bash +# Save locally +mkdir -p ~/.dstack/secrets +echo "$GATEWAY_APP_ID" > ~/.dstack/secrets/gateway-app-id + +# Copy to server +scp ~/.dstack/secrets/gateway-app-id ubuntu@YOUR_SERVER_IP:~/.dstack/secrets/ +``` + +### Step 5: Create docker-compose.yaml + +> **Back to the server.** Steps 5-7 and 9-10 run on your **TDX server**. SSH back in if needed: +> ```bash +> ssh ubuntu@YOUR_SERVER_IP +> ``` + +Create the compose file that runs the gateway inside the CVM. The container entrypoint auto-generates `gateway.toml` and WireGuard keys from these environment variables. + +```bash +cat > ~/gateway-deploy/docker-compose.yaml << 'EOF' +services: + gateway: + image: registry.yourdomain.com/dstack-gateway:fixed + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + - /dstack:/dstack + - data:/data + network_mode: host + privileged: true + environment: + - SUBNET_INDEX=${SUBNET_INDEX} + - WG_ENDPOINT=${WG_ENDPOINT} + - MY_URL=${MY_URL} + - BOOTNODE_URL=${BOOTNODE_URL} + - WG_IP=${WG_IP} + - WG_RESERVED_NET=${WG_RESERVED_NET} + - WG_CLIENT_RANGE=${WG_CLIENT_RANGE} + - NODE_ID=${NODE_ID} + - RUST_LOG=info,certbot=debug + - RPC_DOMAIN=${RPC_DOMAIN} + - PROXY_LISTEN_PORT=${PROXY_LISTEN_PORT:-443} + - PROXY_WORKERS=${PROXY_WORKERS:-32} + - MAX_CONNECTIONS_PER_APP=${MAX_CONNECTIONS_PER_APP:-0} + - ADMIN_LISTEN_ADDR=${ADMIN_LISTEN_ADDR:-0.0.0.0} + - ADMIN_LISTEN_PORT=${ADMIN_LISTEN_PORT:-8001} + restart: always + +volumes: + data: +EOF +``` + +> **Note:** The image uses your registry domain (not `localhost:5000`) because CVMs use QEMU user-mode networking — `localhost` inside a CVM refers to the CVM itself, not the host. Docker inside the CVM resolves `registry.yourdomain.com` via DNS to the host's public IP, where HAProxy proxies to the local registry on port 5000. The `:fixed` tag is a stable alias that won't change unexpectedly. 
This matches the pattern used by the [KMS CVM deployment](/tutorial/kms-cvm-deployment/#step-3-create-docker-composeyaml). + +**What the container does automatically:** +- Generates WireGuard key pair (or reuses existing from `/data/gateway/wg.key`) +- Creates `gateway.toml` at `/data/gateway/gateway.toml` from environment variables +- Starts the gateway RPC server on port 8000 +- Starts the admin API on port 8001 +- Starts WireGuard on port 51820 +- Starts the HTTPS proxy on port 443 + +### Step 6: Create .env File + +This file stores your deployment-specific values. The variables are used both during preparation and deployment. + +```bash +cat > ~/gateway-deploy/.env << EOF +# Required: VMM RPC endpoint +VMM_RPC=http://127.0.0.1:9080 + +# Required: Cloudflare API token for DNS-01 challenges +CF_API_TOKEN=YOUR_CLOUDFLARE_API_TOKEN + +# Required: Service domain (wildcard base for app subdomains) +SRV_DOMAIN=dstack.yourdomain.com + +# Required: Host public IP address +PUBLIC_IP=$(curl -s4 ifconfig.me) + +# Required: Gateway app ID from on-chain registration +GATEWAY_APP_ID=$(cat ~/.dstack/secrets/gateway-app-id) + +# Required: KMS endpoint (host-side, for vmm-cli.py encryption) +KMS_URL=https://127.0.0.1:9100 + +# Required: KMS domain name (must match KMS_DOMAIN from KMS docker-compose) +# Used as the CVM-side KMS URL so the TLS certificate hostname matches +KMS_DOMAIN=kms.dstack.yourdomain.com + +# Node ID (must be unique if running multiple gateways) +NODE_ID=1 + +# Subnet index (0-15, determines WireGuard IP range) +SUBNET_INDEX=0 + +# Guest OS image +OS_IMAGE=dstack-0.5.7 +EOF +``` + +Replace the placeholder values with your actual Cloudflare API token, domain, and KMS domain: + +```bash +sed -i 's/YOUR_CLOUDFLARE_API_TOKEN/your-actual-cloudflare-token/' ~/gateway-deploy/.env +sed -i 's/dstack.yourdomain.com/your-actual-domain.com/' ~/gateway-deploy/.env +sed -i 's/kms.dstack.yourdomain.com/kms.your-actual-domain.com/' ~/gateway-deploy/.env +``` + +**Environment 
variable reference:** + +| Variable | Required | Description | +|----------|----------|-------------| +| `VMM_RPC` | Yes | VMM RPC endpoint | +| `CF_API_TOKEN` | Yes | Cloudflare API token for DNS-01 certificate challenges | +| `SRV_DOMAIN` | Yes | Base domain for app subdomains (e.g., `dstack.yourdomain.com`) | +| `PUBLIC_IP` | Yes | Host's public IP address (for WireGuard endpoint) | +| `GATEWAY_APP_ID` | Yes | App ID from on-chain registration (Step 4) | +| `KMS_URL` | Yes | KMS RPC endpoint URL (host-side, for vmm-cli.py encryption) | +| `KMS_DOMAIN` | Yes | KMS domain name matching its TLS certificate (from KMS docker-compose `KMS_DOMAIN`) | +| `NODE_ID` | Yes | Unique node ID (default: 1) | +| `SUBNET_INDEX` | No | WireGuard subnet index 0-15 (default: 0) | +| `OS_IMAGE` | No | Guest OS image name (default: dstack-0.5.7) | + +### Step 7: Generate .app_env and app-compose.json + +Load the environment and calculate derived values: + +```bash +cd ~/gateway-deploy +set -a; source .env; set +a + +# Calculate WireGuard IP allocation from SUBNET_INDEX +WG_IP_PREFIX="10.$((SUBNET_INDEX + 240)).0" +WG_IP="${WG_IP_PREFIX}.1/12" +WG_RESERVED_NET="${WG_IP_PREFIX}.1/32" +WG_CLIENT_RANGE="${WG_IP_PREFIX}.0/16" +WG_PORT=9202 +RPC_DOMAIN="gateway.$SRV_DOMAIN" +MY_URL="https://${RPC_DOMAIN}" + +# Create .app_env (environment variables passed into the CVM) +cat > .app_env << ENVEOF +SUBNET_INDEX=$SUBNET_INDEX +WG_ENDPOINT=$PUBLIC_IP:$WG_PORT +MY_URL=$MY_URL +WG_IP=$WG_IP +WG_RESERVED_NET=$WG_RESERVED_NET +WG_CLIENT_RANGE=$WG_CLIENT_RANGE +RPC_DOMAIN=$RPC_DOMAIN +NODE_ID=$NODE_ID +PROXY_LISTEN_PORT=443 +ENVEOF + +echo "Generated .app_env:" +cat .app_env +``` + +> **Why MY_URL should NOT include port 9202:** `MY_URL` is used by the gateway to generate application board links (e.g., `https://gateway.dstack.yourdomain.com/dashboard`). 
If `MY_URL` includes `:9202`, those links point to port 9202, which serves the dstack-internal TLS certificate (not the Let's Encrypt cert from HAProxy). Browsers will show certificate errors. By using the bare domain (port 443), links go through HAProxy where the `gateway_rpc_passthrough` rule forwards to port 9202 behind a proper TLS chain. + +Now generate the VMM deployment manifest: + +```bash +cd ~/dstack/vmm + +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +./src/vmm-cli.py --url http://127.0.0.1:9080 compose \ + --docker-compose ~/gateway-deploy/docker-compose.yaml \ + --name dstack-gateway \ + --kms \ + --env-file ~/gateway-deploy/.app_env \ + --public-logs \ + --public-sysinfo \ + --no-instance-id \ + --output ~/gateway-deploy/app-compose.json +``` + +**Key flags explained:** + +| Flag | Purpose | +|------|---------| +| `--kms` | Enable KMS integration for TDX attestation and certificate provisioning | +| `--env-file` | Pass runtime environment variables into the CVM | +| `--public-logs` | Allow log access via VMM API | +| `--public-sysinfo` | Allow system info queries via VMM API | +| `--no-instance-id` | Don't append instance ID to app name | + +> **Note:** Do NOT use the `--secure-time` flag — it causes the CVM to hang during boot waiting for time synchronization. + +### Step 8: Whitelist Compose Hash On-Chain + +> **Important: Run this step on your LOCAL machine.** This requires your wallet private key. + +The KMS contract verifies that the exact compose configuration is authorized before issuing certificates. 
Get the hash from the server and register it on-chain: + +On your **local machine**, get the compose hash from the server: + +```bash +COMPOSE_HASH=$(ssh ubuntu@YOUR_SERVER_IP 'sha256sum ~/gateway-deploy/app-compose.json' | cut -d' ' -f1) +echo "Compose hash: 0x$COMPOSE_HASH" +``` + +Load wallet credentials: + +```bash +export PRIVATE_KEY=$(cat ~/.dstack/secrets/sepolia-private-key) +export ETH_RPC_URL="https://ethereum-sepolia-rpc.publicnode.com" +export GATEWAY_APP_ID=$(cat ~/.dstack/secrets/gateway-app-id) +``` + +Add the compose hash to the gateway app's allowed list: + +```bash +cast send "$GATEWAY_APP_ID" "addComposeHash(bytes32)" "0x$COMPOSE_HASH" --rpc-url "$ETH_RPC_URL" --private-key "$PRIVATE_KEY" +``` + +Verify the hash was added: + +```bash +cast call "$GATEWAY_APP_ID" "allowedComposeHashes(bytes32)(bool)" "0x$COMPOSE_HASH" --rpc-url "$ETH_RPC_URL" +``` + +Expected output: + +``` +true +``` + +> **Important:** If you modify `docker-compose.yaml` or `.app_env` and regenerate `app-compose.json`, the hash will change. You must whitelist the new hash before deploying. + +### Step 9: Verify Preparation + +> **Back to the server.** Steps 9-10 run on your **TDX server**. + +Confirm all artifacts are in place: + +```bash +echo "=== Preparation Checklist ===" +echo "" + +# Check deployment files +for f in docker-compose.yaml .env .app_env app-compose.json; do + if [ -f ~/gateway-deploy/$f ]; then + echo "[OK] ~/gateway-deploy/$f" + else + echo "[MISSING] ~/gateway-deploy/$f" + fi +done + +# Check secrets +for f in gateway-app-id vmm-auth-token; do + if [ -f ~/.dstack/secrets/$f ]; then + echo "[OK] ~/.dstack/secrets/$f" + else + echo "[MISSING] ~/.dstack/secrets/$f" + fi +done + +echo "" +echo "Gateway App ID: $(cat ~/.dstack/secrets/gateway-app-id)" +echo "Compose Hash: $(sha256sum ~/gateway-deploy/app-compose.json | cut -d' ' -f1)" +``` + +All items should show `[OK]`. 
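The hash whitelisted in Step 8 is the SHA-256 of the exact bytes of `app-compose.json`, so even a whitespace-only change yields a different value. The sketch below illustrates this with two throwaway files. `compose_hash` is a hypothetical helper name, and the JSON content is a stand-in rather than a real manifest.

```bash
# Hypothetical helper: print the 0x-prefixed SHA-256 of a file,
# in the form passed to addComposeHash in Step 8.
compose_hash() {
  printf '0x%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

demo_dir=$(mktemp -d)
printf '{"name":"dstack-gateway"}'   > "$demo_dir/original.json"
printf '{"name":"dstack-gateway"}\n' > "$demo_dir/edited.json"   # one extra newline

compose_hash "$demo_dir/original.json"
compose_hash "$demo_dir/edited.json"    # differs from the line above

rm -r "$demo_dir"
```

Running `compose_hash ~/gateway-deploy/app-compose.json` on the server should print the same value that was whitelisted; if it does not, the manifest was regenerated after Step 8 and the new hash needs to be whitelisted.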
+ +### Step 10: Verify KMS Can Reach On-Chain State + +The KMS auth-eth service queries the blockchain directly (via `eth_call`) for each attestation request — it does not cache state. Verify KMS can see the gateway app registration: + +```bash +curl -sk https://localhost:9100/prpc/KMS.GetMeta | jq '.gateway_app_id' +``` + +This should return your gateway app ID (the one saved in `~/.dstack/secrets/gateway-app-id`). + +If the gateway app ID is wrong or missing, see [KMS shows wrong gateway app ID](/tutorial/troubleshooting-gateway-deployment#kms-shows-wrong-gateway-app-id) in the troubleshooting guide. + +--- + +## Architecture + +### CVM-based Gateway + +``` +┌─────────────────────────────────────────────────────────┐ +│ TDX Host │ +│ │ +│ ┌───────────────┐ ┌───────────────┐ │ +│ │ HAProxy │ │ dstack-vmm │ │ +│ │ :80, :443 │ │ :9080 │ │ +│ └───────┬───────┘ └───────────────┘ │ +│ │ │ +│ ┌───────▼─────────────────────────────────────────┐ │ +│ │ Gateway CVM (TDX Protected) │ │ +│ │ │ │ +│ │ ┌────────────────────────────────────────────┐ │ │ +│ │ │ Docker Container (privileged) │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────────┐ ┌──────────────────┐ │ │ │ +│ │ │ │ Gateway │ │ WireGuard │ │ │ │ +│ │ │ │ RPC :8000 │ │ wg-ds-gw :51820│ │ │ │ +│ │ │ │ Admin:8001 │ │ │ │ │ │ +│ │ │ │ Proxy:443 │ │ 10.240.0.0/16 │ │ │ │ +│ │ │ └──────────────┘ └────────┬─────────┘ │ │ │ +│ │ └─────────────────────────────┼──────────────┘ │ │ +│ │ │ │ │ +│ │ guest-agent (/var/run/dstack.sock) │ │ +│ └────────────────────────────────┼─────────────────┘ │ +│ │ │ +│ ┌────────────────────────────────▼─────────────────┐ │ +│ │ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ App CVM 1 │ │ App CVM 2 │ ... 
│ │ +│ │ │ 10.240.0.x │ │ 10.240.0.y │ │ │ +│ │ └─────────────┘ └─────────────┘ │ │ +│ └──────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────┐ │ +│ │ KMS CVM (TDX Protected) │ │ +│ │ :9100 │ │ +│ └─────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### Key Differences from Host-based Gateway + +| Aspect | Host-based Gateway | CVM-based Gateway | +|--------|-------------------|-------------------| +| WireGuard | Host interface (dgw) | Inside CVM (wg-ds-gw), auto-managed | +| Configuration | Manual gateway.toml | Auto-generated by entrypoint.sh | +| TLS Certificates | Manual certbot on host | Admin API + Let's Encrypt inside CVM | +| Service Manager | systemd | VMM-managed CVM | +| TDX Attestation | Not available | Full attestation with quotes | +| Memory Protection | OS-level only | TDX hardware encryption | + +--- + +## Troubleshooting + +For detailed solutions, see the [Gateway Deployment Troubleshooting Guide](/tutorial/troubleshooting-gateway-deployment#gateway-build--configuration-issues): + +- [Contract transaction reverts](/tutorial/troubleshooting-gateway-deployment#contract-transaction-reverts) +- [Compose hash mismatch](/tutorial/troubleshooting-gateway-deployment#compose-hash-mismatch) +- [vmm-cli.py compose errors](/tutorial/troubleshooting-gateway-deployment#vmm-clipy-compose-errors) +- [KMS shows wrong gateway app ID](/tutorial/troubleshooting-gateway-deployment#kms-shows-wrong-gateway-app-id) + +--- + +## Next Steps + +With all artifacts prepared and the compose hash whitelisted, proceed to [Gateway CVM Deployment](/tutorial/gateway-service-setup) to deploy the CVM and bootstrap the admin API. 
diff --git a/docs/tutorials/gateway-service-setup.md b/docs/tutorials/gateway-service-setup.md new file mode 100644 index 00000000..2418d09f --- /dev/null +++ b/docs/tutorials/gateway-service-setup.md @@ -0,0 +1,517 @@ +--- +title: "Gateway CVM Deployment" +description: "Deploy the dstack gateway as a CVM, bootstrap the admin API, and verify operation" +section: "Gateway Deployment" +stepNumber: 2 +totalSteps: 2 +lastUpdated: 2026-02-21 +prerequisites: + - gateway-build-configuration +tags: + - dstack + - gateway + - cvm + - deployment + - admin-api + - wireguard +difficulty: "advanced" +estimatedTime: "25 minutes" +--- + +# Gateway CVM Deployment + +This tutorial deploys the dstack gateway as a Confidential Virtual Machine and bootstraps its admin API. After deployment, the gateway will handle TLS termination, WireGuard tunnels to application CVMs, and automatic certificate provisioning via Let's Encrypt. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [Gateway CVM Preparation](/tutorial/gateway-build-configuration) +- All deployment artifacts in `~/gateway-deploy/`: + - `docker-compose.yaml` + - `.env` + - `.app_env` + - `app-compose.json` +- Compose hash whitelisted on-chain +- KMS CVM running on port 9100 +- Python cryptography libraries installed (`sudo apt install -y python3-pip && pip3 install --break-system-packages cryptography eth-keys eth-utils "eth-hash[pycryptodome]"`) + + +## What Gets Deployed + +When you deploy the gateway CVM, the following happens: + +1. **CVM Creation** — VMM creates a TDX-protected virtual machine with user-mode networking +2. **Container Start** — Docker container runs inside the CVM in privileged mode +3. **Config Generation** — Entrypoint script generates `gateway.toml` and WireGuard keys +4. **WireGuard Setup** — WireGuard interface created inside the CVM +5. **TLS Bootstrap** — Gateway contacts KMS for TDX-attested TLS certificates +6. 
**Service Ready** — RPC server (port 8000) and admin API (port 8001) start accepting connections + +### Port Mappings + +The CVM uses user-mode networking with explicit port forwarding from host to container: + +| Host Port | Container Port | Protocol | Purpose | +|-----------|---------------|----------|---------| +| 0.0.0.0:9202 | 8000 | TCP | Gateway RPC (public) | +| 127.0.0.1:9203 | 8001 | TCP | Admin API (localhost only) | +| 127.0.0.1:9206 | 8090 | TCP | Guest agent | +| 0.0.0.0:9202 | 51820 | UDP | WireGuard tunnel | +| 0.0.0.0:9204 | 443 | TCP | HTTPS proxy (app traffic) | + +> **Security note:** The admin API (port 9203) is bound to localhost only. It is not accessible from the internet. + +--- + +## Manual Deployment + +### Step 1: Verify Prerequisites + +```bash +# Check KMS is reachable +curl -sk https://localhost:9100/prpc/KMS.GetMeta | jq '{chain_id}' && echo "KMS: OK" + +# Check deployment artifacts exist +ls ~/gateway-deploy/app-compose.json && echo "Compose: OK" +``` + +### Step 2: Deploy the Gateway CVM + +Load environment variables and deploy: + +```bash +cd ~/gateway-deploy +set -a; source .env; set +a + +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +./src/vmm-cli.py --url http://127.0.0.1:9080 deploy \ + --name dstack-gateway \ + --app-id "$(cat ~/.dstack/secrets/gateway-app-id)" \ + --compose ~/gateway-deploy/app-compose.json \ + --env-file ~/gateway-deploy/.app_env \ + --kms-url "https://127.0.0.1:9100" \ + --kms-url "https://kms.dstack.yourdomain.com:9100" \ + --image dstack-0.5.7 \ + --vcpu 32 \ + --memory 32G \ + --port tcp:0.0.0.0:9202:8000 \ + --port tcp:127.0.0.1:9203:8001 \ + --port tcp:127.0.0.1:9206:8090 \ + --port udp:0.0.0.0:9202:51820 \ + --port tcp:0.0.0.0:9204:443 +``` + +**Key flags explained:** + +| Flag | Value | Purpose | +|------|-------|---------| +| `--app-id` | Gateway app ID | Links CVM to on-chain app identity | +| `--kms-url` (1st) | https://127.0.0.1:9100 | KMS endpoint 
for host-side env encryption | +| `--kms-url` (2nd) | https://kms.dstack.yourdomain.com:9100 | KMS endpoint accessible from inside the CVM (must match KMS TLS cert domain) | +| `--vcpu 32` | 32 vCPUs | Gateway needs resources for TLS + proxy workload | +| `--memory 32G` | 32 GB RAM | Memory for connection handling and WireGuard | +| `--port` | Various | User-mode networking port mappings (see table above) | + +> **Why two `--kms-url` values?** The first URL (`127.0.0.1:9100`) is used by `vmm-cli.py` on the host to encrypt environment variables before passing them to the CVM. The second URL uses the KMS domain name and is passed into the CVM so the gateway can reach KMS at runtime. The domain must match the KMS TLS certificate (set by `KMS_DOMAIN` in the KMS docker-compose). Inside a CVM with user-mode networking, `127.0.0.1` refers to the CVM itself, not the host — so the CVM resolves the KMS domain via DNS to reach the host's public IP. + +### Step 3: Monitor Deployment + +List VMs to get the gateway's ID: + +```bash +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +View boot logs (replace `VM_ID` with the actual ID from `lsvm`): + +```bash +# View recent logs +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=100" + +# Follow logs in real-time +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=true&ansi=false" +``` + +Look for these log messages indicating successful startup: + +``` +Configuration file generated: /data/gateway/gateway.toml +WG_IP: 10.240.0.1/12 +Gateway starting... +RPC server listening on 0.0.0.0:8000 +Admin API listening on 0.0.0.0:8001 +``` + +Wait for the admin API to become reachable (may take 1-2 minutes): + +```bash +until curl -sf http://127.0.0.1:9203/prpc/Status > /dev/null 2>&1; do + echo "Waiting for admin API..." 
+ sleep 5 +done +echo "Admin API is ready" +``` + +### Step 4: Bootstrap Admin API + +The admin API must be configured with certbot settings, DNS credentials, and the service domain before the gateway can issue TLS certificates for applications. + +```bash +ADMIN_ADDR="127.0.0.1:9203" +``` + +#### 4a. Set certbot configuration + +> **Why does the gateway need its own certificates?** The host machine may already have a wildcard cert for `*.yourdomain.com`, but the gateway runs inside a CVM with user-mode networking — it has no access to the host filesystem. The gateway requests its own Let's Encrypt wildcard certificate from inside the CVM, and this cert is stored in the CVM's WaveKV persistent store (not on the host). This is by design: certificates generated inside the CVM are tied to the TDX attestation chain, providing zero-trust HTTPS. Container restarts within a running CVM preserve the cert data (Docker named volumes survive restarts), but destroying and recreating the CVM wipes the WaveKV store and triggers a fresh certificate request. + +Configure Let's Encrypt ACME settings. 
We start with the **staging** environment to avoid hitting production rate limits during initial setup and testing: + +```bash +curl -sf -X POST "http://$ADMIN_ADDR/prpc/SetCertbotConfig" \ + -H "Content-Type: application/json" \ + -d '{ + "acme_url": "https://acme-staging-v02.api.letsencrypt.org/directory", + "renew_interval_secs": 3600, + "renew_before_expiration_secs": 864000, + "renew_timeout_secs": 300 + }' && echo "Certbot config set (STAGING)" +``` + +| Setting | Value | Description | +|---------|-------|-------------| +| `acme_url` | Let's Encrypt **staging** | ACME directory URL (staging has 30,000 cert limit vs production's 10 per 3 hours) | +| `renew_interval_secs` | 3600 (1 hour) | How often to check for renewal | +| `renew_before_expiration_secs` | 864000 (10 days) | Renew this far before expiry | +| `renew_timeout_secs` | 300 (5 min) | Timeout for renewal attempts | + +> **Staging vs production:** Staging certificates are signed by a fake CA and will show browser warnings — this is expected. The staging environment exists specifically for testing and has much higher rate limits. After verifying the gateway works correctly in [Step 6](#step-6-switch-to-production-certificates), you'll switch to the production ACME URL to get browser-trusted certificates. + +#### 4b. Create DNS credential + +Add your Cloudflare API token for DNS-01 challenges: + +```bash +CF_API_TOKEN=$(grep CF_API_TOKEN ~/gateway-deploy/.env | cut -d= -f2) + +curl -sf -X POST "http://$ADMIN_ADDR/prpc/CreateDnsCredential" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "cloudflare", + "provider_type": "cloudflare", + "cf_api_token": "'"$CF_API_TOKEN"'", + "set_as_default": true + }' && echo "DNS credential created" +``` + +Verify the credential was stored: + +```bash +curl -sf "http://$ADMIN_ADDR/prpc/ListDnsCredentials" | jq '.credentials' +``` + +#### 4c. Add ZT-Domains + +Register the service domains for zero-trust application routing. You need **two** domains: + +1. 
`$SRV_DOMAIN` — covers `*.$SRV_DOMAIN` (e.g., `vmm.dstack.yourdomain.com`) +2. `gateway.$SRV_DOMAIN` — covers `*.gateway.$SRV_DOMAIN` (e.g., `-.gateway.dstack.yourdomain.com`) + +The second domain is required because app URLs are two levels deep (`-.gateway.$SRV_DOMAIN`), and a wildcard cert for `*.$SRV_DOMAIN` does not cover subdomains of subdomains. + +> **Important:** ZT domains must be added **after** the DNS credential is set as default (Step 4b). The gateway uses the default credential for DNS-01 challenges when requesting certificates for these domains. + +```bash +SRV_DOMAIN=$(grep SRV_DOMAIN ~/gateway-deploy/.env | cut -d= -f2) + +# Add the base service domain +curl -sf -X POST "http://$ADMIN_ADDR/prpc/AddZtDomain" \ + -H "Content-Type: application/json" \ + -d '{ + "domain": "'"$SRV_DOMAIN"'", + "port": 443, + "priority": 100 + }' && echo "ZT-Domain added: $SRV_DOMAIN" + +# Add the gateway subdomain (for app URLs like -.gateway.$SRV_DOMAIN) +curl -sf -X POST "http://$ADMIN_ADDR/prpc/AddZtDomain" \ + -H "Content-Type: application/json" \ + -d '{ + "domain": "gateway.'"$SRV_DOMAIN"'", + "port": 443, + "priority": 100 + }' && echo "ZT-Domain added: gateway.$SRV_DOMAIN" +``` + +Verify both domains were registered: + +```bash +curl -sf "http://$ADMIN_ADDR/prpc/ListZtDomains" | jq '.domains' +``` + +You should see both domains listed. + +### Step 5: Verify Gateway Operation + +#### Check admin API status + +```bash +curl -sf http://127.0.0.1:9203/prpc/Status | jq . +``` + +Expected output shows gateway status with node information (node ID, WireGuard key, connections). + +#### Check public RPC port + +Verify TLS is working on the public endpoint: + +```bash +curl -sk https://localhost:9202/ -o /dev/null -w '%{http_code}\n' +``` + +A `404` response confirms the HTTPS listener is active. This port uses a TDX-attested `Dstack App CA` certificate for internal CVM communication — this is correct and expected. 
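
The status-code check above can be wrapped in a small helper so the result is readable in scripts. This is a minimal sketch, not part of dstack: the function name `classify_gateway_status` is hypothetical, and it only encodes what this tutorial states (a `404` on `/` means the HTTPS listener is up) plus curl's behaviour of printing `000` for `%{http_code}` when no HTTP response was received.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dstack): interpret the HTTP status code
# printed by `curl -w '%{http_code}'` against the gateway RPC port.
classify_gateway_status() {
  case "$1" in
    404) echo "listener active" ;;        # expected for "/" according to this tutorial
    000) echo "no HTTP response" ;;       # curl prints 000 when the request failed
    *)   echo "unexpected status: $1" ;;  # anything else is worth a look at the logs
  esac
}

# Usage on the host:
#   classify_gateway_status "$(curl -sk https://localhost:9202/ -o /dev/null -w '%{http_code}')"
```

The same helper applies unchanged to the external check against `gateway.dstack.yourdomain.com:9202`.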
+ +#### Verify WireGuard is running inside the CVM + +The WireGuard interface runs inside the CVM, not on the host. You can verify it's listening by checking the UDP port: + +```bash +sudo ss -ulnp | grep 9202 +``` + +Expected output shows the UDP port mapped to the CVM: + +``` +UNCONN 0 0 0.0.0.0:9202 0.0.0.0:* +``` + +#### Test external RPC access + +From another machine (or using your domain), verify the public endpoint is reachable: + +```bash +curl -sk https://gateway.dstack.yourdomain.com:9202/ -o /dev/null -w '%{http_code}\n' +``` + +A `404` response confirms the gateway is accepting HTTPS connections from external clients. + +### Step 6: Switch to Production Certificates + +Now that the gateway is verified and working with staging certificates, switch to Let's Encrypt production to get browser-trusted certificates. This only needs to happen once per stable deployment. + +Update the ACME URL to production: + +```bash +ADMIN_ADDR="127.0.0.1:9203" + +curl -sf -X POST "http://$ADMIN_ADDR/prpc/SetCertbotConfig" \ + -H "Content-Type: application/json" \ + -d '{ + "acme_url": "https://acme-v02.api.letsencrypt.org/directory", + "renew_interval_secs": 3600, + "renew_before_expiration_secs": 864000, + "renew_timeout_secs": 300 + }' && echo "Certbot config set (PRODUCTION)" +``` + +After switching the ACME URL, the renewal loop may report "does not need renewal" because the staging cert is still valid. 
Force a renewal for each ZT domain to get production certificates immediately: + +```bash +SRV_DOMAIN=$(grep SRV_DOMAIN ~/gateway-deploy/.env | cut -d= -f2) + +# Force renewal for the base service domain +curl -sf -X POST "http://$ADMIN_ADDR/prpc/Admin.RenewZtDomainCert" \ + -H "Content-Type: application/json" \ + -d '{ + "domain": "'"$SRV_DOMAIN"'", + "force": true + }' && echo "Forced renewal: $SRV_DOMAIN" + +# Force renewal for the gateway subdomain +curl -sf -X POST "http://$ADMIN_ADDR/prpc/Admin.RenewZtDomainCert" \ + -H "Content-Type: application/json" \ + -d '{ + "domain": "gateway.'"$SRV_DOMAIN"'", + "force": true + }' && echo "Forced renewal: gateway.$SRV_DOMAIN" +``` + +Verify the certificates were issued by checking the admin API: + +```bash +curl -sf http://127.0.0.1:9203/prpc/ListZtDomains | jq '.domains[] | { + domain: .config.domain, + has_cert: .cert_status.has_cert, + loaded: .cert_status.loaded_in_memory, + issued: (.cert_status.issued_at | todate), + expires: (.cert_status.not_after | todate) +}' +``` + +Expected output shows both domains with `has_cert: true` and expiry dates ~90 days from issuance (Let's Encrypt's standard validity period): + +```json +{ + "domain": "dstack.yourdomain.com", + "has_cert": true, + "loaded": true, + "issued": "2026-03-08T22:44:09Z", + "expires": "2026-06-06T21:45:37Z" +} +``` + +> **Why not verify with `openssl s_client`?** The gateway has two TLS endpoints with different certificates. Port 9202 (RPC) always serves a TDX-attested `Dstack App CA` certificate for internal CVM-to-CVM communication. Port 9204 (HTTPS proxy) serves the Let's Encrypt certificates, but only when application traffic arrives for a registered app. With no apps deployed yet, the proxy port accepts TCP connections but doesn't present a certificate. Full TLS verification happens automatically when you deploy your [first application](/tutorial/hello-world-app). 
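
To relate the `expires` value reported by `ListZtDomains` to the `renew_before_expiration_secs` setting from Step 4a, the arithmetic can be sketched as follows. The helper names are hypothetical, and the rule itself (renew once the remaining lifetime drops to the configured window) is an assumption based on how this tutorial describes the setting, not something taken from the gateway source.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: all arguments are Unix timestamps or durations in seconds.

# Whole days between "now" and the certificate's not_after timestamp.
days_until_expiry() {
  local not_after=$1 now=$2
  echo $(( (not_after - now) / 86400 ))
}

# Succeeds (exit 0) when the remaining lifetime is within the renewal window.
within_renewal_window() {
  local not_after=$1 now=$2 renew_before=$3
  [ $(( not_after - now )) -le "$renew_before" ]
}

# Example: a certificate with 90 days left and the 864000 s (10 day) window from Step 4a.
now=$(date +%s)
not_after=$(( now + 90 * 86400 ))
echo "days left: $(days_until_expiry "$not_after" "$now")"
if within_renewal_window "$not_after" "$now" 864000; then
  echo "renewal due"
else
  echo "no renewal needed yet"
fi
```

Under that assumption, a freshly issued 90-day certificate would not be picked up by the renewal loop until roughly day 80, which is consistent with the forced renewal being needed right after switching the ACME URL.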
+ +If the cert status shows `has_cert: false`, check the gateway logs for ACME errors: + +```bash +VM_ID=$(cd ~/dstack/vmm && ./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json | jq -r '.[] | select(.name=="dstack-gateway") | .id') +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=$VM_ID&follow=false&ansi=false&lines=50" | grep -i "cert\|renew\|acme" +``` + +> **In production deployments**, you deploy the gateway once and it requests a single production cert. Redeployments are rare and each only burns one rate-limited request — well within limits. The staging-first workflow is specifically for the initial setup phase where iterative testing is expected. + +### Step 7: Verify HAProxy Configuration + +If you followed the [HAProxy Setup](/tutorial/haproxy-setup) tutorial, your HAProxy configuration already includes the SNI routing rules needed for the gateway. Verify the configuration has the correct 3-rule SNI routing: + +```bash +grep -A2 "use_backend\|gateway" /etc/haproxy/haproxy.cfg +``` + +Your `https_front` frontend should have these three rules in order: + +1. **`vmm.dstack.yourdomain.com`** → `local_https_backend` (TLS termination → VMM on port 9080) +2. **`gateway.dstack.yourdomain.com`** → `gateway_rpc_passthrough` (TLS passthrough → port 9202, gateway RPC) +3. **`*.dstack.yourdomain.com`** → `gateway_passthrough` (TLS passthrough → port 9204, gateway HTTPS proxy) + +The `gateway_rpc_passthrough` rule is critical: when app CVMs use `--gateway-url https://gateway.dstack.yourdomain.com` (port 443), HAProxy forwards that traffic to the gateway RPC on port 9202. Without this rule, CVM registration fails because the traffic would hit the gateway proxy (port 9204) instead. 
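
The rule order matters because the wildcard would otherwise swallow the two more specific host names. The shell function below is only a model of that first-match behaviour, written for illustration: it is not HAProxy configuration, the function name `route_sni` is hypothetical, and `myapp` is a made-up application label. The backend names are the ones listed above.

```shell
#!/usr/bin/env bash
# Model of the three SNI rules, evaluated top to bottom (first match wins).
route_sni() {
  case "$1" in
    vmm.dstack.yourdomain.com)     echo "local_https_backend" ;;      # TLS termination, VMM on 9080
    gateway.dstack.yourdomain.com) echo "gateway_rpc_passthrough" ;;  # passthrough to 9202
    *.dstack.yourdomain.com)       echo "gateway_passthrough" ;;      # passthrough to 9204
    *)                             echo "no match" ;;
  esac
}

for host in vmm.dstack.yourdomain.com gateway.dstack.yourdomain.com myapp.dstack.yourdomain.com; do
  printf '%s -> %s\n' "$host" "$(route_sni "$host")"
done
```

If the wildcard pattern were listed first in this model, both `vmm` and `gateway` would fall through to `gateway_passthrough`, which mirrors the registration failure described above.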
+ +If any rules are missing, update your HAProxy config per the [HAProxy Setup tutorial](/tutorial/haproxy-setup#step-4-create-haproxy-configuration), then reload: + +```bash +sudo haproxy -c -f /etc/haproxy/haproxy.cfg && sudo systemctl reload haproxy +``` + +--- + +## CVM Management + +### Common VMM Commands + +Navigate to the VMM directory first: + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) +``` + +| Action | Command | +|--------|---------| +| List VMs | `./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm` | +| View logs | `curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=100"` | +| Follow logs | `curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" "http://127.0.0.1:9080/logs?id=VM_ID&follow=true&ansi=false"` | +| Remove VM | `./src/vmm-cli.py --url http://127.0.0.1:9080 remove VM_ID` | + +> **Note:** Replace `VM_ID` with the actual VM ID from `lsvm`. + +### Redeploying + +> **Certificate impact:** Destroying a CVM wipes its WaveKV store, which contains cached Let's Encrypt certificates. The next deployment will trigger a fresh ACME certificate request. If you're doing iterative redeployments during testing, use the staging ACME URL in [Step 4a](#4a-set-certbot-configuration) to avoid hitting production rate limits (10 certs / 3 hours / IP). See [Troubleshooting: Let's Encrypt rate limits](/tutorial/troubleshooting-gateway-deployment#lets-encrypt-rate-limits) for details. + +To redeploy the gateway (e.g., after configuration changes): + +1. Remove the existing CVM: + ```bash + VM_ID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json | jq -r '.[] | select(.name=="dstack-gateway") | .id') + ./src/vmm-cli.py --url http://127.0.0.1:9080 remove "$VM_ID" + ``` +2. 
If you changed docker-compose.yaml or .app_env, regenerate app-compose.json and whitelist the new hash (see Preparation tutorial Steps 7-8) +3. Re-run the deploy command (Step 2 above) +4. Re-run the admin API bootstrap (Step 4 above) — use staging ACME if still iterating, or production if this is a final deployment + +--- + +## Architecture + +### Request Flow + +``` +Client HTTPS Request (*.dstack.yourdomain.com) + │ + ▼ +┌──────────────────────────────────────────────────┐ +│ HAProxy (:443) │ +│ SNI: *.dstack.yourdomain.com │ +│ TCP passthrough → 127.0.0.1:9204 │ +└──────────────────┬───────────────────────────────┘ + │ + ▼ +┌──────────────────────────────────────────────────┐ +│ Gateway CVM │ +│ ┌────────────────────────────────────────────┐ │ +│ │ HTTPS Proxy (:443 inside CVM) │ │ +│ │ 1. TLS Termination (Let's Encrypt cert) │ │ +│ │ 2. Domain Parsing (app-id.domain.com) │ │ +│ │ 3. CVM Lookup │ │ +│ │ 4. Forward via WireGuard │ │ +│ └────────────────────┬───────────────────────┘ │ +│ │ │ +│ ┌────────────────────▼───────────────────────┐ │ +│ │ WireGuard (wg-ds-gw :51820) │ │ +│ │ 10.240.0.0/16 subnet │ │ +│ └────────────────────┬───────────────────────┘ │ +└───────────────────────┼──────────────────────────┘ + │ + ┌───────────┼───────────┐ + ▼ ▼ ▼ + ┌──────────┐ ┌──────────┐ ┌──────────┐ + │ App CVM │ │ App CVM │ │ App CVM │ + │10.240.0.2│ │10.240.0.3│ │10.240.0.4│ + └──────────┘ └──────────┘ └──────────┘ +``` + +--- + +## Troubleshooting + +For detailed solutions, see the [Gateway Deployment Troubleshooting Guide](/tutorial/troubleshooting-gateway-deployment#gateway-cvm-deployment-issues): + +- ["Port mapping is not allowed for udp:9202"](/tutorial/troubleshooting-gateway-deployment#port-mapping-is-not-allowed-for-udp9202) +- ["OS image is not allowed"](/tutorial/troubleshooting-gateway-deployment#os-image-is-not-allowed) +- [CVM fails to start](/tutorial/troubleshooting-gateway-deployment#cvm-fails-to-start) +- [CVM exits immediately or reboots in a 
loop](/tutorial/troubleshooting-gateway-deployment#cvm-exits-immediately-or-reboots-in-a-loop) +- [Compose hash not allowed](/tutorial/troubleshooting-gateway-deployment#compose-hash-not-allowed) +- [Admin API unreachable](/tutorial/troubleshooting-gateway-deployment#admin-api-unreachable) +- [Let's Encrypt rate limits](/tutorial/troubleshooting-gateway-deployment#lets-encrypt-rate-limits) +- [Certbot fails to issue certificates](/tutorial/troubleshooting-gateway-deployment#certbot-fails-to-issue-certificates) +- [KMS connectivity issues](/tutorial/troubleshooting-gateway-deployment#kms-connectivity-issues) +- [WireGuard endpoint unreachable from app CVMs](/tutorial/troubleshooting-gateway-deployment#wireguard-endpoint-unreachable-from-app-cvms) + +--- + +## Phase Complete + +Congratulations! You have completed Gateway Deployment: + +1. **Gateway CVM Preparation** — Docker compose, environment configuration, on-chain registration +2. **Gateway CVM Deployment** — CVM deployment, admin API bootstrap, verification + +Your dstack infrastructure now has: +- **KMS CVM** — Key management with TDX attestation (port 9100) +- **Gateway CVM** — Reverse proxy with WireGuard tunnels and auto-TLS (RPC: 9202, HTTPS: 9204) +- **VMM** — Virtual machine manager (port 9080) +- **HAProxy** — External traffic routing (ports 80, 443) + +## Next Steps + +With the gateway running, you're ready to deploy your first application to a CVM: + +- Deploy a Hello World application through the gateway +- Verify end-to-end TLS with automatic certificate provisioning +- Test WireGuard tunnel connectivity from an app CVM to the gateway diff --git a/docs/tutorials/gramine-key-provider.md b/docs/tutorials/gramine-key-provider.md new file mode 100644 index 00000000..89f2547b --- /dev/null +++ b/docs/tutorials/gramine-key-provider.md @@ -0,0 +1,342 @@ +--- +title: "Gramine Key Provider" +description: "Deploy SGX-based Gramine Sealing Key Provider for CVM attestation" +section: "Prerequisites" 
+stepNumber: 4 +totalSteps: 7 +lastUpdated: 2026-01-09 +prerequisites: + - docker-setup +tags: + - gramine + - sgx + - attestation + - key-provider + - prerequisites +difficulty: advanced +estimatedTime: "30 minutes" +--- + +# Gramine Key Provider + +This tutorial guides you through deploying the Gramine Sealing Key Provider, an SGX-based service that solves the "chicken-and-egg" problem in CVM deployment. The key provider runs on the host and provides attestation-backed sealing keys to CVMs during boot. + +## Why You Need This + +When deploying a dstack CVM (like the KMS), there's a fundamental bootstrapping problem: + +| The Problem | Why It Matters | +|-------------|----------------| +| CVMs need sealing keys to boot | Keys protect secrets inside the CVM | +| KMS is the service that provides keys | But KMS itself is a CVM that needs keys | +| **Chicken-and-egg:** KMS needs keys, but KMS provides keys | Deployment deadlock | + +**The Solution:** The Gramine Sealing Key Provider runs on the **host** using Intel SGX (not TDX). It can provide attestation-backed sealing keys to CVMs during their initial boot. Once the KMS CVM boots successfully, it takes over key management for subsequent deployments. 
+ +## How It Works + +``` +┌─────────────────────────────────────────────────────────────┐ +│ TDX Host │ +│ │ +│ ┌──────────────────────────────────────┐ │ +│ │ Gramine Sealing Key Provider │ │ +│ │ (SGX Enclave) │ │ +│ │ │ │ +│ │ - Runs in Intel SGX enclave │ │ +│ │ - Listens on 0.0.0.0:3443 │ │ +│ │ - Provides sealing keys via HTTPS │ │ +│ │ - Verifies TDX quotes from CVMs │ │ +│ └──────────────────────┬───────────────┘ │ +│ │ │ +│ ▼ (provides keys) │ +│ ┌──────────────────────────────────────┐ │ +│ │ KMS CVM (TDX) │ │ +│ │ │ │ +│ │ - Boots with sealing key │ │ +│ │ - Generates TDX attestation quote │ │ +│ │ - Takes over key management │ │ +│ └──────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────────────────────────────────┘ +``` + +**Key Points:** +- Gramine runs in an SGX enclave (not a TDX CVM) +- Only provides keys to verified TDX CVMs +- Uses PPID (Platform Provisioning ID) verification +- Temporary solution until KMS CVM is running + +## Prerequisites + +Before starting, ensure you have: + +- Completed [TDX & SGX Verification](/tutorial/tdx-sgx-verification) - SGX devices must be present +- Docker installed and running +- SGX devices accessible: `/dev/sgx_enclave`, `/dev/sgx_provision` + +### Verify SGX Devices + +```bash +ls -la /dev/sgx* +``` + +Expected output: +``` +crw------- 1 root root 10, 125 Dec 8 00:00 /dev/sgx_enclave +crw------- 1 root root 10, 126 Dec 8 00:00 /dev/sgx_provision +crw------- 1 root root 10, 124 Dec 8 00:00 /dev/sgx_vepc +``` + +If these devices are missing, complete the [TDX BIOS Configuration](/tutorial/tdx-bios-configuration) tutorial first. + +--- + + +## Manual Deployment + +If you prefer to deploy manually, follow these steps. 
+ +### Step 1: Clone dstack Repository + +Clone the dstack repository and check out the v0.5.7 release: + +```bash +cd ~ +git clone https://github.com/Dstack-TEE/dstack.git +cd dstack +git checkout v0.5.7 +``` + +### Step 2: Navigate to Key Provider + +```bash +cd ~/dstack/key-provider-build +ls -la +``` + +You should see: +- `docker-compose.yaml` - Container orchestration (the file edited in Step 4) +- `Dockerfile.aesmd` (or similar) - SGX AESM daemon image +- `Dockerfile.gramine` (or similar) - Gramine key provider image + +### Step 3: Create QCNL Configuration + +The key provider needs to know where to find a PCCS for quote verification. Create the QCNL configuration file: + +```bash +cat > ~/dstack/key-provider-build/sgx_default_qcnl.conf << 'EOF' +{ + "pccs_url": "https://pccs.phala.network/sgx/certification/v4/", + "use_secure_cert": false, + "retry_times": 6, + "retry_delay": 10 +} +EOF +``` + +This configures the key provider to use Phala Network's public PCCS for attestation verification. + +### Step 4: Configure Network Binding for CVM Access + +The default configuration binds to localhost, but CVMs need to access the key provider via the host's network. Update the port binding: + +```bash +# Change from 127.0.0.1:3443 to 0.0.0.0:3443 +sed -i 's/"127\.0\.0\.1:3443:3443"/"0.0.0.0:3443:3443"/' ~/dstack/key-provider-build/docker-compose.yaml +``` + +> **Note:** This makes the key provider accessible from CVMs via the QEMU user-mode networking gateway (`10.0.2.2`). The key provider still verifies TDX quotes, so only legitimate CVMs can obtain keys. + +### Step 5: Build Docker Images + +Run the build from the `key-provider-build` directory you entered in Step 2: + +```bash +docker compose build +``` + +This builds two images: +1. **aesmd** - Intel SGX Architectural Enclave Service Manager +2.
**gramine-sealing-key-provider** - The actual key provider + +### Step 6: Start Services + +```bash +docker compose up -d +``` + +This starts: +- **aesmd container** - Provides SGX enclave services +- **gramine-sealing-key-provider container** - Key provider on port 3443 + +### Step 7: Verify Services Running + +Check container status: + +```bash +docker ps | grep -E "(aesmd|gramine)" +``` + +Expected output shows both containers running: +``` +abc123 aesmd Up 2 minutes +def456 gramine-sealing-key-provider Up 2 minutes +``` + +Check aesmd logs: + +```bash +docker logs aesmd 2>&1 | tail -20 +``` + +Look for successful initialization messages. + +Check key provider logs: + +```bash +docker logs gramine-sealing-key-provider 2>&1 | tail -20 +``` + +Look for messages indicating the enclave is ready and listening. + +--- + +## Verification + +### Check Port Binding + +```bash +sudo ss -tlnp | grep 3443 +``` + +Expected: +``` +LISTEN 0 4096 0.0.0.0:3443 0.0.0.0:* users:(("node",pid=12345,fd=7)) +``` + +> **Note:** The `-p` flag requires sudo to show process information. + +> **Note:** The service binds to `0.0.0.0` to allow access from CVMs via QEMU's user-mode networking (`10.0.2.2` from the CVM's perspective). + +### Check SGX Enclave Status + +The key provider should show SGX enclave initialization in its logs: + +```bash +docker logs gramine-sealing-key-provider 2>&1 | grep -i "enclave\|sgx\|quote" +``` + +Look for messages like: +- `SGX enclave initialized` +- `Quote provider ready` +- `Listening on 0.0.0.0:3443` + +### Test Key Provider Endpoint + +The key provider uses HTTPS with a self-signed certificate. Test connectivity: + +```bash +curl -sk https://127.0.0.1:3443/ +``` + +An empty response or a brief error message indicates the service is running - the TLS handshake succeeded. The key provider doesn't serve a root endpoint; it only responds to specific API calls from CVMs. 
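
Because an empty body is the success case here, curl's exit status is a more dependable signal than its output. A minimal sketch follows, assuming curl's documented exit codes (7: could not connect, 35: TLS handshake problem, 52: the server returned nothing). The function name is hypothetical, and how the key provider actually answers a request for `/` is not specified in this tutorial, so both 0 and 52 are treated as "something is listening".

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a curl exit status into a verdict.
interpret_curl_exit() {
  case "$1" in
    0|52) echo "reachable" ;;             # got a response, or the connection closed with an empty reply
    7)    echo "not running" ;;           # connection refused / could not connect
    35)   echo "tls handshake failed" ;;  # listener present, but TLS negotiation broke
    *)    echo "other curl error: $1" ;;
  esac
}

# Usage on the host:
#   curl -sk https://127.0.0.1:3443/ -o /dev/null; interpret_curl_exit $?
```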
+ +If you get `curl: (7) Failed to connect` or a similar connection error, the service is not running. + +--- + +## How CVMs Use the Key Provider + +When you deploy a CVM with the `--local-key-provider` flag, the following sequence runs during boot: + +1. The CVM boots and needs a sealing key +2. The CVM generates a TDX attestation quote +3. The quote is sent to the Gramine Key Provider (127.0.0.1:3443) +4. The Key Provider verifies the quote's authenticity +5. The Key Provider returns the sealing key to the CVM +6. The CVM uses the key to decrypt and protect its secrets + +This happens automatically - you don't need to configure anything in the CVM. + +--- + +## Architecture Details + +### Container Configuration + +```yaml +services: + aesmd: + # Intel SGX AESM daemon + # Provides enclave management services + devices: + - /dev/sgx_enclave:/dev/sgx/enclave + - /dev/sgx_provision:/dev/sgx/provision + volumes: + - /var/run/aesmd:/var/run/aesmd + + gramine-sealing-key-provider: + # Gramine-based key provider + # Runs inside SGX enclave + depends_on: + - aesmd + ports: + - "0.0.0.0:3443:3443" # Accessible from CVMs via 10.0.2.2 + devices: + - /dev/sgx_enclave:/dev/sgx/enclave + - /dev/sgx_provision:/dev/sgx/provision + volumes: + - /var/run/aesmd:/var/run/aesmd +``` + +### Security Considerations + +| Aspect | Implementation | +|--------|----------------| +| Network binding | `0.0.0.0:3443` - accessible from CVMs via `10.0.2.2` | +| Quote verification | Validates TDX quotes before providing keys | +| Enclave protection | Keys never leave SGX enclave in plaintext | +| PPID verification | Ensures keys only go to legitimate CVMs | + +> **Why `0.0.0.0`?** CVMs use QEMU's user-mode networking where the host appears as `10.0.2.2`. Binding to localhost would prevent CVMs from reaching the key provider. Security is maintained through TDX quote verification - only legitimate CVMs with valid attestation can obtain keys.
+ +--- + +## Troubleshooting + +For detailed solutions, see the [Prerequisites Troubleshooting Guide](/tutorial/troubleshooting-prerequisites#gramine-key-provider-issues): + +- [Container fails to start: SGX devices not found](/tutorial/troubleshooting-prerequisites#container-fails-to-start-sgx-devices-not-found) +- [Error: AESM service not ready](/tutorial/troubleshooting-prerequisites#error-aesm-service-not-ready) +- [Quote verification failures](/tutorial/troubleshooting-prerequisites#quote-verification-failures) +- [Empty response from curl test](/tutorial/troubleshooting-prerequisites#empty-response-from-curl-test) +- [Port 3443 already in use](/tutorial/troubleshooting-prerequisites#port-3443-already-in-use) +- [SGX enclave initialization timeout](/tutorial/troubleshooting-prerequisites#sgx-enclave-initialization-timeout) + +--- + +## Verification Summary + +Run this verification script: + +```bash +echo "AESMD Container: $(docker ps --format '{{.Names}}' | grep -q aesmd && echo 'running' || echo 'not running')" +echo "Key Provider Container: $(docker ps --format '{{.Names}}' | grep -q gramine-sealing-key-provider && echo 'running' || echo 'not running')" +echo "Port 3443: $(ss -tln | grep -q :3443 && echo 'listening' || echo 'not listening')" +echo "SGX Devices: $([ -e /dev/sgx_enclave ] && [ -e /dev/sgx_provision ] && echo 'present' || echo 'missing')" +``` + +All checks should show positive status (running, listening, present). 
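
The one-line checks above only print text. For use in automation, a variant that also counts failures might look like the sketch below. The `check` function and the `failures` counter are hypothetical names introduced for illustration; the commands being checked are the same ones used in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a check command, print a verdict, count failures.
failures=0
check() {
  local label=$1; shift
  if "$@" > /dev/null 2>&1; then
    echo "$label: ok"
  else
    echo "$label: FAILED"
    failures=$((failures + 1))
  fi
}

check "SGX enclave device"     test -e /dev/sgx_enclave
check "SGX provision device"   test -e /dev/sgx_provision
check "Port 3443 listening"    sh -c "ss -tln | grep -q ':3443'"
check "aesmd container"        sh -c "docker ps --format '{{.Names}}' | grep -q aesmd"
check "key provider container" sh -c "docker ps --format '{{.Names}}' | grep -q gramine-sealing-key-provider"

echo "failed checks: $failures"
```

A wrapper script could finish with `[ "$failures" -eq 0 ]` so that its exit status reflects the overall result.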
+ +--- + +## Next Steps + +With the Gramine Key Provider running, proceed to: + +- [Local Docker Registry](/tutorial/local-docker-registry) - Set up registry for CVM images + +## Additional Resources + +- [Gramine Documentation](https://gramine.readthedocs.io/) +- [Intel SGX Developer Guide](https://download.01.org/intel-sgx/sgx-dcap/1.14/linux/docs/) +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) diff --git a/docs/tutorials/guest-image-setup.md b/docs/tutorials/guest-image-setup.md new file mode 100644 index 00000000..f53d3654 --- /dev/null +++ b/docs/tutorials/guest-image-setup.md @@ -0,0 +1,337 @@ +--- +title: "Guest OS Image Setup" +description: "Download and configure guest OS images for dstack CVM deployment" +section: "dstack Installation" +stepNumber: 7 +totalSteps: 8 +lastUpdated: 2025-01-21 +prerequisites: + - vmm-service-setup + - management-interface-setup +tags: + - dstack + - cvm + - guest-os + - vmm + - image +difficulty: "intermediate" +estimatedTime: "30 minutes" +--- + +# Guest OS Image Setup + +This tutorial guides you through setting up guest OS images for deploying Confidential Virtual Machines (CVMs) on your dstack infrastructure. Guest images contain the operating system, kernel, and firmware that run inside the TDX-protected environment. 
+ +## What You'll Configure + +- **Guest OS images** - Pre-built Yocto-based images for CVMs +- **VMM image directory** - Proper organization for multiple image versions +- **Image verification** - Confirm VMM can access the images + +## Understanding Guest OS Images + +A dstack guest OS image consists of four core components: + +| Component | Description | +|-----------|-------------| +| **OVMF.fd** | Virtual firmware (UEFI BIOS) - boots first, establishes TDX measurements | +| **bzImage** | Linux kernel compiled for TDX guests | +| **initramfs.cpio.gz** | Initial RAM filesystem with early boot scripts | +| **rootfs.cpio** | Root filesystem containing tappd and container runtime | + +These components are measured by TDX hardware during boot, creating a cryptographic chain of trust that can be verified through attestation. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [VMM Service Setup](/tutorial/vmm-service-setup) +- VMM service running (with web interface at http://localhost:9080) +- At least 10GB free disk space for images + + +## Manual Setup + +If you prefer to set up guest images manually, follow these steps. 
+ +### Step 1: Create Image Directory Structure + +Create the directory where guest images will be stored: + +```bash +sudo mkdir -p /var/lib/dstack/images +sudo chown root:root /var/lib/dstack/images +sudo chmod 755 /var/lib/dstack/images +``` + +### Step 2: Download Guest OS Image + +Download the dstack guest OS image matching your installed VMM version: + +```bash +# Get version from installed VMM +DSTACK_VERSION=$(dstack-vmm --version | grep -oP 'v\K[0-9]+\.[0-9]+\.[0-9]+') +echo "Installing guest images for version: $DSTACK_VERSION" + +# Download the image archive +cd /tmp +wget https://github.com/Dstack-TEE/meta-dstack/releases/download/v${DSTACK_VERSION}/dstack-${DSTACK_VERSION}.tar.gz +``` + +Verify the download: + +```bash +ls -lh dstack-${DSTACK_VERSION}.tar.gz +``` + +Expected output (size varies by version): + +``` +-rw-r--r-- 1 root root 150M Dec 2 10:00 dstack-0.5.7.tar.gz +``` + +### Step 3: Extract and Install Image + +Extract the image archive (the tarball contains a `dstack-X.Y.Z/` directory): + +```bash +# Extract image components (tarball includes versioned directory) +sudo tar -xvf dstack-${DSTACK_VERSION}.tar.gz -C /var/lib/dstack/images/ +``` + +Verify the extracted files: + +```bash +ls -la /var/lib/dstack/images/dstack-${DSTACK_VERSION}/ +``` + +Expected output: + +``` +total 156000 +drwxr-xr-x 2 root root 4096 Dec 2 10:05 . +drwxr-xr-x 3 root root 4096 Dec 2 10:05 .. +-rw-r--r-- 1 root root 4194304 Dec 2 10:05 OVMF.fd +-rw-r--r-- 1 root root 12345678 Dec 2 10:05 bzImage +-rw-r--r-- 1 root root 45678901 Dec 2 10:05 initramfs.cpio.gz +-rw-r--r-- 1 root root 98765432 Dec 2 10:05 rootfs.cpio +-rw-r--r-- 1 root root 512 Dec 2 10:05 metadata.json +``` + +### Step 4: Verify Image Metadata + +Check the image metadata to understand its configuration: + +```bash +cat /var/lib/dstack/images/dstack-${DSTACK_VERSION}/metadata.json | jq . 
+``` + +Expected output: + +```json +{ + "version": "dstack-0.5.7", + "cmdline": "console=hvc0 root=/dev/vda ro rootfstype=squashfs rootflags=loop ...", + "kernel": "bzImage", + "initrd": "initramfs.cpio.gz", + "rootfs": "rootfs.cpio", + "bios": "OVMF.fd", + "rootfs_hash": "sha256:abc123...", + "is_dev": false +} +``` + +### Metadata Fields Explained + +| Field | Description | +|-------|-------------| +| `version` | Image version identifier | +| `cmdline` | Kernel boot parameters including rootfs hash | +| `kernel` | Kernel image filename | +| `initrd` | Initial ramdisk filename | +| `rootfs` | Root filesystem filename | +| `bios` | UEFI firmware filename | +| `rootfs_hash` | Cryptographic hash of rootfs for verification | +| `is_dev` | Whether this is a development image (allows SSH) | + +### Step 5: Verify VMM Can Access Images + +The VMM service should already be running from the earlier setup. Verify it can see the installed images. + +### Check VMM Service Status + +```bash +sudo systemctl status dstack-vmm +``` + +The service should be active and running. + +### Verify Images via VMM Web Interface + +Open the VMM Management Console in your browser (configured in [Management Interface Setup](/tutorial/management-interface-setup)): + +``` +https://vmm.dstack.yourdomain.com +``` + +You should see the installed guest images listed in the interface. + +### Verify VMM is Responding + +First, verify the VMM web interface is accessible: + +```bash +curl -s http://127.0.0.1:9080/ | head -5 +``` + +You should see HTML content from the VMM management interface. + +### Verify Images on Disk + +Check that image files are present: + +```bash +ls -la /var/lib/dstack/images/dstack-*/ +``` + +You should see OVMF.fd, bzImage, initramfs.cpio.gz, rootfs.cpio, and metadata.json. 
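
Rather than comparing the listing by eye, the five expected files can be checked in a loop. This is a minimal sketch: the function name `missing_image_files` is hypothetical, and the file list is simply the one given in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of required files missing from an image directory.
missing_image_files() {
  local dir=$1 f
  for f in OVMF.fd bzImage initramfs.cpio.gz rootfs.cpio metadata.json; do
    [ -f "$dir/$f" ] || echo "$f"
  done
}

for dir in /var/lib/dstack/images/dstack-*/; do
  [ -d "$dir" ] || continue   # glob did not match: no images installed
  missing=$(missing_image_files "$dir")
  if [ -z "$missing" ]; then
    echo "$dir: complete"
  else
    echo "$dir: missing $(echo $missing)"   # unquoted on purpose: joins the names with spaces
  fi
done
```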
+ +### Verify Images on Filesystem + +List installed images directly: + +```bash +ls /var/lib/dstack/images/ +``` + +Expected output: + +``` +dstack-0.5.7 +``` + +Each entry is one installed image version; its directory should contain OVMF.fd, bzImage, initramfs.cpio.gz, rootfs.cpio, and metadata.json. + +### Step 6: Verify VMM Configuration + +Ensure VMM is configured to use the correct image path. Check the configuration: + +```bash +grep -A5 "image" /etc/dstack/vmm.toml +``` + +The `image_path` should point to `/var/lib/dstack/images`. + +If VMM isn't finding the images, verify that the path in the configuration matches where you installed them. + +## Managing Multiple Image Versions + +You can have multiple image versions installed simultaneously: + +```bash +# Download additional version +DSTACK_VERSION="0.5.3" +wget https://github.com/Dstack-TEE/meta-dstack/releases/download/v${DSTACK_VERSION}/dstack-${DSTACK_VERSION}.tar.gz + +# Extract to images directory (tarball already contains dstack-X.Y.Z/ folder) +sudo tar -xvf dstack-${DSTACK_VERSION}.tar.gz -C /var/lib/dstack/images/ + +# Restart VMM to pick up the new image +sudo systemctl restart dstack-vmm +``` + +> **Important:** VMM must be restarted after adding new images for them to appear in the management interface. + +List all installed images: + +```bash +ls -la /var/lib/dstack/images/ +``` + +When deploying applications, select the image version with the `--image` flag of `vmm-cli.py deploy` (for example, `--image dstack-0.5.7`).
+ +## Troubleshooting + +For detailed solutions, see the [dstack Installation Troubleshooting Guide](/tutorial/troubleshooting-dstack-installation#guest-image-setup-issues): + +- [Images not appearing in VMM](/tutorial/troubleshooting-dstack-installation#images-not-appearing-in-vmm) +- [Image download fails](/tutorial/troubleshooting-dstack-installation#image-download-fails) +- [Image metadata missing](/tutorial/troubleshooting-dstack-installation#image-metadata-missing) +- [VMM service not running](/tutorial/troubleshooting-dstack-installation#vmm-service-not-running) + +## Verification Checklist + +Before proceeding, verify you have: + +- [ ] Created image directory structure +- [ ] Downloaded guest OS image +- [ ] Extracted image components (OVMF.fd, bzImage, initramfs, rootfs) +- [ ] Verified metadata.json exists and is valid +- [ ] Confirmed VMM service is running +- [ ] Verified VMM web interface is accessible + +### Quick verification script + +```bash +echo "Image Directory: $([ -d /var/lib/dstack/images ] && echo 'exists' || echo 'missing')" +echo "Guest Images: $(ls -d /var/lib/dstack/images/dstack-* 2>/dev/null | wc -l) found" +echo "VMM Service: $(sudo systemctl is-active dstack-vmm)" +echo "VMM Web UI: $(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:9080/ 2>/dev/null || echo 'unreachable')" +echo "Image files:" +ls /var/lib/dstack/images/dstack-*/metadata.json 2>/dev/null || echo " No images found" +``` + +Image directory should exist with at least one guest image, VMM service should be active, and VMM web UI should return HTTP 200. + +## Understanding the Boot Process + +When a CVM starts, the following sequence occurs: + +``` +1. VMM launches QEMU with TDX enabled + ↓ +2. OVMF (Virtual Firmware) boots + - Measures itself into MRTD + - Initializes virtual hardware + ↓ +3. Linux Kernel loads + - Measured into RTMR1 + - Kernel cmdline measured into RTMR2 + ↓ +4. Initramfs runs + - Measured into RTMR2 + - Mounts rootfs + ↓ +5. 
Tappd starts + - Guest daemon for attestation + - Provides /var/run/tappd.sock + ↓ +6. Docker containers start + - Application workloads + - Can request TDX quotes via tappd +``` + +Each step creates cryptographic measurements that can be verified through TDX attestation. + +## Next Steps + +With guest images configured and VMM able to access them, you're ready to deploy your first application. The next tutorial covers deploying a Hello World application to verify your setup works correctly. + +## Additional Resources + +- [meta-dstack Repository](https://github.com/Dstack-TEE/meta-dstack) +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) +- [Yocto Project](https://www.yoctoproject.org/) +- [TDX Guest Architecture](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html) diff --git a/docs/tutorials/haproxy-setup.md b/docs/tutorials/haproxy-setup.md new file mode 100644 index 00000000..dddcf9a6 --- /dev/null +++ b/docs/tutorials/haproxy-setup.md @@ -0,0 +1,452 @@ +--- +title: "HAProxy Setup" +description: "Install and configure HAProxy as the unified TLS entry point for dstack services" +section: "Prerequisites" +stepNumber: 3 +totalSteps: 7 +lastUpdated: 2026-01-22 +prerequisites: + - ssl-certificate-setup +tags: + - haproxy + - tls + - proxy + - prerequisites +difficulty: intermediate +estimatedTime: "15 minutes" +--- + +# HAProxy Setup + +This tutorial guides you through installing and configuring HAProxy as the unified TLS entry point for all dstack services. HAProxy provides a critical capability: mixed-mode TLS handling that can terminate TLS for some backends while passing through encrypted traffic for others. + +## Why HAProxy? 
+ +| Capability | Description | +|------------|-------------| +| **SNI-based routing** | Route requests based on domain without decrypting | +| **TLS termination** | Handle HTTPS for services without native TLS | +| **TLS passthrough** | Forward encrypted traffic to services with native TLS | +| **Mixed mode** | Both modes on the same port (443) | + +The dstack gateway has native TLS passthrough capability (the `*s.` subdomain pattern). HAProxy preserves this by forwarding encrypted traffic directly to the gateway, while terminating TLS for other services like the Docker registry. + +## Architecture Overview + +``` + Internet + │ + ▼ + ┌─────────────────┐ + │ HAProxy :443 │ + │ :80 │ + └────────┬────────┘ + │ + ┌────────────────┼────────────────┐ + │ │ │ + ┌────────▼───────┐ ┌──────▼──────┐ ┌──────▼──────┐ + │ TLS Terminate │ │TLS Terminate│ │TLS Passthru │ + │ registry.* │ │ vmm.* │ │ *.dstack.* │ + └────────┬───────┘ └──────┬──────┘ └──────┬──────┘ + │ │ │ + ▼ ▼ ▼ + ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ + │ Registry │ │ VMM API │ │ Gateway │ + │ localhost:5000│ │ localhost:9080│ │ localhost:9204│ + └───────────────┘ └───────────────┘ └───────────────┘ +``` + +## Prerequisites + +Before starting, ensure you have: + +- Completed [SSL Certificate Setup](/tutorial/ssl-certificate-setup) - Certificates obtained +- SSH access to your TDX server +- Root or sudo privileges + + +## Manual Setup + +If you prefer to configure manually, follow these steps. 
+ +### Step 1: Install HAProxy + +```bash +sudo apt update +sudo apt install -y haproxy +``` + +Verify installation: + +```bash +haproxy -v +``` + +### Step 2: Create Certificate Directory + +HAProxy requires certificates in a combined format (cert + key in one file): + +```bash +sudo mkdir -p /etc/haproxy/certs +``` + +### Step 3: Prepare Certificates + +Combine Let's Encrypt certificates into HAProxy format: + +```bash +# Registry certificate +sudo cat /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/registry.yourdomain.com/privkey.pem \ + | sudo tee /etc/haproxy/certs/registry.pem > /dev/null + +# Wildcard certificate (for *.dstack.yourdomain.com) +sudo cat /etc/letsencrypt/live/dstack.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/dstack.yourdomain.com/privkey.pem \ + | sudo tee /etc/haproxy/certs/wildcard.pem > /dev/null + +# Secure the certificates +sudo chmod 600 /etc/haproxy/certs/*.pem +``` + +### Step 4: Create HAProxy Configuration + +```bash +sudo tee /etc/haproxy/haproxy.cfg > /dev/null <<'EOF' +# HAProxy Configuration for dstack Services +# Provides SNI-based routing with mixed TLS termination/passthrough + +global + log /dev/log local0 + chroot /var/lib/haproxy + stats socket /run/haproxy/admin.sock mode 660 level admin + stats timeout 30s + user haproxy + group haproxy + daemon + + # Modern TLS settings + ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256 + ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets + +defaults + log global + option dontlognull + timeout connect 5000 + timeout client 50000 + timeout server 50000 + errorfile 400 /etc/haproxy/errors/400.http + errorfile 403 /etc/haproxy/errors/403.http + errorfile 408 /etc/haproxy/errors/408.http + errorfile 500 /etc/haproxy/errors/500.http + errorfile 502 /etc/haproxy/errors/502.http + errorfile 503 /etc/haproxy/errors/503.http + errorfile 504 /etc/haproxy/errors/504.http + +# 
============================================================================= +# FRONTEND: HTTP (port 80) - Redirect to HTTPS +# ============================================================================= +frontend http_front + bind *:80 + mode http + option httplog + + # Redirect all HTTP to HTTPS + http-request redirect scheme https code 301 + +# ============================================================================= +# FRONTEND: HTTPS (port 443) - SNI-based routing +# ============================================================================= +frontend https_front + bind *:443 + mode tcp + option tcplog + + # Inspect SNI for routing decisions + tcp-request inspect-delay 5s + tcp-request content accept if { req_ssl_hello_type 1 } + + # TLS Termination: VMM management interface (must be before gateway rules) + use_backend local_https_backend if { req_ssl_sni -i vmm.dstack.yourdomain.com } + + # TLS Passthrough: Gateway RPC (CVM registration uses port 443 via --gateway-url) + use_backend gateway_rpc_passthrough if { req_ssl_sni -i gateway.dstack.yourdomain.com } + + # TLS Passthrough: Gateway proxy handles all other *.dstack.* subdomains (app traffic) + use_backend gateway_passthrough if { req_ssl_sni -m end .dstack.yourdomain.com } + + # TLS Termination: Everything else goes to local termination frontend + default_backend local_https_backend + +# ============================================================================= +# BACKEND: Gateway RPC TLS Passthrough +# When app CVMs use --gateway-url https://gateway.dstack.yourdomain.com (port 443), +# HAProxy must forward that traffic to the gateway RPC port (9202) so CVM +# registration works without requiring clients to specify port 9202 directly. 
+# ============================================================================= +backend gateway_rpc_passthrough + mode tcp + option tcp-check + server gateway-rpc 127.0.0.1:9202 check + +# ============================================================================= +# BACKEND: Gateway Proxy TLS Passthrough (app traffic) +# ============================================================================= +backend gateway_passthrough + mode tcp + option tcp-check + server gateway 127.0.0.1:9204 check + +# ============================================================================= +# BACKEND: Route to TLS Termination Frontend +# ============================================================================= +backend local_https_backend + mode tcp + server loopback 127.0.0.1:8444 send-proxy + +# ============================================================================= +# FRONTEND: TLS Termination (internal) +# ============================================================================= +frontend https_terminate + bind 127.0.0.1:8444 ssl crt /etc/haproxy/certs/ accept-proxy + mode http + option httplog + + # Route based on Host header after TLS termination + use_backend registry_backend if { hdr(host) -i registry.yourdomain.com } + use_backend vmm_backend if { hdr(host) -m end .dstack.yourdomain.com } + + # Default backend + default_backend vmm_backend + +# ============================================================================= +# HTTP BACKENDS +# ============================================================================= +backend registry_backend + mode http + option httpchk GET /v2/ + http-check expect status 200 + http-request set-header X-Forwarded-Proto https + server registry 127.0.0.1:5000 check + +backend vmm_backend + mode http + option httpchk GET / + http-request set-header X-Forwarded-Proto https + server vmm 127.0.0.1:9080 check + +# ============================================================================= +# STATS (localhost only) +# 
============================================================================= +listen stats + bind 127.0.0.1:8404 + mode http + stats enable + stats uri /stats + stats refresh 10s +EOF +``` + +**Update `yourdomain.com`** throughout the configuration to your actual domain. + +### Step 5: Update Domain in Configuration + +```bash +# Replace placeholder with your actual domain +sudo sed -i 's/yourdomain\.com/YOUR_ACTUAL_DOMAIN/g' /etc/haproxy/haproxy.cfg +``` + +### Step 6: Test Configuration + +```bash +sudo haproxy -c -f /etc/haproxy/haproxy.cfg +``` + +Expected output: + +``` +Configuration file is valid +``` + +### Step 7: Enable and Start HAProxy + +```bash +sudo systemctl enable haproxy +sudo systemctl restart haproxy +``` + +### Step 8: Verify HAProxy is Running + +```bash +sudo systemctl status haproxy +``` + +Check HAProxy is listening: + +```bash +sudo ss -tlnp | grep haproxy +``` + +Expected output shows ports 80, 443, 8444, and 8404. + +--- + +## Certificate Renewal Hook + +When Let's Encrypt renews certificates, HAProxy needs to reload them. + +### Create Renewal Hook + +```bash +sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-haproxy.sh > /dev/null <<'EOF' +#!/bin/bash +# Reload HAProxy certificates after Let's Encrypt renewal + +# Combine certificates for HAProxy +cat /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/registry.yourdomain.com/privkey.pem \ + > /etc/haproxy/certs/registry.pem + +cat /etc/letsencrypt/live/dstack.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/dstack.yourdomain.com/privkey.pem \ + > /etc/haproxy/certs/wildcard.pem + +chmod 600 /etc/haproxy/certs/*.pem + +# Reload HAProxy +systemctl reload haproxy + +echo "HAProxy certificates updated: $(date)" +EOF + +sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-haproxy.sh +``` + +**Update the domain names** in the script to match your certificates. 
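The combine step that the hook repeats for each certificate can be sketched as one reusable function. This is a minimal sketch under stated assumptions: `combine_pem` is a hypothetical name, not something HAProxy or certbot provides, and the temporary-file-then-rename approach is an addition of this sketch so that a partially written PEM is never left at the final path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an HAProxy-style combined PEM (certificate chain
# followed by private key) from a Let's Encrypt "live" directory.
combine_pem() {
    local live_dir="$1"   # e.g. /etc/letsencrypt/live/registry.yourdomain.com
    local out_file="$2"   # e.g. /etc/haproxy/certs/registry.pem
    # Refuse to write anything if either input is missing.
    [ -f "$live_dir/fullchain.pem" ] || { echo "no fullchain.pem in $live_dir" >&2; return 1; }
    [ -f "$live_dir/privkey.pem" ]   || { echo "no privkey.pem in $live_dir" >&2; return 1; }
    # Write to a temporary file next to the target, then rename it into place.
    local tmp
    tmp="$(mktemp "${out_file}.XXXXXX")" || return 1
    chmod 600 "$tmp"
    cat "$live_dir/fullchain.pem" "$live_dir/privkey.pem" > "$tmp" && mv "$tmp" "$out_file"
}
```

With such a helper, the hook body would reduce to one `combine_pem` call per certificate followed by `systemctl reload haproxy`.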
+ +### Test Renewal Hook + +```bash +sudo /etc/letsencrypt/renewal-hooks/deploy/reload-haproxy.sh +``` + +--- + +## Configuration Reference + +### Directory Structure + +``` +/etc/haproxy/ +├── haproxy.cfg # Main configuration +├── certs/ +│ ├── registry.pem # Registry cert+key combined +│ └── wildcard.pem # Wildcard cert+key combined +└── errors/ # Error pages +``` + +### Service Commands + +| Command | Description | +|---------|-------------| +| `sudo systemctl start haproxy` | Start HAProxy | +| `sudo systemctl stop haproxy` | Stop HAProxy | +| `sudo systemctl restart haproxy` | Restart HAProxy | +| `sudo systemctl reload haproxy` | Reload config without dropping connections | +| `sudo haproxy -c -f /etc/haproxy/haproxy.cfg` | Test configuration syntax | + +### View Logs + +```bash +# Follow HAProxy logs +sudo journalctl -u haproxy -f + +# Check syslog for HAProxy entries +sudo tail -f /var/log/syslog | grep haproxy +``` + +### Stats Page + +HAProxy provides a stats page on `127.0.0.1:8404`: + +```bash +curl http://127.0.0.1:8404/stats +``` + +Or open in browser via SSH tunnel: + +```bash +ssh -L 8404:127.0.0.1:8404 user@your-server +# Then open http://localhost:8404/stats in browser +``` + +--- + +## How SNI Routing Works + +HAProxy inspects the TLS ClientHello message to read the SNI (Server Name Indication) field without decrypting the traffic: + +``` +Client Request: https://app123s.dstack.example.com + │ + ▼ +HAProxy sees SNI = "app123s.dstack.example.com" + │ + ▼ (matches .dstack.example.com pattern) + │ +TCP Passthrough to gateway:9204 + │ + ▼ +Gateway receives original TLS handshake + │ + ▼ (gateway sees "s" suffix = passthrough mode) + │ +Gateway passes encrypted stream to CVM:443 +``` + +For TLS-terminated services: + +``` +Client Request: https://registry.example.com + │ + ▼ +HAProxy sees SNI = "registry.example.com" + │ + ▼ (no .dstack. 
pattern match, goes to default) + │ +Routes to internal TLS termination frontend + │ + ▼ +HAProxy terminates TLS using registry.pem + │ + ▼ +HTTP proxy to localhost:5000 +``` + +--- + +## Troubleshooting + +For detailed solutions, see the [Prerequisites Troubleshooting Guide](/tutorial/troubleshooting-prerequisites#haproxy-setup-issues): + +- [Port 443 Already in Use](/tutorial/troubleshooting-prerequisites#port-443-already-in-use) +- [Configuration Test Fails](/tutorial/troubleshooting-prerequisites#configuration-test-fails) +- [Certificate Errors](/tutorial/troubleshooting-prerequisites#certificate-errors) +- [Backend Health Check Failing](/tutorial/troubleshooting-prerequisites#backend-health-check-failing) +- [Gateway Not Receiving Traffic](/tutorial/troubleshooting-prerequisites#gateway-not-receiving-traffic) + +--- + +## Next Steps + +With HAProxy installed, proceed to configure services that use it: + +- [Local Docker Registry](/tutorial/local-docker-registry) - Registry behind HAProxy +- [Management Interface Setup](/tutorial/management-interface-setup) - VMM management via HAProxy +- [Gateway Service Setup](/tutorial/gateway-service-setup) - Gateway with HAProxy passthrough + +## Additional Resources + +- [HAProxy Documentation](https://www.haproxy.org/documentation/) +- [HAProxy Configuration Manual](https://cbonte.github.io/haproxy-dconv/) +- [Let's Encrypt Documentation](https://letsencrypt.org/docs/) diff --git a/docs/tutorials/hello-world-app.md b/docs/tutorials/hello-world-app.md new file mode 100644 index 00000000..66dec5f2 --- /dev/null +++ b/docs/tutorials/hello-world-app.md @@ -0,0 +1,477 @@ +--- +title: "Hello World Application" +description: "Deploy your first application to a dstack Confidential Virtual Machine" +section: "First Application" +stepNumber: 1 +totalSteps: 2 +lastUpdated: 2026-03-06 +prerequisites: + - gateway-service-setup +tags: + - dstack + - cvm + - deployment + - docker-compose + - hello-world +difficulty: "intermediate" 
+estimatedTime: "30 minutes" +--- + +# Hello World Application + +This tutorial guides you through deploying your first application to a dstack Confidential Virtual Machine (CVM). You'll deploy a simple nginx web server that runs inside a TDX-protected environment with full gateway integration, verifying that your entire dstack infrastructure is working correctly end-to-end. + +## What You'll Deploy + +| Component | Description | +|-----------|-------------| +| **nginx:alpine** | Lightweight web server running inside a CVM | +| **KMS attestation** | TDX-verified app identity via on-chain compose hash | +| **Gateway routing** | HTTPS access via WireGuard tunnel with Let's Encrypt certificate | + +## How CVM Deployment Works + +When you deploy an application to dstack: + +1. **vmm-cli.py compose** generates an encrypted deployment manifest (`app-compose.json`) +2. **On-chain registration** whitelists the compose hash so KMS will attest the app +3. **vmm-cli.py deploy** creates a TDX-protected CVM with the manifest +4. **Guest OS** boots, Docker containers start, and the app contacts KMS for attestation +5. **Gateway registration** — with `--gateway` flag, the app CVM establishes a WireGuard tunnel to the gateway +6. 
**HTTPS routing** — the gateway provisions a Let's Encrypt certificate and routes traffic to the app + +``` +Client HTTPS Request + │ + ▼ +┌──────────────────┐ +│ HAProxy (:443) │ +│ SNI routing │ +└────────┬─────────┘ + │ + ▼ +┌──────────────────┐ WireGuard ┌──────────────┐ +│ Gateway CVM │ ◄────────────────► │ App CVM │ +│ TLS termination │ tunnel │ nginx :80 │ +│ Let's Encrypt │ │ TDX protected│ +└──────────────────┘ └──────────────┘ +``` + +## Prerequisites + +### Server + +- Completed [Gateway CVM Deployment](/tutorial/gateway-service-setup) — gateway running and admin API bootstrapped +- KMS CVM running on port 9100 +- VMM running (`systemctl status dstack-vmm`) +- Python cryptography libraries for `vmm-cli.py`: + ```bash + pip3 install --break-system-packages cryptography eth-keys eth-utils "eth-hash[pycryptodome]" + ``` + +### Local machine + +- Foundry toolchain installed (`cast` command available) +- Wallet private key at `~/.dstack/secrets/sepolia-private-key` +- KMS contract address at `~/.dstack/secrets/kms-contract-address` + +Verify the infrastructure is ready: + +```bash +# KMS is responding +curl -sk https://localhost:9100/prpc/KMS.GetMeta | jq '{chain_id}' && echo "KMS: OK" + +# Gateway admin API is responding +curl -sf http://127.0.0.1:9203/prpc/Status > /dev/null && echo "Gateway: OK" + +# VMM is running +systemctl is-active dstack-vmm && echo "VMM: OK" +``` + +## Step 1: Create Application Directory + +```bash +mkdir -p ~/hello-world-deploy +cd ~/hello-world-deploy +``` + +## Step 2: Create Docker Compose File + +Create a minimal compose file. The app runs inside a CVM, so there is no access to the host filesystem — do not use local volume mounts. 
+ +```bash +cat > docker-compose.yaml << 'EOF' +services: + nginx: + image: nginx:alpine + ports: + - "80:80" + restart: always +EOF +``` + +| Setting | Description | +|---------|-------------| +| `image: nginx:alpine` | Lightweight nginx image, pulled from Docker Hub at boot | +| `ports: "80:80"` | Expose port 80 inside the CVM | +| `restart: always` | Restart container if it crashes | + +> **No local volumes:** Unlike a traditional Docker setup, CVMs don't have access to host directories. The default nginx welcome page is served automatically. To serve custom content, you would bake it into a custom Docker image. + +## Step 3: Register App On-Chain + +> **Run on your local machine.** This step uses `cast` (Foundry) and your wallet private key, which live on your local machine — not on the server. + +The app needs an on-chain identity so KMS can attest it and the gateway can route traffic to it. + +### Load wallet credentials + +```bash +export PRIVATE_KEY=$(cat ~/.dstack/secrets/sepolia-private-key) +export ETH_RPC_URL="https://ethereum-sepolia-rpc.publicnode.com" +export KMS_CONTRACT_ADDR=$(cat ~/.dstack/secrets/kms-contract-address) +``` + +### Deploy and register the app + +```bash +HELLO_APP_ID=$(cast send "$KMS_CONTRACT_ADDR" \ + "deployAndRegisterApp(address,bool,bool,bytes32,bytes32)" \ + "$(cast wallet address --private-key $PRIVATE_KEY)" \ + false \ + true \ + 0x0000000000000000000000000000000000000000000000000000000000000000 \ + 0x0000000000000000000000000000000000000000000000000000000000000000 \ + --rpc-url "$ETH_RPC_URL" \ + --private-key "$PRIVATE_KEY" \ + --json | jq -r '.logs[-1].topics[1]' | sed 's/0x000000000000000000000000/0x/') + +echo "Hello World App ID: $HELLO_APP_ID" +``` + +Verify the app was created: + +```bash +cast call "$HELLO_APP_ID" "owner()(address)" --rpc-url "$ETH_RPC_URL" +``` + +This should return your wallet address. 
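The `sed` expression in the registration command is easier to follow on a concrete value. An indexed address in an event log is stored as a 32-byte topic, so the 20-byte address is preceded by 12 bytes (24 hex digits) of zero padding. The sketch below uses a made-up topic value, not a real app ID, and builds the padding programmatically; it is otherwise the same substitution.

```bash
#!/usr/bin/env bash
# Illustration only: the address below is invented for the example.
pad=$(printf '%024d' 0)                              # 24 zero digits = 12 bytes of padding
addr="a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"      # 40 hex digits = 20-byte address
topic="0x${pad}${addr}"                              # 32-byte topic as it appears in the log

# Strip the padding so only the 0x-prefixed address remains.
app_id=$(printf '%s\n' "$topic" | sed "s/^0x${pad}/0x/")

echo "$app_id"
```

The result is `0x` followed by the 40 address digits, which is the form `cast call` expects as a contract address.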
+ +### Save the app ID + +```bash +echo "$HELLO_APP_ID" > ~/.dstack/secrets/hello-world-app-id +``` + +### Copy the app ID to the server + +The server needs the app ID for Step 6 (CVM deployment). Copy it over: + +```bash +# Replace user@your-server with your actual server SSH target +scp ~/.dstack/secrets/hello-world-app-id user@your-server:~/.dstack/secrets/ +``` + +SSH back into the server before continuing: + +```bash +ssh user@your-server +``` + +## Step 4: Generate Deployment Manifest + +Use `vmm-cli.py compose` to generate the encrypted deployment manifest. The `--gateway` and `--kms` flags enable gateway registration and KMS attestation. + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +./src/vmm-cli.py --url http://127.0.0.1:9080 compose \ + --docker-compose ~/hello-world-deploy/docker-compose.yaml \ + --name hello-world \ + --gateway \ + --kms \ + --public-logs \ + --output ~/hello-world-deploy/app-compose.json +``` + +**Key flags:** + +| Flag | Purpose | +|------|---------| +| `--gateway` | Enable gateway integration — the CVM will register with the gateway and establish a WireGuard tunnel | +| `--kms` | Enable KMS attestation — the CVM will contact KMS for TDX verification | +| `--public-logs` | Allow log access via VMM API (useful for debugging) | + +### Get the compose hash for Step 5 + +The compose hash is needed on your local machine for on-chain whitelisting. Display it and copy the value: + +```bash +COMPOSE_HASH=$(sha256sum ~/hello-world-deploy/app-compose.json | cut -d' ' -f1) +echo "Compose hash: 0x$COMPOSE_HASH" +``` + +Copy the full `0x...` hash value — you'll paste it into Step 5 on your local machine. + +## Step 5: Whitelist Compose Hash On-Chain + +> **Run on your local machine.** This step uses `cast` and your wallet private key. + +The KMS contract verifies that the exact compose configuration is authorized. Use the compose hash from Step 4 and register it on-chain. 
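Step 4 displays the hash with a `0x` prefix, and the commands in this step prepend `0x` again, so a doubled prefix is an easy mistake when pasting. The following is a minimal sketch of a guard; `file_sha256` and `to_bytes32` are hypothetical helper names introduced here, not part of `cast` or `vmm-cli.py`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for handling the compose hash as a bytes32 value.

# Print the SHA-256 of a file as 64 hex digits (no prefix).
file_sha256() {
    sha256sum "$1" | cut -d' ' -f1
}

# Accept a hash with or without a leading 0x and always print it as 0x<hex>.
# Fails if the remainder is not exactly 64 hex digits.
to_bytes32() {
    local h="${1#0x}"
    case "$h" in
        *[!0-9a-fA-F]*|"") echo "not a hex string: $1" >&2; return 1 ;;
    esac
    [ "${#h}" -eq 64 ] || { echo "expected 64 hex digits, got ${#h}" >&2; return 1; }
    printf '0x%s\n' "$h"
}
```

Passing the pasted value through `to_bytes32` before calling `cast send` gives the same argument whether or not the prefix was copied.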
+ +If you're in a new shell since Step 3, re-load your wallet credentials: + +```bash +export PRIVATE_KEY=$(cat ~/.dstack/secrets/sepolia-private-key) +export ETH_RPC_URL="https://ethereum-sepolia-rpc.publicnode.com" +export KMS_CONTRACT_ADDR=$(cat ~/.dstack/secrets/kms-contract-address) +``` + +Set the compose hash. Paste the value displayed in Step 4 without its leading `0x`, because the commands below add the prefix themselves: + +```bash +COMPOSE_HASH=""  # 64 hex digits from Step 4, no 0x prefix + +HELLO_APP_ID=$(cat ~/.dstack/secrets/hello-world-app-id) + +cast send "$HELLO_APP_ID" \ + "addComposeHash(bytes32)" \ + "0x$COMPOSE_HASH" \ + --rpc-url "$ETH_RPC_URL" \ + --private-key "$PRIVATE_KEY" +``` + +Verify: + +```bash +cast call "$HELLO_APP_ID" \ + "allowedComposeHashes(bytes32)(bool)" \ + "0x$COMPOSE_HASH" \ + --rpc-url "$ETH_RPC_URL" +``` + +Expected output: `true` + +> **Important:** If you modify `docker-compose.yaml` and regenerate `app-compose.json`, the hash changes. You must whitelist the new hash before deploying. + +SSH back into the server before continuing: + +```bash +ssh user@your-server +``` + +## Step 6: Deploy the CVM + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +SRV_DOMAIN=$(grep ^SRV_DOMAIN ~/gateway-deploy/.env | cut -d= -f2) +KMS_DOMAIN=$(grep ^KMS_DOMAIN ~/gateway-deploy/.env | cut -d= -f2) + +./src/vmm-cli.py --url http://127.0.0.1:9080 deploy \ + --name hello-world \ + --app-id "$(cat ~/.dstack/secrets/hello-world-app-id)" \ + --compose ~/hello-world-deploy/app-compose.json \ + --gateway-url "https://gateway.$SRV_DOMAIN" \ + --kms-url "https://$KMS_DOMAIN:9100" \ + --image dstack-0.5.7 \ + --vcpu 2 \ + --memory 2G \ + --port tcp:0.0.0.0:9300:80 +``` + +**Key flags:** + +| Flag | Value | Purpose | +|------|-------|---------| +| `--app-id` | Hello World app ID | Links CVM to on-chain app identity | +| `--gateway-url` | `https://gateway.$SRV_DOMAIN` | Gateway RPC endpoint (uses port 443 via HAProxy passthrough) | +| `--kms-url` (1st) | `https://127.0.0.1:9100` | Host-side KMS for env encryption |
+| `--kms-url` (2nd) | `https://$KMS_DOMAIN:9100` | CVM-side KMS (domain must match TLS cert) | +| `--port` | `tcp:0.0.0.0:9300:80` | Direct port mapping for testing (optional) | + +> **Why two `--kms-url` values?** Same reason as the gateway: the first is for host-side encryption, the second is for CVM-side runtime access. See [Gateway CVM Deployment](/tutorial/gateway-service-setup#step-2-deploy-the-gateway-cvm) for details. + +## Step 7: Monitor Boot Logs + +List VMs and get the hello-world ID: + +```bash +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +Follow the boot logs (replace `VM_ID` with the actual ID): + +```bash +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=true&ansi=false" +``` + +Watch for these key log messages: + +``` +Docker container starting... +nginx: the configuration file /etc/nginx/nginx.conf syntax is ok +``` + +And if gateway integration is working: + +``` +Registering with gateway... +WireGuard tunnel established +``` + +The CVM typically boots in 1-2 minutes. + +## Step 8: Verify via Gateway (HTTPS) + +Once the CVM registers with the gateway, it's accessible via an HTTPS URL. The gateway automatically provisions a Let's Encrypt certificate. + +Find your app's gateway URL. If you've deployed multiple times, the `hosts` array may contain stale entries from previous deployments. Use the most recent `latest_handshake` to identify the active instance: + +```bash +# Get the most recently active app instance +curl -sf http://127.0.0.1:9203/prpc/Status | jq '.hosts | sort_by(.latest_handshake) | reverse | .[0]' +``` + +The `instance_id` and `base_domain` fields determine the app URL: `https://<instance_id>-80.gateway.<base_domain>`. + +Access the app: + +```bash +# Replace <instance_id> and <base_domain> with the values from the output above +curl -s "https://<instance_id>-80.gateway.<base_domain>/" +``` + +You should see the default nginx welcome page HTML.
The Let's Encrypt certificate is automatically provisioned, so this works without `-k`. + +Verify the certificate: + +```bash +echo | openssl s_client -connect <instance_id>-80.gateway.<base_domain>:443 -servername <instance_id>-80.gateway.<base_domain> 2>/dev/null | openssl x509 -noout -issuer -subject +``` + +The issuer should be `Let's Encrypt` (not `STAGING`). + +## Step 9: Verify via Direct Port Mapping + +As an alternative to gateway access, you can test directly via the mapped port: + +```bash +curl -s http://YOUR_SERVER_IP:9300/ +``` + +This bypasses the gateway and hits nginx directly. You should see the same nginx welcome page. + +> **Note:** Direct port access is unencrypted HTTP. In production, use the gateway HTTPS URL. + +## Managing the Application + +Navigate to the VMM directory: + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) +``` + +### List running VMs + +```bash +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +### View logs + +```bash +VM_ID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json | jq -r '.[] | select(.name=="hello-world") | .id') +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=$VM_ID&follow=false&ansi=false&lines=50" +``` + +### Stop and remove + +```bash +VM_ID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json | jq -r '.[] | select(.name=="hello-world") | .id') +./src/vmm-cli.py --url http://127.0.0.1:9080 stop --force "$VM_ID" +./src/vmm-cli.py --url http://127.0.0.1:9080 remove "$VM_ID" +``` + +### Redeploy + +To redeploy after changes: + +1. Remove the existing CVM (see above) +2. If you changed `docker-compose.yaml`, regenerate `app-compose.json` (Step 4) and whitelist the new hash (Step 5) +3.
Re-run the deploy command (Step 6) + +--- + +## Troubleshooting + +For detailed solutions, see the [First Application Troubleshooting Guide](/tutorial/troubleshooting-first-application#hello-world-app-issues): + +- [CVM fails to start](/tutorial/troubleshooting-first-application#cvm-fails-to-start) +- ["OS image is not allowed"](/tutorial/troubleshooting-first-application#os-image-is-not-allowed) +- [CVM boots but no gateway registration](/tutorial/troubleshooting-first-application#cvm-boots-but-no-gateway-registration) +- [Application not accessible via gateway](/tutorial/troubleshooting-first-application#application-not-accessible-via-gateway) +- [Cannot pull Docker images](/tutorial/troubleshooting-first-application#cannot-pull-docker-images) + +--- + +## Verification Checklist + +Before proceeding, verify: + +- [ ] App registered on-chain with `deployAndRegisterApp` +- [ ] Compose hash whitelisted on app contract +- [ ] CVM deployed and running (`lsvm` shows status) +- [ ] CVM registered with gateway (WireGuard tunnel established) +- [ ] Application accessible via gateway HTTPS URL (valid Let's Encrypt cert) +- [ ] Application accessible via direct port mapping (optional) + +--- + +## What's Running Inside Your CVM + +``` +┌─────────────────────────────────────────────────────────────┐ +│ CVM (TDX Protected) │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Docker Container │ │ +│ │ ┌─────────────┐ │ │ +│ │ │ nginx │ │ │ +│ │ │ :80 │ │ │ +│ │ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Guest Agent │ │ +│ │ - TDX attestation via /var/run/dstack.sock │ │ +│ │ - Docker lifecycle management │ │ +│ │ - WireGuard tunnel to gateway │ │ +│ │ - Log forwarding to VMM │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ TDX Protection │ │ +│ │ - Encrypted memory 
(hardware-enforced) │ │ +│ │ - Measured boot chain (MRTD, RTMRs) │ │ +│ │ - Isolated from host OS │ │ +│ └───────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Next Steps + +Your Hello World application is running inside a TDX-protected CVM with full gateway integration. From here you can: + +- Deploy more complex applications with multiple containers +- Use the dstack socket (`/var/run/dstack.sock`) for TDX attestation from your application +- Build custom Docker images with your own application code + +## Additional Resources + +- [Docker Compose Reference](https://docs.docker.com/compose/compose-file/) +- [nginx Documentation](https://nginx.org/en/docs/) +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) +- [dstack Examples Repository](https://github.com/Dstack-TEE/dstack-examples) diff --git a/docs/tutorials/kms-build-configuration.md b/docs/tutorials/kms-build-configuration.md new file mode 100644 index 00000000..be06b036 --- /dev/null +++ b/docs/tutorials/kms-build-configuration.md @@ -0,0 +1,705 @@ +--- +title: "KMS Build & Configuration" +description: "Build and configure the dstack Key Management Service" +section: "KMS Deployment" +stepNumber: 2 +totalSteps: 3 +lastUpdated: 2026-01-09 +prerequisites: + - contract-deployment + - guest-image-setup +tags: + - dstack + - kms + - cargo + - build + - configuration +difficulty: "advanced" +estimatedTime: "25 minutes" +--- + +# KMS Build & Configuration + +This tutorial guides you through building and configuring the dstack Key Management Service (KMS). The KMS is a critical component that manages cryptographic keys for TEE applications. 
+ +## Prerequisites + +Before starting, ensure you have: + +- Completed [Contract Deployment](/tutorial/contract-deployment) with a deployed KMS contract +- Completed [TDX & SGX Verification](/tutorial/tdx-sgx-verification) - **SGX must be verified before KMS deployment** +- Completed [Rust Toolchain Installation](/tutorial/rust-toolchain-installation) +- dstack repository cloned to `~/dstack` + +> **Important:** The KMS uses a `local_key_provider` that requires SGX to generate TDX attestation quotes. Without SGX properly configured (including Auto MP Registration in BIOS), KMS cannot bootstrap and will fail to generate cryptographic proofs of its TDX environment. + + +## What Gets Built + +The dstack KMS provides: + +| Component | Purpose | +|-----------|---------| +| **dstack-kms** | Main KMS binary - generates and stores cryptographic keys | +| **auth-eth** | Node.js service - verifies app permissions via smart contract | +| **kms.toml** | Configuration file for KMS settings | +| **auth-eth.env** | Environment file with the Ethereum RPC endpoint and KMS contract address | +| **Docker image** | Containerized KMS for deployment in a CVM | +| **docker-compose.yml** | Deployment manifest for VMM | + +> **Note:** KMS runs inside a Confidential Virtual Machine (CVM) to enable TDX attestation. The Docker image packages KMS for CVM deployment. + +--- + +## Manual Build + +> **Note:** The previous tutorial ([Contract Deployment](/tutorial/contract-deployment)) was run on your **local machine**. The remaining tutorials are run on your **TDX server**. SSH back in before continuing: +> ```bash +> ssh ubuntu@YOUR_SERVER_IP +> ``` + +If you prefer to build manually, follow these steps. + +## Step 1: Build the KMS Binary + +Build the KMS service using Cargo in release mode. 
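Before starting the build, it can help to confirm that the tools used on this page are on your `PATH`. The following is a minimal sketch, not part of dstack: `require_cmd` is a hypothetical helper name, and the tool list (`cargo`, `docker`, `jq`, `curl`) is assumed from the commands that appear later in this tutorial.

```bash
# Hypothetical helper (not part of dstack): report whether a command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Tool list assumed from the commands used in this tutorial.
missing=0
for tool in cargo docker jq curl; do
  require_cmd "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
  echo "All build tools found"
else
  echo "Install the missing tools before continuing" >&2
fi
```

The check only reports what is missing; it does not install anything or verify versions.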
+ +### Navigate to repository root + +```bash +cd ~/dstack +``` + +### Build KMS in release mode + +```bash +cargo build --release -p dstack-kms +``` + +This compilation will: +- Download and compile KMS dependencies +- Build the KMS binary with optimizations + +### Verify the build + +```bash +ls -lh ~/dstack/target/release/dstack-kms +``` + +Expected output (typically 20-30MB): +``` +-rwxrwxr-x 1 ubuntu ubuntu 25M Nov 20 10:30 /home/ubuntu/dstack/target/release/dstack-kms +``` + +### Test the binary + +```bash +~/dstack/target/release/dstack-kms --help +``` + +This displays available command-line options. + +## Step 2: Install KMS to System Path + +Install the KMS binary to a system-wide location. + +### Copy to /usr/local/bin + +```bash +sudo cp ~/dstack/target/release/dstack-kms /usr/local/bin/dstack-kms +sudo chmod 755 /usr/local/bin/dstack-kms +``` + +### Verify installation + +```bash +which dstack-kms +dstack-kms --help +``` + +## Step 3: Create Configuration Directories + +Create the directory structure for KMS configuration and certificates. + +### Create directories + +```bash +# Configuration directory +sudo mkdir -p /etc/kms + +# Certificate directory +sudo mkdir -p /etc/kms/certs + +# Runtime directories +sudo mkdir -p /var/run/kms +sudo mkdir -p /var/log/kms + +# Set permissions +sudo chown -R $USER:$USER /etc/kms +sudo chown -R $USER:$USER /var/run/kms +sudo chown -R $USER:$USER /var/log/kms +``` + +### Verify directory structure + +```bash +ls -la /etc/kms +``` + +You should see: +``` +total 12 +drwxr-xr-x 3 ubuntu ubuntu 4096 Nov 20 10:35 . +drwxr-xr-x 3 root root 4096 Nov 20 10:35 .. +drwxr-xr-x 2 ubuntu ubuntu 4096 Nov 20 10:35 certs +``` + +## Step 4: Create KMS Configuration + +Create the main KMS configuration file. 
+ +### Create kms.toml + +```bash +cat > /etc/kms/kms.toml << 'EOF' +# dstack KMS Configuration +# See: https://github.com/Dstack-TEE/dstack + +[default] +workers = 8 +max_blocking = 64 +ident = "DStack KMS" +temp_dir = "/tmp" +keep_alive = 10 +log_level = "info" + +# RPC Server Configuration +[rpc] +address = "0.0.0.0" +port = 9100 + +# TLS Certificate Configuration for RPC +[rpc.tls] +key = "/etc/kms/certs/rpc.key" +certs = "/etc/kms/certs/rpc.crt" + +# Mutual TLS (mTLS) Configuration +[rpc.tls.mutual] +ca_certs = "/etc/kms/certs/tmp-ca.crt" +mandatory = false + +# Core KMS Configuration +[core] +cert_dir = "/etc/kms/certs" +subject_postfix = ".dstack" +# Intel PCCS URL for TDX quote verification +pccs_url = "https://pccs.phala.network/sgx/certification/v4" + +# Authentication API Configuration +# Uses webhook to query Ethereum contract via auth-eth service +[core.auth_api] +type = "webhook" + +[core.auth_api.webhook] +url = "http://127.0.0.1:9200" + +# Onboarding Configuration +[core.onboard] +enabled = true +auto_bootstrap_domain = "" +quote_enabled = true +address = "0.0.0.0" +port = 9100 +EOF +``` + +### Configuration explained + +| Section | Key | Description | +|---------|-----|-------------| +| `[default]` | `workers` | Number of worker threads (set to 8 here) | +| `[default]` | `log_level` | Logging level: debug, info, warn, error | +| `[rpc]` | `address` | RPC server bind address | +| `[rpc]` | `port` | RPC server port (9100) | +| `[core]` | `cert_dir` | Directory for certificates | +| `[core]` | `pccs_url` | PCCS endpoint used for TDX quote verification (Phala Network's public PCCS in this configuration) | +| `[core.auth_api]` | `type` | Authorization backend; `webhook` delegates checks to auth-eth | +| `[core.auth_api.webhook]` | `url` | Auth-eth webhook service URL | +| `[core.onboard]` | `enabled` | Enable bootstrap/onboard mode | + +## Step 5: Build Auth-ETH Service + +The KMS requires the auth-eth service to query the Ethereum contract for authorization. 
+ +### Install Node.js + +The auth-eth service requires Node.js. 
Install Node.js 20.x from NodeSource: + +```bash +curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - +sudo apt-get install -y nodejs +``` + +Verify the installation: + +```bash +node --version +npm --version +``` + +You should see Node.js v20.x and npm v10.x (or later). + +### Navigate to auth-eth directory + +```bash +cd ~/dstack/kms/auth-eth +``` + +### Install dependencies + +```bash +npm install +``` + +### Build TypeScript + +```bash +npx tsc --project tsconfig.json +``` + +### Verify build + +```bash +ls -la dist/src/ +``` + +You should see `main.js` and other compiled files. + +## Step 6: Create Auth-ETH Configuration + +Create environment configuration for the auth-eth service. + +### Get contract address from deployment + +The contract address was created during [Contract Deployment](/tutorial/contract-deployment), which ran on your **local machine**. You need to transfer this address to your server. + +**Option A: Read from saved secrets** + +If you saved the contract address in the previous tutorial: + +```bash +KMS_CONTRACT_ADDRESS=$(cat ~/.dstack/secrets/kms-contract-address) +echo "Contract address: $KMS_CONTRACT_ADDRESS" +``` + +**Option B: Check Etherscan** + +If you've lost the address, find it on [Sepolia Etherscan](https://sepolia.etherscan.io/) by searching for your wallet address and looking at recent contract deployments. 
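The environment file in the next step is written with an unquoted heredoc, so `$KMS_CONTRACT_ADDRESS` is expanded at write time, and an empty or mistyped variable silently produces a broken `auth-eth.env`. A minimal sketch of a guard follows, assuming the usual Ethereum address form of `0x` followed by 40 hex digits. `is_eth_address` is a hypothetical helper name, not something dstack provides.

```bash
# Hypothetical helper: succeed only for "0x" followed by exactly 40 hex digits.
is_eth_address() {
  case "$1" in
    0x*) ;;
    *) return 1 ;;
  esac
  hex="${1#0x}"
  [ "${#hex}" -eq 40 ] || return 1
  case "$hex" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}

# KMS_CONTRACT_ADDRESS is expected to have been set via Option A or by hand.
if is_eth_address "${KMS_CONTRACT_ADDRESS:-}"; then
  echo "Contract address looks well-formed: $KMS_CONTRACT_ADDRESS"
else
  echo "KMS_CONTRACT_ADDRESS is empty or malformed; set it before continuing" >&2
fi
```

This checks the format only. Whether a contract actually exists at that address is covered by the `eth_getCode` check in Step 9.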
+ +### Create environment file + +```bash +cat > /etc/kms/auth-eth.env << EOF +# Auth-ETH Service Configuration + +# Server settings +HOST=127.0.0.1 +PORT=9200 + +# Ethereum RPC endpoint (Sepolia testnet) +ETH_RPC_URL=https://ethereum-sepolia-rpc.publicnode.com + +# KMS Authorization Contract Address +KMS_CONTRACT_ADDR=$KMS_CONTRACT_ADDRESS +EOF +``` + +### Secure the file + +```bash +chmod 600 /etc/kms/auth-eth.env +``` + +### Verify configuration + +```bash +cat /etc/kms/auth-eth.env +``` + +## Step 7: Create Docker Image for CVM Deployment + +KMS runs inside a Confidential Virtual Machine (CVM) to enable TDX attestation. We need to create a Docker image that packages KMS and auth-eth together. + +### Create deployment directory + +```bash +mkdir -p ~/kms-deployment +cd ~/kms-deployment +``` + +### Create QCNL Configuration + +The CVM needs to know how to reach a PCCS for attestation. We use Phala Network's public PCCS: + +```bash +cat > sgx_default_qcnl.conf << 'EOF' +{ + "pccs_url": "https://pccs.phala.network/sgx/certification/v4/", + "use_secure_cert": false, + "retry_times": 6, + "retry_delay": 10 +} +EOF +``` + +### Create .dockerignore + +Exclude `node_modules` from the build context to avoid transferring hundreds of megabytes: + +```bash +cat > .dockerignore << 'EOF' +auth-eth/node_modules +EOF +``` + +### Create Dockerfile + +The Dockerfile bakes all configuration into the image for reliable CVM deployment: + +```bash +cat > Dockerfile << 'EOF' +# KMS Docker Image for CVM Deployment +# Extract dstack-acpi-tables and QEMU BIOS files from the official builder image. +# These are required for OS image verification (computing expected TDX measurements). 
+FROM dstacktee/dstack-kms@sha256:11ac59f524a22462ccd2152219b0bec48a28ceb734e32500152d4abefab7a62a AS official + +FROM ubuntu:24.04 + +# Install runtime dependencies +# libglib2.0-0t64, libpixman-1-0, and libslirp0 are required by dstack-acpi-tables (QEMU binary) +RUN apt-get update && \ + apt-get install -y ca-certificates curl libglib2.0-0t64 libpixman-1-0 libslirp0 && \ + rm -rf /var/lib/apt/lists/* + +# Install Node.js 20.x for auth-eth +RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \ + apt-get install -y nodejs && \ + rm -rf /var/lib/apt/lists/* + +# Create directories +RUN mkdir -p /etc/kms/certs /etc/kms/images /var/run/kms /var/log/kms + +# Copy dstack-acpi-tables from official image (needed for OS image verification) +COPY --from=official /usr/local/bin/dstack-acpi-tables /usr/local/bin/dstack-acpi-tables +COPY --from=official /usr/local/share/qemu /usr/local/share/qemu + +# Copy KMS binary +COPY dstack-kms /usr/local/bin/dstack-kms +RUN chmod 755 /usr/local/bin/dstack-kms + +# Copy configuration files (baked into image) +COPY kms.toml /etc/kms/kms.toml +COPY auth-eth.env /etc/kms/auth-eth.env +COPY sgx_default_qcnl.conf /etc/sgx_default_qcnl.conf + +# Copy auth-eth service and install dependencies +COPY auth-eth /opt/auth-eth +RUN cd /opt/auth-eth && npm install --production + +# Copy startup script +COPY start-kms.sh /usr/local/bin/start-kms.sh +RUN chmod 755 /usr/local/bin/start-kms.sh + +EXPOSE 9100 + +ENTRYPOINT ["/usr/local/bin/start-kms.sh"] +EOF +``` + +### Create startup script + +The startup script runs both KMS and auth-eth services: + +```bash +cat > start-kms.sh << 'EOF' +#!/bin/bash +set -e + +# Start auth-eth in background +cd /opt/auth-eth +node dist/src/main.js & +AUTH_ETH_PID=$! 
+ +# Wait for auth-eth to be ready +sleep 2 + +# Start KMS (foreground) +exec /usr/local/bin/dstack-kms --config /etc/kms/kms.toml +EOF +``` + +### Create CVM-specific kms.toml + +The KMS config for CVM deployment enables TDX attestation: + +```bash +cat > kms.toml << 'EOF' +# dstack KMS Configuration (CVM Deployment) + +[default] +workers = 8 +max_blocking = 64 +ident = "DStack KMS" +temp_dir = "/tmp" +keep_alive = 10 +log_level = "info" + +# RPC Server Configuration +[rpc] +address = "0.0.0.0" +port = 9100 + +# TLS Certificate Configuration for RPC +[rpc.tls] +key = "/etc/kms/certs/rpc.key" +certs = "/etc/kms/certs/rpc.crt" + +# Mutual TLS (mTLS) Configuration +[rpc.tls.mutual] +ca_certs = "/etc/kms/certs/tmp-ca.crt" +mandatory = false + +# Core KMS Configuration +[core] +cert_dir = "/etc/kms/certs" +subject_postfix = ".dstack" +pccs_url = "https://pccs.phala.network/sgx/certification/v4" + +# OS Image Verification +# KMS downloads OS images to compute expected TDX measurements +[core.image] +verify = true +cache_dir = "/etc/kms/images" +download_url = "https://download.dstack.org/os-images/mr_{OS_IMAGE_HASH}.tar.gz" +download_timeout = "2m" + +# Authentication API Configuration +[core.auth_api] +type = "webhook" + +[core.auth_api.webhook] +url = "http://127.0.0.1:9200" + +# Onboarding Configuration +[core.onboard] +enabled = true +# Empty domain = manual bootstrap mode (ensures bootstrap-info.json is written) +auto_bootstrap_domain = "" +# Enable TDX quotes - works because KMS runs in CVM +quote_enabled = true +address = "0.0.0.0" +port = 9100 +EOF +``` + +> **Why empty `auto_bootstrap_domain`?** With an empty domain, KMS starts in "onboard mode" — a plain HTTP server that waits for you to trigger bootstrap via an RPC call. This ensures `bootstrap-info.json` is written to disk, which is required for on-chain KMS registration. You'll provide the domain during the bootstrap step in [KMS CVM Deployment](/tutorial/kms-cvm-deployment). 
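The startup script shown earlier in this step waits a fixed two seconds for auth-eth before launching KMS. On a slow boot that may not be enough, and a polling loop is a common alternative. The following is a sketch only: `retry` is a hypothetical helper, and the probe in the comment assumes that a plain `curl` to port 9200 exits successfully once auth-eth accepts connections. The source does not document a health endpoint for auth-eth, so treat that probe as an assumption to verify.

```bash
# Hypothetical helper: run a command until it succeeds, up to N attempts.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Assumed probe for start-kms.sh, in place of "sleep 2":
#   retry 15 1 curl -s -o /dev/null http://127.0.0.1:9200/ || exit 1
# Demonstration with a command that always succeeds:
retry 3 0 true && echo "probe succeeded"
```

If you adopt this in `start-kms.sh`, note that the Dockerfile above already installs `curl`, so no extra package is needed for the probe.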
+ +### Copy build artifacts and configuration + +```bash +# Copy KMS binary +cp ~/dstack/target/release/dstack-kms . + +# Copy auth-eth service +cp -r ~/dstack/kms/auth-eth auth-eth + +# Copy auth-eth environment config +cp /etc/kms/auth-eth.env . +``` + +### Build Docker image + +```bash +docker build -t dstack-kms:latest . +``` + +### Verify image was created + +```bash +docker images dstack-kms +``` + +Expected output: +``` +REPOSITORY TAG IMAGE ID CREATED SIZE +dstack-kms latest abc123def456 10 seconds ago ~300MB +``` + +### Push to local registry + +Tag and push the image to your local Docker registry so CVMs can pull it during boot. Push directly to `localhost:5000` (HAProxy only handles read access for CVM pulls): + +```bash +# Tag for local registry (push via localhost, pull via HAProxy domain) +docker tag dstack-kms:latest localhost:5000/dstack-kms:latest +docker tag dstack-kms:latest localhost:5000/dstack-kms:fixed + +# Push both tags +docker push localhost:5000/dstack-kms:latest +docker push localhost:5000/dstack-kms:fixed +``` + +Verify the image is in the registry (via HAProxy): + +```bash +curl -sk https://registry.yourdomain.com/v2/dstack-kms/tags/list +``` + +Expected output: +```json +{"name":"dstack-kms","tags":["fixed","latest"]} +``` + +## Step 8: Create docker-compose.yml + +Create the deployment manifest for VMM deployment. 
+ +### Create docker-compose.yml + +```bash +cat > docker-compose.yml << 'EOF' +# KMS Deployment Manifest for dstack CVM +# Deploy via VMM web interface at http://localhost:9080 + +services: + kms: + image: dstack-kms:latest + ports: + - "9100:9100" + volumes: + # Mount config file from local directory + - ./kms.toml:/etc/kms/kms.toml:ro + - ./auth-eth.env:/etc/kms/auth-eth.env:ro + # Named volume for persistent certificates + - kms-certs:/etc/kms/certs + environment: + - RUST_LOG=info + restart: unless-stopped + +volumes: + kms-certs: + # Certificates persist across container restarts +EOF +``` + +### Verify deployment files + +```bash +ls -la ~/kms-deployment/ +``` + +You should have: +- `Dockerfile` - Container build definition +- `dstack-kms` - KMS binary +- `auth-eth/` - Auth-eth service directory +- `start-kms.sh` - Startup script +- `docker-compose.yml` - Deployment manifest +- `kms.toml` - KMS configuration +- `auth-eth.env` - Auth-eth environment +- `sgx_default_qcnl.conf` - QCNL configuration for CVM PCCS access + +## Step 9: Verify Configuration + +### Check KMS configuration syntax + +The KMS loads configuration using the Rocket framework's Figment library: + +```bash +# Validate TOML syntax +cat /etc/kms/kms.toml | python3 -c "import sys, tomllib; tomllib.load(sys.stdin.buffer); print('Valid TOML')" +``` + +### Check auth-eth configuration + +```bash +# Source and verify environment +source /etc/kms/auth-eth.env +echo "ETH_RPC_URL: ${ETH_RPC_URL:0:30}..." +echo "KMS_CONTRACT_ADDR: $KMS_CONTRACT_ADDR" +``` + +### Test RPC connectivity + +```bash +source /etc/kms/auth-eth.env +curl -s -X POST "$ETH_RPC_URL" \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' | \ + jq . +``` + +Expected output shows the current block number. 
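JSON-RPC returns the block number as a hex quantity in the `result` field, so it is not immediately readable as a block height. A small sketch of converting it to decimal is shown below. The sample response is illustrative rather than captured from a real node, and `hex_to_dec` is a hypothetical helper.

```bash
# Hypothetical helper: convert a 0x-prefixed hex quantity to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Illustrative response body in the shape eth_blockNumber returns.
response='{"jsonrpc":"2.0","id":1,"result":"0x10"}'

# Extract the result field; with a live response, `jq -r .result` does the same job.
block_hex=$(printf '%s' "$response" | sed -n 's/.*"result":"\(0x[0-9a-fA-F]*\)".*/\1/p')
echo "Current block: $(hex_to_dec "$block_hex")"
```

A block number that increases between two calls a few seconds apart is a reasonable sign that the endpoint is live and synced.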
+ +### Verify contract exists + +```bash +source /etc/kms/auth-eth.env +curl -s -X POST "$ETH_RPC_URL" \ + -H "Content-Type: application/json" \ + -d "{\"jsonrpc\":\"2.0\",\"method\":\"eth_getCode\",\"params\":[\"$KMS_CONTRACT_ADDR\",\"latest\"],\"id\":1}" | \ + jq -r 'if .result != "0x" then "✓ Contract found" else "✗ Contract not found" end' +``` + +--- + +## Architecture Overview + +### Component Interaction + +``` +┌─────────────┐ ┌─────────────┐ ┌──────────────┐ +│ TEE App │────►│ KMS │────►│ Auth-ETH │ +└─────────────┘ └─────────────┘ └──────────────┘ + │ │ │ + │ │ ▼ + │ │ ┌──────────────┐ + │ │ │ Ethereum │ + │ │ │ (Sepolia) │ + │ │ └──────────────┘ + │ │ │ + │ ▼ │ + │ ┌─────────────┐ │ + └───────────►│ VMM │◄────────────┘ + └─────────────┘ +``` + +### Data Flow + +1. **TEE App** requests key from **KMS** +2. **KMS** calls **Auth-ETH** webhook to verify authorization +3. **Auth-ETH** queries **Ethereum** smart contract +4. If authorized, **KMS** returns key to app +5. **VMM** orchestrates the overall TEE environment + +## Troubleshooting + +For detailed solutions, see the [KMS Deployment Troubleshooting Guide](/tutorial/troubleshooting-kms-deployment#kms-build--configuration-issues): + +- [Build fails with missing dependencies](/tutorial/troubleshooting-kms-deployment#build-fails-with-missing-dependencies) +- [Configuration file not found](/tutorial/troubleshooting-kms-deployment#configuration-file-not-found) +- [Auth-eth npm install fails](/tutorial/troubleshooting-kms-deployment#auth-eth-npm-install-fails) +- [Invalid TOML syntax](/tutorial/troubleshooting-kms-deployment#invalid-toml-syntax) +- [RPC connection failed](/tutorial/troubleshooting-kms-deployment#rpc-connection-failed) +- [Contract address not set](/tutorial/troubleshooting-kms-deployment#contract-address-not-set) + +## Next Steps + +With KMS built and containerized, proceed to CVM deployment: + +- [KMS CVM Deployment](/tutorial/kms-cvm-deployment) - Deploy KMS as a Confidential VM + +## 
Additional Resources + +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) +- [Intel TDX Documentation](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html) +- [Rocket Framework](https://rocket.rs/) +- [Figment Configuration](https://docs.rs/figment/) diff --git a/docs/tutorials/kms-cvm-deployment.md b/docs/tutorials/kms-cvm-deployment.md new file mode 100644 index 00000000..9e90a566 --- /dev/null +++ b/docs/tutorials/kms-cvm-deployment.md @@ -0,0 +1,533 @@ +--- +title: "KMS CVM Deployment" +description: "Deploy dstack KMS as a Confidential Virtual Machine for TDX attestation" +section: "KMS Deployment" +stepNumber: 3 +totalSteps: 3 +lastUpdated: 2026-01-09 +prerequisites: + - kms-build-configuration + - gramine-key-provider + - local-docker-registry +tags: + - dstack + - kms + - cvm + - tdx + - vmm + - deployment +difficulty: "advanced" +estimatedTime: "20 minutes" +--- + +# KMS CVM Deployment + +This tutorial guides you through deploying the dstack KMS as a Confidential Virtual Machine (CVM). Running KMS inside a CVM enables TDX attestation, providing cryptographic proof that the KMS keys were generated in a genuine Intel TDX environment. + +## Why Deploy KMS in a CVM? 
+ +Running KMS inside a CVM provides significant security benefits: + +| Benefit | Description | +|---------|-------------| +| **TDX Attestation** | Generate cryptographic quotes proving keys were created in genuine TDX | +| **Memory Encryption** | Root keys protected by TDX hardware encryption, not just file permissions | +| **Verifiable Integrity** | Anyone can verify KMS integrity via attestation quote | +| **Consistent Model** | KMS deployed the same way as other dstack applications | + +## Prerequisites + +Before starting, ensure you have: + +- Completed [KMS Build & Configuration](/tutorial/kms-build-configuration) +- Completed [Gramine Key Provider](/tutorial/gramine-key-provider) - Required for CVM boot +- Completed [Local Docker Registry](/tutorial/local-docker-registry) - With KMS image cached +- Completed [TDX & SGX Verification](/tutorial/tdx-sgx-verification) - SGX must be working for attestation +- KMS image pushed to local registry (`registry.yourdomain.com/dstack-kms:fixed`) +- dstack VMM running (`systemctl status dstack-vmm`) +- VMM web interface available at http://localhost:9080 + +> **Why SGX is required:** The KMS uses Intel SGX to generate TDX attestation quotes via the `local_key_provider`. SGX Auto MP Registration must be enabled in BIOS so your platform is registered with Intel's Provisioning Certification Service (PCS). Without this registration, KMS cannot generate valid attestation quotes, and bootstrap will fail. + +> **Why local registry?** The KMS Docker image is cached in your [Local Docker Registry](/tutorial/local-docker-registry) for reliable, fast access from CVMs. The auth-eth service inside the container requires `ETH_RPC_URL` and `KMS_CONTRACT_ADDR` environment variables — these are passed via docker-compose, not baked into the image. + + +## What Gets Deployed + +When you deploy KMS as a CVM, the following happens: + +1. **CVM Creation** - VMM creates a TDX-protected virtual machine +2. 
**Container Start** - Docker container runs inside the CVM +3. **Onboard Mode** - KMS starts a plain HTTP server, waiting for bootstrap +4. **Manual Bootstrap** - You trigger key generation via an RPC call +5. **TDX Quote** - KMS generates attestation quote proving TDX environment +6. **Service Ready** - KMS transitions to TLS and starts accepting connections + +### Generated Artifacts + +Inside the CVM at `/etc/kms/certs/`: + +| File | Purpose | +|------|---------| +| `root-ca.crt` | Root Certificate Authority (self-signed) | +| `root-ca.key` | Root CA signing key (P256 ECDSA) | +| `rpc.crt` | TLS certificate for RPC server | +| `rpc.key` | RPC server private key | +| `tmp-ca.crt` | Temporary CA for mutual TLS | +| `tmp-ca.key` | Temporary CA private key | +| `root-k256.key` | Ethereum signing key (secp256k1) | +| `bootstrap-info.json` | Public keys and TDX attestation quote | + +--- + +## Manual Deployment + +If you prefer to deploy manually, follow these steps. + +### Step 1: Verify Prerequisites + +Check that all required components are ready. + +#### Verify KMS image in local registry + +```bash +curl -sk https://registry.yourdomain.com/v2/dstack-kms/tags/list +``` + +Expected output shows the `:fixed` tag: +```json +{"name":"dstack-kms","tags":["fixed","latest"]} +``` + +If missing, complete the [Local Docker Registry](/tutorial/local-docker-registry) tutorial first. + +#### Verify Gramine Key Provider is running + +```bash +docker ps | grep gramine-sealing-key-provider +``` + +Should show the container running. If not, complete the [Gramine Key Provider](/tutorial/gramine-key-provider) tutorial. + +#### Verify VMM is running + +```bash +systemctl status dstack-vmm +``` + +The VMM must be active and running. 
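If you want the registry check from this step to fail loudly in a script rather than reading the JSON by eye, the tag list can be tested directly. A minimal sketch: `has_tag` is a hypothetical helper that does a plain string match, and the sample JSON mirrors the expected output shown above.

```bash
# Hypothetical helper: succeed if the tags-list JSON on stdin contains the given tag.
# This is a plain string match on the quoted tag, not a full JSON parse.
has_tag() {
  grep -q "\"$1\""
}

# Sample body copied from the expected output above; in practice, pipe in
#   curl -sk https://registry.yourdomain.com/v2/dstack-kms/tags/list
tags_json='{"name":"dstack-kms","tags":["fixed","latest"]}'

if printf '%s' "$tags_json" | has_tag fixed; then
  echo "fixed tag present"
else
  echo "fixed tag missing; push the image before deploying" >&2
fi
```

With `jq` available, `jq -e '.tags | any(. == "fixed")'` is a stricter alternative because it inspects only the `tags` array.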
+ +### Step 2: Create Deployment Directory + +```bash +mkdir -p ~/kms-deploy +cd ~/kms-deploy +``` + +### Step 3: Create docker-compose.yaml + +> **Replace placeholders:** If you haven't already personalized the tutorials with your domain names, see [DNS Configuration: Personalize Tutorials](/tutorial/dns-configuration#personalize-tutorial-commands). You **must** replace `registry.yourdomain.com` and `kms.yourdomain.com` with your actual domains. + +Create the compose file with your registry domain and configuration: + +```bash +cat > docker-compose.yaml << 'EOF' +services: + kms: + image: registry.yourdomain.com/dstack-kms:fixed + ports: + - "9100:9100" + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + - kms-certs:/etc/kms/certs + environment: + - RUST_LOG=info + - KMS_DOMAIN=kms.yourdomain.com + - PORT=9200 + - ETH_RPC_URL=https://ethereum-sepolia-rpc.publicnode.com + - KMS_CONTRACT_ADDR=YOUR_CONTRACT_ADDRESS + configs: + - source: kms_config + target: /etc/kms/kms.toml + restart: unless-stopped + +volumes: + kms-certs: + +configs: + kms_config: + content: | + [rpc] + address = "0.0.0.0" + port = 9100 + + [rpc.tls] + key = "/etc/kms/certs/rpc.key" + certs = "/etc/kms/certs/rpc.crt" + + [rpc.tls.mutual] + ca_certs = "/etc/kms/certs/tmp-ca.crt" + mandatory = false + + [core] + cert_dir = "/etc/kms/certs" + pccs_url = "https://pccs.phala.network/sgx/certification/v4" + + [core.image] + verify = true + cache_dir = "/etc/kms/images" + download_url = "https://download.dstack.org/os-images/mr_{OS_IMAGE_HASH}.tar.gz" + download_timeout = "2m" + + [core.auth_api] + type = "webhook" + + [core.auth_api.webhook] + url = "http://127.0.0.1:9200" + + [core.onboard] + enabled = true + auto_bootstrap_domain = "" + quote_enabled = true + address = "0.0.0.0" + port = 9100 +EOF +``` + +Replace the placeholder values with your actual configuration: + +```bash +# Registry domain (must match your local Docker registry) +sed -i 
's|registry.yourdomain.com|registry.your-actual-domain.com|g' docker-compose.yaml + +# KMS domain (for the KMS_DOMAIN env var) +sed -i 's|kms.yourdomain.com|kms.your-actual-domain.com|g' docker-compose.yaml + +# KMS contract address (from contract deployment tutorial) +sed -i "s|YOUR_CONTRACT_ADDRESS|$(cat ~/.dstack/secrets/kms-contract-address)|g" docker-compose.yaml +``` + +This docker-compose uses a Docker `configs` section to inject a complete `kms.toml` into the container at `/etc/kms/kms.toml`, overriding the config baked into the image. This approach lets you change KMS configuration without rebuilding the Docker image. + +**Key configuration sections in `kms.toml`:** + +| Section | Purpose | +|---------|---------| +| `[rpc]` | RPC server address and port (9100) | +| `[rpc.tls]` | TLS certificate paths for HTTPS | +| `[core.image]` | OS image verification — downloads images from `download.dstack.org` to compute expected TDX measurements | +| `[core.auth_api]` | Authentication via auth-eth webhook on localhost:9200 | +| `[core.onboard]` | Bootstrap settings — `auto_bootstrap_domain` is empty so KMS enters onboard mode for manual bootstrap | + +> **Why manual bootstrap?** With `auto_bootstrap_domain` left empty, KMS starts in "onboard mode" — a plain HTTP server on port 9100 that waits for you to trigger bootstrap via an RPC call. This ensures `bootstrap-info.json` (containing the TDX attestation quote and public keys) is written to disk. You'll need this file later to register the KMS on-chain. + +**Environment variables explained:** + +| Variable | Required | Description | +|----------|----------|-------------| +| `RUST_LOG` | Yes | KMS log level (`info`, `debug`, etc.) 
| +| `KMS_DOMAIN` | Yes | KMS domain name (used by start-kms.sh for reference) | +| `PORT` | Yes | auth-eth listen port — **must be `9200`** to match kms.toml webhook URL | +| `ETH_RPC_URL` | Yes | Ethereum Sepolia RPC endpoint | +| `KMS_CONTRACT_ADDR` | Yes | Your deployed KMS contract address | + +> **Getting your values:** +> ```bash +> # Your KMS contract address (from contract deployment tutorial) +> cat ~/.dstack/secrets/kms-contract-address +> ``` +> +> For `ETH_RPC_URL`, the tutorials use the free `https://ethereum-sepolia-rpc.publicnode.com` endpoint. For production, consider a dedicated RPC provider. + +**Other important settings:** +- `image`: Must use your local registry with the `:fixed` tag +- `/var/run/dstack.sock`: Required for TDX attestation +- `configs`: Injects `kms.toml` at runtime — the `start-kms.sh` entrypoint reads from `/etc/kms/kms.toml` + +### Step 4: Deploy via vmm-cli.py + +Use the VMM CLI tool to deploy the CVM: + +```bash +# Navigate to dstack VMM directory +cd ~/dstack/vmm + +# Set VMM auth from saved token +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +# Generate app-compose.json with local key provider enabled +./src/vmm-cli.py --url http://127.0.0.1:9080 compose \ + --name kms \ + --docker-compose ~/kms-deploy/docker-compose.yaml \ + --local-key-provider \ + --output ~/kms-deploy/app-compose.json + +# Deploy the CVM +./src/vmm-cli.py --url http://127.0.0.1:9080 deploy \ + --name kms \ + --image dstack-0.5.7 \ + --compose ~/kms-deploy/app-compose.json \ + --vcpu 2 \ + --memory 4096 \ + --disk 20 \ + --port tcp:0.0.0.0:9100:9100 +``` + +**Key flags explained:** +- `--local-key-provider`: Enables Gramine key provider for CVM boot +- `--image dstack-0.5.7`: Guest image from VMM images directory +- `--port tcp:0.0.0.0:9100:9100`: Maps host port 9100 to CVM port 9100 on all interfaces + +> **Why `0.0.0.0` and not `127.0.0.1`?** Gateway CVMs use QEMU user-mode networking and reach the host via its public IP. 
If KMS is bound to localhost only, gateway CVMs cannot connect. KMS authentication uses TDX attestation, not network isolation, so public accessibility is safe. + +> **Note:** Do NOT use `--secure-time` flag - it causes CVM to hang during boot waiting for time sync. + +### Step 5: Monitor Deployment + +List VMs to get the ID, then view the boot logs: + +```bash +# List VMs to get the ID +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +View CVM boot logs using curl (replace `VM_ID` with the actual ID from `lsvm`): + +```bash +# View recent logs +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=100" + +# Follow logs in real-time +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=true&ansi=false" +``` + +> **Note:** The VMM logs endpoint requires Bearer token authentication. The `vmm-cli.py logs` command may not work with token auth — use curl directly as shown above. + +Look for these log messages indicating KMS entered onboard mode: +``` +KMS CVM booting... +Docker container starting... +KMS initializing... +Onboarding +``` + +> **Important:** KMS is now in onboard mode — a plain HTTP server waiting for bootstrap. It will **not** serve TLS or respond to `KMS.GetMeta` until you complete the next step. + +### Step 6: Bootstrap KMS + +With KMS in onboard mode, trigger key generation by calling the Bootstrap RPC endpoint. This generates root keys, a TDX attestation quote, and writes `bootstrap-info.json`: + +```bash +# Replace kms.yourdomain.com with your actual KMS domain +curl -s -X POST \ + -H "Content-Type: application/json" \ + -d '{"domain":"kms.yourdomain.com"}' \ + "http://localhost:9100/prpc/Onboard.Bootstrap?json" | tee ~/kms-deploy/bootstrap-info.json | jq . +``` + +> **Note:** This uses plain `http://` — KMS is still in onboard mode (no TLS yet). 
The `tee` command saves the response to `bootstrap-info.json` while also displaying it. You'll need this file later to register KMS on-chain. + +Expected response: + +```json +{ + "ca_pubkey": "3059301306072a8648ce3d0201...", + "k256_pubkey": "0304c6bfe0ecd9bfa8b8c3450c...", + "attestation": "04000200810000000..." +} +``` + +Now signal KMS to exit onboard mode and start the main TLS service: + +```bash +curl -s "http://localhost:9100/finish" +``` + +Wait a few seconds for KMS to transition from onboard mode to the main TLS service: + +```bash +sleep 5 +``` + +### Step 7: Verify KMS is Running + +Test connectivity to the KMS RPC server (now using TLS): + +```bash +curl -sk https://localhost:9100/prpc/KMS.GetMeta?json | jq . +``` + +**Important:** Use `https://` — KMS now serves TLS after exiting onboard mode. + +Expected response: + +```json +{ + "ca_cert": "-----BEGIN CERTIFICATE-----...", + "allow_any_upgrade": false, + "k256_pubkey": "0304c6bfe0ecd9bfa8b8c3450c8fb49f52d6234522bd4e42c0736db852da8c871e", + "bootstrap_info": { + "ca_pubkey": "3059301306072a8648ce3d0201...", + "k256_pubkey": "0304c6bfe0ecd9bfa8b8c3450c...", + "attestation": "04000200810000000..." 
+ }, + "is_dev": false, + "gateway_app_id": "", + "kms_contract_address": "0xe6c23bfE4686E28DcDA15A1996B1c0C549656E26", + "chain_id": 11155111, + "app_auth_implementation": "0xc308574F9A0c7d144d7AD887785D25C386D32B54" +} +``` + +Key fields to verify: +- `bootstrap_info`: Contains public keys and TDX attestation quote (not null) +- `bootstrap_info.attestation`: Non-empty; once verified, this quote proves the keys were generated inside a genuine TDX environment +- `ca_cert`: Root CA certificate was generated +- `k256_pubkey`: Ethereum signing key was generated +- `chain_id`: 11155111 indicates Sepolia testnet +- `kms_contract_address`: Your deployed KMS contract address + +### Step 8: Test Response Time + +Verify the RPC responds quickly (not hanging): + +```bash +time curl -sk "https://localhost:9100/prpc/KMS.GetMeta?json" > /dev/null +``` + +Expected: Response in < 1 second. If it takes > 10 seconds or hangs, see the Troubleshooting section below. + +--- + +## Verifying TDX Attestation + +With KMS running in a CVM, the TDX quote provides cryptographic proof of integrity. + +### View the TDX Quote + +```bash +# Extract the attestation quote from bootstrap_info +curl -sk "https://localhost:9100/prpc/KMS.GetMeta?json" | jq -r '.bootstrap_info.attestation' +``` + +This returns a hex-encoded TDX quote. A non-empty value shows that KMS produced a quote during bootstrap; the quote still has to be verified (see Verification Options) before it proves anything. + +### Quote Contents + +The TDX quote contains: +- **MRTD** - Build-time measurement of the initial trust domain contents +- **RTMR** - Runtime measurement registers, extended as the guest boots +- **Report Data** - KMS public keys bound to the quote +- **Signature** - Signature from the quoting enclave's attestation key, which chains to Intel's root of trust + +### Verification Options + +The TDX quote can be verified by: + +1. **Intel PCCS** - Provisioning Certificate Caching Service, which serves the collateral used for quote verification +2. **On-chain verification** - Smart contract quote validation +3.
**Third-party services** - Independent attestation verification + +--- + +## Architecture + +### CVM-based KMS Architecture + +``` +┌─────────────────────────────────────────────────────────┐ +│ TDX Host │ +│ │ +│ ┌─────────────────────────────────────────────────┐ │ +│ │ dstack-vmm │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────┐ │ │ +│ │ │ KMS CVM (TDX Protected) │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────────────────────────────┐ │ │ │ +│ │ │ │ Docker Container │ │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ │ ┌─────────┐ ┌──────────────┐ │ │ │ │ +│ │ │ │ │ KMS │◄──│ auth-eth │ │ │ │ │ +│ │ │ │ └────┬────┘ └──────┬───────┘ │ │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ │ ▼ ▼ │ │ │ │ +│ │ │ │ /etc/kms/certs Ethereum RPC │ │ │ │ +│ │ │ └──────────────────────────────────┘ │ │ │ +│ │ │ │ │ │ +│ │ │ guest-agent (/var/run/dstack.sock) │ │ │ +│ │ └─────────────────────────────────────────┘ │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────┘ │ +│ │ +│ Port 9100 ◄─── External connections │ +└─────────────────────────────────────────────────────────┘ +``` + +### Key Differences from Host-based KMS + +| Aspect | Host-based KMS | CVM-based KMS | +|--------|----------------|---------------| +| TDX Attestation | Not available | Full attestation with quotes | +| Memory Protection | OS-level only | TDX hardware encryption | +| Key Security | File permissions | Hardware-protected memory | +| Verification | Physical security | Cryptographic proof | +| Deployment | systemd service | VMM-managed CVM | + +--- + +## Troubleshooting + +For detailed solutions, see the [KMS Deployment Troubleshooting Guide](/tutorial/troubleshooting-kms-deployment#kms-cvm-deployment-issues): + +- [CVM fails to start](/tutorial/troubleshooting-kms-deployment#cvm-fails-to-start) +- [CVM Exits Immediately or Reboots in a Loop](/tutorial/troubleshooting-kms-deployment#cvm-exits-immediately-or-reboots-in-a-loop) +- [Bootstrap hangs](/tutorial/troubleshooting-kms-deployment#bootstrap-hangs) +- [Port 9100 not 
accessible](/tutorial/troubleshooting-kms-deployment#port-9100-not-accessible) +- [TDX quote not generated](/tutorial/troubleshooting-kms-deployment#tdx-quote-not-generated) +- [CVM Fails with "QGS error code: 0x12001"](/tutorial/troubleshooting-kms-deployment#cvm-fails-with-qgs-error-code-0x12001) +- [GetMeta Returns "Connection refused" on Port 9200](/tutorial/troubleshooting-kms-deployment#getmeta-returns-connection-refused-on-port-9200) +- [GetMeta Returns "missing field `status`"](/tutorial/troubleshooting-kms-deployment#getmeta-returns-missing-field-status) +- [GetMeta Hangs or Times Out](/tutorial/troubleshooting-kms-deployment#getmeta-hangs-or-times-out) +- [CVM Hangs at "Waiting for time to be synchronized"](/tutorial/troubleshooting-kms-deployment#cvm-hangs-at-waiting-for-time-to-be-synchronized) + +--- + +## Certificate Persistence + +### Understanding Storage + +CVM certificates are stored in a Docker named volume (`kms-certs`). This provides: + +- **Container restart persistence** - Certificates survive container restarts +- **CVM restart consideration** - Depending on VMM configuration, volumes may or may not persist + +### Backup Recommendations + +After successful bootstrap, backup the bootstrap info: + +```bash +# Save bootstrap info (contains public keys and TDX attestation quote) +curl -sk https://localhost:9100/prpc/KMS.GetMeta?json | jq '.bootstrap_info' > ~/kms-bootstrap-info-$(date +%Y%m%d).json + +# The private keys remain inside the CVM for security +# For full backup, use the VMM console to export the CVM state +``` + +Store backup information securely offline. 
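
To detect accidental corruption of the saved bootstrap info before it is needed again, a checksum can be recorded next to the backup file. The following is a minimal sketch, assuming GNU `sha256sum` is available; the helper names `record_checksum` and `verify_checksum` are hypothetical and not part of dstack.

```bash
# Hypothetical helpers (not part of dstack): record and verify a SHA-256
# checksum next to a backup file. Assumes GNU coreutils `sha256sum`.

# Write <file>.sha256 alongside the backup; refuse missing or empty files.
record_checksum() {
    local file="$1"
    [ -s "$file" ] || { echo "error: $file is missing or empty" >&2; return 1; }
    # Run from the file's directory so the checksum file stores a relative name.
    ( cd "$(dirname "$file")" && sha256sum "$(basename "$file")" > "$(basename "$file").sha256" )
}

# Return 0 if the backup still matches its recorded checksum.
verify_checksum() {
    local file="$1"
    ( cd "$(dirname "$file")" && sha256sum --check --quiet "$(basename "$file").sha256" )
}
```

For example, `record_checksum ~/kms-bootstrap-info-20260122.json` (the date suffix is only an illustration) writes a `.sha256` file that can be copied offline together with the backup and checked later with `verify_checksum`.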
+ +--- + +## Next Steps + +With KMS deployed as a CVM, proceed to set up the Gateway: + +- [Gateway Build & Configuration](/tutorial/gateway-build-configuration) - Build and configure the dstack gateway + +## Additional Resources + +- [Intel TDX Attestation](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html) +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) +- [Docker Compose Documentation](https://docs.docker.com/compose/) diff --git a/docs/tutorials/local-docker-registry.md b/docs/tutorials/local-docker-registry.md new file mode 100644 index 00000000..f3a0c811 --- /dev/null +++ b/docs/tutorials/local-docker-registry.md @@ -0,0 +1,205 @@ +--- +title: "Local Docker Registry" +description: "Deploy a local Docker registry behind HAProxy for reliable CVM image pulls" +section: "Prerequisites" +stepNumber: 4 +totalSteps: 7 +lastUpdated: 2026-01-22 +prerequisites: + - haproxy-setup + - ssl-certificate-setup +tags: + - docker + - registry + - haproxy + - prerequisites +difficulty: intermediate +estimatedTime: "20 minutes" +--- + +# Local Docker Registry + +This tutorial guides you through deploying a local Docker registry behind HAProxy. The registry runs on localhost:5000 and HAProxy handles TLS termination, providing secure external access via `registry.yourdomain.com`. + +## Why Local Registry? + +| Challenge | Solution | +|-----------|----------| +| Docker Hub rate limits | Local registry has no pull limits | +| Network reliability | Local pulls are fast and consistent | +| CVM boot timing | Registry must respond quickly during boot | +| Image availability | Cached images always available | + +When a CVM boots, it pulls Docker images. If this fails, the CVM fails to start. A local registry with proper SSL ensures reliable deployments. 
+ +## Architecture Overview + +``` +External Request Internal +┌──────────────────────────────────────────────────────────────┐ +│ │ +│ registry.yourdomain.com:443 → HAProxy → localhost:5000 │ +│ (TLS) (proxy) (registry) │ +│ │ +└──────────────────────────────────────────────────────────────┘ +``` + +HAProxy handles: +- TLS termination using Let's Encrypt certificates +- SNI-based routing to the registry on localhost:5000 +- Unified configuration with other services (VMM management, gateway, etc.) + +## Prerequisites + +Before starting, ensure you have: + +- Completed [HAProxy Setup](/tutorial/haproxy-setup) - HAProxy installed and configured +- Completed [SSL Certificate Setup](/tutorial/ssl-certificate-setup) - Registry certificate obtained +- Docker installed and running + +Verify the DNS record: + +```bash +dig +short registry.yourdomain.com +``` + +Should return your server's IP address. + +--- + + +## Manual Deployment + +If you prefer to deploy manually, follow these steps. + +> **Note:** HAProxy and SSL certificates must already be set up. If you haven't completed [HAProxy Setup](/tutorial/haproxy-setup) and [SSL Certificate Setup](/tutorial/ssl-certificate-setup), do those first. HAProxy is already configured to proxy `registry.yourdomain.com` to `localhost:5000`. + +### Step 1: Create Registry Storage Directory + +```bash +sudo mkdir -p /var/lib/registry +``` + +### Step 2: Deploy Registry Container + +The registry runs on localhost:5000 (not exposed externally). HAProxy handles external TLS connections. + +```bash +docker run -d \ + --name registry \ + --restart always \ + -p 127.0.0.1:5000:5000 \ + -v /var/lib/registry:/var/lib/registry \ + registry:2 +``` + +### Step 3: Verify Registry is Running Locally + +```bash +docker ps | grep registry +``` + +Expected output shows container running: +``` +abc123 registry:2 ... 
Up 2 minutes 127.0.0.1:5000->5000/tcp registry +``` + +Test the registry API locally (without TLS): + +```bash +curl -s http://127.0.0.1:5000/v2/ +``` + +An empty response or `{}` indicates success - the registry is running. + +### Step 4: Verify External Access + +Test the registry through HAProxy: + +```bash +curl -s https://registry.yourdomain.com/v2/ +``` + +An empty response or `{}` indicates success. + +Check the catalog (empty initially): + +```bash +curl -s https://registry.yourdomain.com/v2/_catalog +``` + +Expected response: `{"repositories":[]}` (no images pushed yet) + +--- + +## About KMS Images + +The KMS Docker image is **built from source** and pushed to your local registry during Phase 4 (KMS Build & Configuration). To build and push it, follow the [KMS Build & Configuration](/tutorial/kms-build-configuration) tutorial. + +**Do not attempt to pull KMS images from Docker Hub.** The tutorial workflow builds everything from source to ensure you have a verifiable, reproducible deployment. + +### Verify Registry is Ready + +At this point, your registry should be running but empty: + +```bash +curl -s https://registry.yourdomain.com/v2/_catalog +``` + +Expected response: +```json +{"repositories":[]} +``` + +Images will appear here after completing the KMS build phase.
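
Once images have been pushed, the same catalog response can be checked for a specific repository. The following is a rough sketch, not part of the tutorial workflow: `catalog_has_repo` is a hypothetical helper, `dstack-kms` is a placeholder repository name, and the parsing assumes the compact single-object JSON shown above rather than arbitrary JSON (use `jq` if it is installed).

```bash
# Hypothetical helper: succeed if a repository name appears in the JSON
# returned by the registry's /v2/_catalog endpoint (read from stdin).
# Uses only sed/tr/grep so it works without jq; it assumes the compact
# {"repositories":[...]} form shown above.
catalog_has_repo() {
    local repo="$1"
    sed 's/[][{}]//g' \
        | tr ',' '\n' \
        | sed 's/.*://' \
        | tr -d '" ' \
        | grep -Fxq -- "$repo"
}
```

A possible use is `curl -s https://registry.yourdomain.com/v2/_catalog | catalog_has_repo dstack-kms && echo "image present"`. The match is exact, so a repository named `dstack-kms-old` does not satisfy a check for `dstack-kms`.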
+ +--- + +## Verification Summary + +Run this verification script: + +```bash +# Replace with your registry domain +DOMAIN="registry.yourdomain.com" + +echo "Registry Container: $(docker ps --format '{{.Names}}' | grep -q registry && echo 'running' || echo 'not running')" +echo "Local Port 5000: $(ss -tln | grep -q 127.0.0.1:5000 && echo 'listening' || echo 'not listening')" +echo "HAProxy Port 443: $(ss -tln | grep -q :443 && echo 'listening' || echo 'not listening')" +echo "SSL Certificate: $(openssl s_client -connect $DOMAIN:443 -servername $DOMAIN </dev/null 2>/dev/null | grep -q 'Verify return code: 0' && echo 'valid' || echo 'invalid or expired')" +echo "Local Registry: $(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:5000/v2/ | grep -q '200' && echo 'responding' || echo 'not responding')" +echo "External via HAProxy: $(curl -s -o /dev/null -w '%{http_code}' https://$DOMAIN/v2/ | grep -q '200' && echo 'responding' || echo 'not responding')" +echo "Repositories: $(curl -s https://$DOMAIN/v2/_catalog)" +``` + +All checks should show positive status. The repositories list will be empty until you complete the KMS build phase.
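
If the checks are run unattended (for example from a cron job), it can help to have a non-zero exit status when something fails. The following is a minimal sketch of one way to do that; `check` and `failures` are hypothetical names introduced here, not part of the tutorial's script.

```bash
# Hypothetical helper: run a command quietly, print a labelled result,
# and count failures so the calling script can exit non-zero.
failures=0

check() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        printf '%-22s ok\n' "$label:"
    else
        printf '%-22s FAILED\n' "$label:"
        failures=$((failures + 1))
    fi
}

# Example usage (commented out so that sourcing this block has no side effects):
#   check "Local Registry"       curl -sf http://127.0.0.1:5000/v2/
#   check "External via HAProxy" curl -sf "https://$DOMAIN/v2/"
#   exit "$failures"
```

Each check is passed as a plain command, so anything that returns a meaningful exit status can be plugged in.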
+ +--- + +## Troubleshooting + +For detailed solutions, see the [Prerequisites Troubleshooting Guide](/tutorial/troubleshooting-prerequisites#local-docker-registry-issues): + +- [Certificate Verification Failed](/tutorial/troubleshooting-prerequisites#certificate-verification-failed) +- [503 Service Unavailable from HAProxy](/tutorial/troubleshooting-prerequisites#503-service-unavailable-from-haproxy) +- [502 Bad Gateway from HAProxy](/tutorial/troubleshooting-prerequisites#502-bad-gateway-from-haproxy) +- [DNS Not Resolving (Docker Registry)](/tutorial/troubleshooting-prerequisites#dns-not-resolving-docker-registry) +- [Registry Container Not Starting](/tutorial/troubleshooting-prerequisites#registry-container-not-starting) +- [HAProxy Configuration Error](/tutorial/troubleshooting-prerequisites#haproxy-configuration-error) + +--- + +## Next Steps + +With the local Docker registry running, proceed to: + +- [Contract Deployment](/tutorial/contract-deployment) - Deploy KMS contracts to Sepolia +- [KMS Build & Configuration](/tutorial/kms-build-configuration) - Prepare KMS for CVM deployment + +## Additional Resources + +- [Docker Registry Documentation](https://docs.docker.com/registry/) +- [Let's Encrypt Documentation](https://letsencrypt.org/docs/) +- [Certbot Documentation](https://certbot.eff.org/docs/) diff --git a/docs/tutorials/management-interface-setup.md b/docs/tutorials/management-interface-setup.md new file mode 100644 index 00000000..95d38110 --- /dev/null +++ b/docs/tutorials/management-interface-setup.md @@ -0,0 +1,156 @@ +--- +title: "Management Interface Setup" +description: "Configure secure remote access to dstack VMM management interface via HAProxy" +section: "dstack Installation" +stepNumber: 6 +totalSteps: 8 +lastUpdated: 2026-01-22 +prerequisites: + - vmm-service-setup + - haproxy-setup + - ssl-certificate-setup +tags: + - haproxy + - reverse-proxy + - tls + - management + - security +difficulty: intermediate +estimatedTime: "10 minutes" +--- 
+ +# Management Interface Setup + +This tutorial guides you through verifying secure remote access to the dstack VMM management interface. By default, the VMM API listens on `127.0.0.1:9080`, which is only accessible from the server itself. HAProxy (configured in [HAProxy Setup](/tutorial/haproxy-setup)) proxies requests from `vmm.dstack.yourdomain.com` to the VMM API. + +## Architecture Overview + +``` +External Request Internal +┌─────────────────────────────────────────────────────────────────┐ +│ │ +│ vmm.dstack.yourdomain.com:443 → HAProxy → localhost:9080 │ +│ (TLS) (proxy) (VMM API) │ +│ │ +└─────────────────────────────────────────────────────────────────┘ +``` + +The VMM API requires authentication tokens, providing an additional layer of security beyond TLS. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [VMM Service Setup](/tutorial/vmm-service-setup) - VMM running on localhost:9080 +- Completed [HAProxy Setup](/tutorial/haproxy-setup) - HAProxy installed and configured +- Completed [SSL Certificate Setup](/tutorial/ssl-certificate-setup) - Wildcard certificate for `*.dstack.yourdomain.com` +- VMM authentication token (generated during VMM configuration) + +## Security Considerations + +### Authentication + +The VMM API requires an authentication token for all requests. This token was generated during [VMM Configuration](/tutorial/vmm-configuration) and saved to `~/.dstack/secrets/vmm-auth-token`. API requests include it via: + +```bash +curl -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" ... +``` + +### Firewall + +Ensure your firewall allows HTTPS traffic: + +```bash +# Check current rules +sudo ufw status + +# Allow HTTPS if needed +sudo ufw allow 443/tcp +``` + +--- + +## Verify HAProxy Configuration + +HAProxy is already configured to proxy VMM requests. 
Verify the configuration includes the VMM backend: + +```bash +grep -A5 "vmm_backend" /etc/haproxy/haproxy.cfg +``` + +Expected output shows the VMM backend configuration: + +``` +backend vmm_backend + mode http + option httpchk GET / + http-request set-header X-Forwarded-Proto https + server vmm 127.0.0.1:9080 check +``` + +## Verify Remote Access + +### Step 1: Test VMM is Running Locally + +```bash +curl -s http://127.0.0.1:9080/ | head -5 +``` + +Should return the VMM web interface HTML. + +### Step 2: Test External Access + +Test the management interface through HAProxy: + +```bash +# Replace with your domain +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "https://vmm.dstack.yourdomain.com/prpc/Status?json" | jq . +``` + +Expected response: + +```json +{ + "vms": [], + "port_mapping_enabled": true, + "total": 0 +} +``` + +> **Note:** The `vms` list will be empty until you deploy CVMs in later tutorials. The key point is that you get a valid JSON response through HAProxy, confirming TLS termination, routing, and VMM authentication are all working. + +### Step 3: Access Web Interface + +Open in your browser: + +``` +https://vmm.dstack.yourdomain.com +``` + +You should see the VMM Management Console. API requests require the auth token in the `Authorization` header. 
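
Because every API call needs the same header, the token handling can be wrapped in a small function. This is a sketch only: `vmm_auth_header` is a hypothetical helper name, and the default path is the token location used earlier in this tutorial.

```bash
# Hypothetical helper: print the Authorization header for the VMM API,
# failing early if the token file is missing or empty.
vmm_auth_header() {
    local token_file="${1:-$HOME/.dstack/secrets/vmm-auth-token}"
    if [ ! -s "$token_file" ]; then
        echo "error: token file $token_file is missing or empty" >&2
        return 1
    fi
    # Strip any trailing newline or carriage return the file may contain.
    printf 'Authorization: Bearer %s\n' "$(tr -d '\r\n' < "$token_file")"
}
```

It could then be used as `curl -s -H "$(vmm_auth_header)" "https://vmm.dstack.yourdomain.com/prpc/Status?json"`. Failing early on an empty token file makes a misconfigured token easier to tell apart from a rejected one.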
+ +--- + +## Troubleshooting + +For detailed solutions, see the [dstack Installation Troubleshooting Guide](/tutorial/troubleshooting-dstack-installation#management-interface-setup-issues): + +- [502 Bad Gateway](/tutorial/troubleshooting-dstack-installation#502-bad-gateway) +- [Connection Refused](/tutorial/troubleshooting-dstack-installation#connection-refused) +- [DNS Not Resolving](/tutorial/troubleshooting-dstack-installation#dns-not-resolving) +- [Authentication Failed](/tutorial/troubleshooting-dstack-installation#authentication-failed) +- [Backend Marked as DOWN](/tutorial/troubleshooting-dstack-installation#backend-marked-as-down) + +--- + +## Next Steps + +With secure remote access configured, proceed to: + +- [Guest OS Image Setup](/tutorial/guest-image-setup) - Download and configure guest images + +## Additional Resources + +- [HAProxy Documentation](https://www.haproxy.org/documentation/) +- [Let's Encrypt Documentation](https://letsencrypt.org/docs/) diff --git a/docs/tutorials/rust-toolchain-installation.md b/docs/tutorials/rust-toolchain-installation.md new file mode 100644 index 00000000..c2a3572e --- /dev/null +++ b/docs/tutorials/rust-toolchain-installation.md @@ -0,0 +1,124 @@ +--- +title: "Rust Toolchain Installation" +description: "Install and configure the Rust programming language toolchain for building dstack components" +section: "dstack Installation" +stepNumber: 2 +totalSteps: 8 +lastUpdated: 2025-12-07 +prerequisites: + - system-baseline-dependencies +tags: + - rust + - cargo + - rustup + - toolchain +difficulty: "beginner" +estimatedTime: "10 minutes" +--- + +# Rust Toolchain Installation + +This tutorial guides you through installing the Rust programming language toolchain, which is required for building dstack components. 
+ +## Prerequisites + +Before starting, ensure you have: + +- Completed [System Baseline & Dependencies](/tutorial/system-baseline-dependencies) +- SSH access to your TDX-enabled server + + +## What Gets Installed + +| Component | Purpose | +|-----------|---------| +| `rustup` | Rust toolchain installer and version manager | +| `rustc` | Rust compiler | +| `cargo` | Rust package manager and build tool | +| `clippy` | Rust linter for catching common mistakes | +| `rustfmt` | Rust code formatter | + +--- + +## Manual Installation + +If you prefer to install Rust manually, follow these steps. + +### Step 1: Connect to Your Server + +```bash +ssh ubuntu@YOUR_SERVER_IP +``` + +All commands should be run as the `ubuntu` user (not root). Rust will be installed in your home directory at `~/.cargo` and `~/.rustup`. + +### Step 2: Install rustup + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y +``` + +The `-y` flag accepts default options: +- Installs the stable toolchain +- Adds cargo to your PATH +- Sets up shell configuration + +### Step 3: Load the Environment + +```bash +source $HOME/.cargo/env +``` + +### Step 4: Install Additional Components + +```bash +rustup component add clippy rustfmt +``` + +### Step 5: Verify Installation + +```bash +rustc --version +cargo --version +rustup --version +``` + +Expected output (versions may vary): +``` +rustc 1.82.0 (f6e511eec 2024-10-15) +cargo 1.82.0 (8f40fc59f 2024-08-21) +rustup 1.27.1 (54dd3d00f 2024-04-24) +``` + +### Step 6: Test Compilation + +```bash +cargo new --bin rust-test && cd rust-test && cargo run && cd ~ && rm -rf rust-test +``` + +You should see "Hello, world!" printed. 
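
The version check in Step 5 can also be scripted. The sketch below compares the installed `rustc` version against a minimum; the helper names are hypothetical, the minimum in the usage note is a placeholder rather than a documented dstack requirement, and GNU `sort -V` is assumed (it ships with Ubuntu).

```bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the bare version number from `rustc --version`-style output,
# e.g. "rustc 1.82.0 (f6e511eec 2024-10-15)" -> "1.82.0".
rustc_version_of() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}
```

For instance, `version_ge "$(rustc_version_of "$(rustc --version)")" 1.70.0 || echo "Rust is older than expected"`, where `1.70.0` stands in for whatever minimum your build actually needs.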
+ +--- + +## Troubleshooting + +For detailed solutions, see the [dstack Installation Troubleshooting Guide](/tutorial/troubleshooting-dstack-installation#rust-toolchain-installation-issues): + +- [rustup command not found](/tutorial/troubleshooting-dstack-installation#rustup-command-not-found) +- [Permission denied errors](/tutorial/troubleshooting-dstack-installation#permission-denied-errors) +- [Network timeout during installation](/tutorial/troubleshooting-dstack-installation#network-timeout-during-installation) +- [Updating Rust](/tutorial/troubleshooting-dstack-installation#updating-rust) + +--- + +## Next Steps + +With Rust installed, proceed to: + +- [Clone & Build dstack-vmm](/tutorial/clone-build-dstack-vmm) - Build the dstack virtual machine manager + +## Additional Resources + +- [The Rust Programming Language Book](https://doc.rust-lang.org/book/) +- [Rust by Example](https://doc.rust-lang.org/rust-by-example/) +- [rustup Documentation](https://rust-lang.github.io/rustup/) diff --git a/docs/tutorials/ssl-certificate-setup.md b/docs/tutorials/ssl-certificate-setup.md new file mode 100644 index 00000000..c44d5cc8 --- /dev/null +++ b/docs/tutorials/ssl-certificate-setup.md @@ -0,0 +1,283 @@ +--- +title: "SSL Certificate Setup" +description: "Obtain Let's Encrypt SSL certificates for dstack services" +section: "Prerequisites" +stepNumber: 2 +totalSteps: 7 +lastUpdated: 2025-12-09 +prerequisites: + - dns-configuration +tags: + - ssl + - certificates + - letsencrypt + - https + - prerequisites +difficulty: intermediate +estimatedTime: "20 minutes" +--- + +# SSL Certificate Setup + +This tutorial guides you through obtaining SSL certificates from Let's Encrypt for your dstack deployment. These certificates enable HTTPS for the local Docker registry and other services. 
+ +## What You'll Configure + +| Certificate | Used By | Domain Example | +|-------------|---------|----------------| +| Registry certificate | Local Docker registry | `registry.yourdomain.com` | +| Gateway wildcard | dstack Gateway | `*.dstack.yourdomain.com` | + +This tutorial covers both the **registry certificate** (for the Docker registry) and the **gateway wildcard certificate** (for application subdomains). + +## Prerequisites + +Before starting, ensure you have: + +- Completed [DNS Configuration](/tutorial/dns-configuration) - DNS records must exist +- Domain pointing to your server (verified via `dig`) +- Port 80 accessible for Let's Encrypt HTTP-01 challenge +- SSH access to your TDX server + +### Verify DNS Resolution + +```bash +# Replace with your domain +dig +short registry.yourdomain.com +``` + +Should return your server's IP address. If not, the certificate request will fail. + +--- + + +## Manual Setup + +If you prefer to configure manually, follow these steps. + +### Step 1: Install Certbot + +```bash +sudo apt update +sudo apt install -y certbot +``` + +Verify installation: + +```bash +certbot --version +``` + +### Step 2: Stop Services Using Port 80 + +Let's Encrypt's HTTP-01 challenge requires port 80. 
Stop any services using it: + +```bash +# Check what's using port 80 +sudo ss -tlnp | grep :80 + +# Stop HAProxy if running (or nginx on older setups) +sudo systemctl stop haproxy 2>/dev/null || true +sudo systemctl stop nginx 2>/dev/null || true + +# Stop apache if running +sudo systemctl stop apache2 2>/dev/null || true +``` + +### Step 3: Obtain Registry Certificate + +Request a certificate for your registry domain: + +```bash +sudo certbot certonly --standalone \ + -d registry.yourdomain.com \ + --non-interactive \ + --agree-tos \ + --email your-email@example.com +``` + +**Replace:** +- `registry.yourdomain.com` with your actual registry domain +- `your-email@example.com` with your email (for expiry notifications) + +**Expected output:** + +``` +Successfully received certificate. +Certificate is saved at: /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem +Key is saved at: /etc/letsencrypt/live/registry.yourdomain.com/privkey.pem +``` + +### Step 4: Verify Certificate + +Check the certificate is valid: + +```bash +sudo openssl x509 -in /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem -text -noout | \ + grep -E "(Subject:|Not After)" +``` + +Expected output shows your domain and expiry date (90 days from now): + +``` + Subject: CN = registry.yourdomain.com + Not After : Apr 21 12:00:00 2026 GMT +``` + +HAProxy uses these certificates via combined PEM files in `/etc/haproxy/certs/` (created by the renewal hook). + +--- + +## Certificate Auto-Renewal + +Let's Encrypt certificates expire after 90 days. Certbot sets up automatic renewal. + +### Verify Auto-Renewal Timer + +```bash +systemctl status certbot.timer +``` + +Should show the timer is active and running. + +### Test Renewal Process + +```bash +sudo certbot renew --dry-run +``` + +Should complete without errors. 
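
To keep an eye on the 90-day lifetime between renewals, the remaining validity can be computed from the certificate's expiry date. The following is a minimal sketch assuming GNU `date`; `days_until_expiry` is a hypothetical helper, not a certbot feature.

```bash
# Hypothetical helper: whole days between "now" and a certificate expiry date.
# $1: expiry in the format openssl prints, e.g. "Apr 21 12:00:00 2026 GMT"
# $2: optional current time as a Unix epoch (defaults to the system clock)
# Assumes GNU `date`, which can parse this format with -d.
days_until_expiry() {
    local expiry_epoch now_epoch
    expiry_epoch=$(date -d "$1" +%s) || return 1
    now_epoch="${2:-$(date +%s)}"
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Example usage (commented out; requires the certificate from Step 3):
#   not_after=$(sudo openssl x509 -noout -enddate \
#       -in /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem | cut -d= -f2)
#   days_until_expiry "$not_after"
```

The optional second argument exists only so the calculation can be checked against a fixed point in time.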
+ +### Set Up Renewal Hook for HAProxy + +When certificates renew, HAProxy needs updated combined PEM files and a reload: + +```bash +sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-haproxy.sh > /dev/null << 'EOF' +#!/bin/bash +# Reload HAProxy certificates after Let's Encrypt renewal + +# Combine certificates for HAProxy (cert + key in single file) +cat /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/registry.yourdomain.com/privkey.pem \ + > /etc/haproxy/certs/registry.pem + +# Wildcard cert if it exists +if [ -f /etc/letsencrypt/live/dstack.yourdomain.com/fullchain.pem ]; then + cat /etc/letsencrypt/live/dstack.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/dstack.yourdomain.com/privkey.pem \ + > /etc/haproxy/certs/wildcard.pem +fi + +chmod 600 /etc/haproxy/certs/*.pem + +systemctl reload haproxy + +echo "HAProxy certificates updated: $(date)" +EOF + +sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-haproxy.sh +``` + +**Update the domain names** in the script to match your actual domains. + +HAProxy requires certificates in a combined format (cert + key in one file), so the renewal hook concatenates them. + +--- + +## Gateway Wildcard Certificate (Optional) + +The dstack Gateway requires a wildcard certificate for automatic subdomain provisioning. 
This uses the DNS-01 challenge with Cloudflare: + +```bash +# Install Cloudflare plugin +sudo apt install -y python3-certbot-dns-cloudflare + +# Create credentials file +sudo mkdir -p /etc/cloudflare +sudo tee /etc/cloudflare/credentials.ini > /dev/null << EOF +dns_cloudflare_api_token = YOUR_CLOUDFLARE_API_TOKEN +EOF +sudo chmod 600 /etc/cloudflare/credentials.ini + +# Obtain wildcard certificate +sudo certbot certonly --dns-cloudflare \ + --dns-cloudflare-credentials /etc/cloudflare/credentials.ini \ + -d "*.dstack.yourdomain.com" \ + -d "dstack.yourdomain.com" \ + --non-interactive \ + --agree-tos \ + --email your-email@example.com +``` + +The certificate will be used by the gateway in the [Gateway Build & Configuration](/tutorial/gateway-build-configuration) tutorial. + +--- + +## Verification Summary + +Verify your SSL certificate setup: + +```bash +# Check certbot installed +certbot --version + +# Check auto-renewal timer is active +systemctl is-active certbot.timer + +# List all certificates +sudo certbot certificates +``` + +### Registry Certificate + +```bash +# Check certificate exists (replace with your domain) +sudo ls -la /etc/letsencrypt/live/registry.yourdomain.com/ + +# Check certificate validity +sudo openssl x509 -in /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem -noout -dates +``` + +HAProxy uses combined PEM files in `/etc/haproxy/certs/`, which are updated by the renewal hook. + +### Gateway Wildcard Certificate + +```bash +# Check wildcard certificate exists (replace with your domain) +sudo ls -la /etc/letsencrypt/live/dstack.yourdomain.com/ + +# Check certificate covers wildcard +sudo openssl x509 -in /etc/letsencrypt/live/dstack.yourdomain.com/fullchain.pem -noout -text | grep -A1 "Subject Alternative Name" +``` + +Should show both `*.dstack.yourdomain.com` and `dstack.yourdomain.com`. + +To reuse these checks, save the commands above as `verify-ssl.sh`, replace the example domains with your own, make the file executable with `chmod +x verify-ssl.sh`, and run it.
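
The wildcard coverage check can be turned into a pass/fail test instead of reading the output by eye. The sketch below is an assumption-laden convenience, not part of certbot or dstack: `san_has` is a hypothetical helper that expects the text output of `openssl x509 -text` on stdin.

```bash
# Hypothetical helper: succeed if the given DNS name is listed in the
# Subject Alternative Name line of `openssl x509 -text` output (stdin).
san_has() {
    local name="$1"
    grep 'DNS:' \
        | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fxq -- "DNS:$name"
}

# Example usage (commented out; requires the wildcard certificate):
#   sudo openssl x509 -noout -text \
#       -in /etc/letsencrypt/live/dstack.yourdomain.com/fullchain.pem \
#       | san_has '*.dstack.yourdomain.com' && echo "wildcard covered"
```

The name is matched literally, so the `*` in the wildcard entry is not treated as a pattern.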
+ +--- + +## Troubleshooting + +For detailed solutions, see the [Prerequisites Troubleshooting Guide](/tutorial/troubleshooting-prerequisites#ssl-certificate-setup-issues): + +- [Challenge Failed: Could not connect](/tutorial/troubleshooting-prerequisites#challenge-failed-could-not-connect) +- [Rate Limit Exceeded](/tutorial/troubleshooting-prerequisites#rate-limit-exceeded) +- [DNS Resolution Failed](/tutorial/troubleshooting-prerequisites#dns-resolution-failed) +- [HAProxy Can't Read Certificates](/tutorial/troubleshooting-prerequisites#haproxy-cant-read-certificates) + +--- + +## Next Steps + +With SSL certificates configured, proceed to: + +- [HAProxy Setup](/tutorial/haproxy-setup) - Configure HAProxy as TLS entry point +- [Gramine Key Provider](/tutorial/gramine-key-provider) - Deploy SGX-based key provider +- [Local Docker Registry](/tutorial/local-docker-registry) - Uses these certificates + +## Additional Resources + +- [Let's Encrypt Documentation](https://letsencrypt.org/docs/) +- [Certbot Documentation](https://certbot.eff.org/docs/) +- [Cloudflare DNS Plugin](https://certbot-dns-cloudflare.readthedocs.io/) diff --git a/docs/tutorials/system-baseline-dependencies.md b/docs/tutorials/system-baseline-dependencies.md new file mode 100644 index 00000000..ee5d72b9 --- /dev/null +++ b/docs/tutorials/system-baseline-dependencies.md @@ -0,0 +1,118 @@ +--- +title: "System Baseline & Dependencies" +description: "Update the host system and install required build dependencies for dstack" +section: "dstack Installation" +stepNumber: 1 +totalSteps: 8 +lastUpdated: 2025-12-07 +prerequisites: + - tdx-bios-configuration +tags: + - host-setup + - dependencies + - build-tools + - system-update +difficulty: beginner +estimatedTime: 10-15 minutes +--- + +# System Baseline & Dependencies + +Before building dstack components, you need to prepare the host system with updated packages and required build dependencies. 
+ +## Prerequisites + +Before starting, ensure you have: + +- Completed [TDX BIOS Configuration](/tutorial/tdx-bios-configuration) +- SSH access to your TDX-enabled server +- Root or sudo privileges + + +## What Gets Installed + +| Package | Purpose | +|---------|---------| +| `build-essential` | GCC compiler, make, and essential build tools | +| `chrpath` | Modify rpath in ELF binaries | +| `diffstat` | Produce histogram of diff output | +| `lz4` | Fast compression algorithm | +| `wireguard-tools` | WireGuard VPN utilities for secure networking | +| `xorriso` | ISO 9660 filesystem tool for guest images | +| `git` | Version control for cloning dstack repository | +| `curl` | HTTP client for downloading files | +| `pkg-config` | Helper tool for compiling applications | +| `libssl-dev` | SSL development libraries | + + +--- + +## Manual Installation + +If you prefer to install dependencies manually, follow these steps. + +### Step 1: Connect to Your Server + +```bash +ssh ubuntu@YOUR_SERVER_IP +``` + +### Step 2: Update System Packages + +```bash +sudo apt update && sudo apt upgrade -y +``` + +This may take a few minutes. If prompted about kernel updates or service restarts, accept the defaults. 
+ +### Step 3: Install Build Dependencies + +```bash +sudo apt install -y \ + build-essential \ + chrpath \ + diffstat \ + lz4 \ + wireguard-tools \ + xorriso \ + git \ + curl \ + pkg-config \ + libssl-dev +``` + +### Step 4: Verify Installations + +```bash +# Check compiler +gcc --version + +# Check make +make --version + +# Check git +git --version + +# Check additional tools +wg --version +xorriso --version +lz4 --version +``` + +--- + +## Troubleshooting + +For detailed solutions, see the [dstack Installation Troubleshooting Guide](/tutorial/troubleshooting-dstack-installation#system-baseline-dependencies-issues): + +- [Package Installation Fails](/tutorial/troubleshooting-dstack-installation#package-installation-fails) +- [OpenMetal Grub Error](/tutorial/troubleshooting-dstack-installation#openmetal-grub-error) +- [Kernel Upgrade Prompts](/tutorial/troubleshooting-dstack-installation#kernel-upgrade-prompts) + +--- + +## Next Steps + +With system dependencies installed, proceed to: + +- [Rust Toolchain Installation](/tutorial/rust-toolchain-installation) - Install Rust and Cargo for building dstack components diff --git a/docs/tutorials/tdx-bios-configuration.md b/docs/tutorials/tdx-bios-configuration.md new file mode 100644 index 00000000..c6aff1be --- /dev/null +++ b/docs/tutorials/tdx-bios-configuration.md @@ -0,0 +1,145 @@ +--- +title: "TDX & SGX BIOS Configuration" +description: "Configure BIOS settings for TDX and SGX, including Auto MP Registration for KMS attestation" +section: "Host Setup" +stepNumber: 2 +totalSteps: 4 +prerequisites: + - tdx-hardware-verification +tags: + - tdx + - sgx + - bios + - configuration + - tme + - attestation +difficulty: "intermediate" +estimatedTime: "20 minutes" +lastUpdated: 2025-12-07 +--- + +# TDX & SGX BIOS Configuration + +This tutorial covers configuring BIOS settings to enable both TDX (Trust Domain Extensions) and SGX (Software Guard Extensions). Both are required for running dstack with KMS attestation. 
+ +## Why Both TDX and SGX? + +| Technology | Purpose | +|------------|---------| +| **TDX** | Provides hardware-isolated virtual machines (Trust Domains) with encrypted memory | +| **SGX** | Required for KMS attestation - generates cryptographic quotes proving your platform is genuine Intel hardware | + +**Important:** SGX Auto MP Registration must be enabled for the KMS to bootstrap with the local key provider. Without this, KMS cannot generate valid attestation quotes. + +## Access BIOS/UEFI + +You'll need to access your server's BIOS setup utility. + +### Option 1: IPMI/BMC (Remote Management) + +Most servers have remote management interfaces: + +- Dell: iDRAC +- HP: iLO +- Supermicro: IPMI +- Lenovo: XClarity +- OpenMetal: Central Dashboard → IPMI Console + +Access the web interface and use the remote console/KVM feature, or use CLI: + +```bash +# Example with ipmitool (if you have IPMI credentials) +ipmitool -I lanplus -H YOUR_BMC_IP -U admin -P password sol activate +``` + +### Option 2: Physical Access + +1. Reboot server +2. Press appropriate key during POST: + - Dell: F2 + - HP: F9 or F10 + - Supermicro: Delete + - Most others: F2 or Delete + +## Required BIOS Settings + +Configure all settings in a single BIOS session to avoid multiple reboots. + +### Step 0: Disable Physical Address Limit (IMPORTANT!) + +**Before enabling TME-MT, you must first disable the CPU physical address limit.** + +Navigate to: **Advanced → CPU Configuration** (or **Processor Configuration**) + +| Setting | Value | Notes | +|---------|-------|-------| +| **Limit CPU Physical Address to 46 bits** | **Disabled** | May also be labeled "Physical Address Limit" or "Hyper-V Physical Address Limit" | + +> **Why this matters:** The 46-bit address limit prevents TME-MT from working. Intel MKTME needs the upper address bits for encryption key IDs. If you don't disable this first, TME-MT will be greyed out and unselectable. 
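
To make the address-bit argument concrete, here is a small worked calculation. It assumes the platform reserves 6 upper physical-address bits for KeyIDs, a value chosen only because it reproduces the 31 + 32 key split this tutorial shows after enabling TDX; your BIOS may use a different width. The function name `keyid_budget` is illustrative.

```bash
# Sketch: how many encryption KeyIDs fit in k upper address bits.
# KeyID 0 is the default TME key, so 2^k - 1 KeyIDs remain for TME-MT and TDX.
keyid_budget() {
    local bits="$1" tdx_keys="$2"
    local usable=$(( (1 << bits) - 1 ))
    echo "usable=${usable} tme_mt=$(( usable - tdx_keys )) tdx=${tdx_keys}"
}

keyid_budget 6 32   # prints: usable=63 tme_mt=31 tdx=32
```

If the firmware caps the physical address width, those upper bits are not available to carry a KeyID, which is consistent with TME-MT being greyed out until the limit is disabled.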
+ +> **Note:** If this setting doesn't exist on your system, it may already be disabled or not applicable. Proceed to the next step. + +### Step 1: Memory Encryption Settings + +Navigate to: **Advanced → CPU Configuration → Memory Encryption** (or similar path) + +| Setting | Value | Notes | +|---------|-------|-------| +| **Total Memory Encryption (TME)** | Enabled | Base memory encryption | +| **Total Memory Encryption Multi-Tenant (TME-MT)** | Enabled | Multi-key encryption for TDX | +| **TME-MT Memory Integrity** | **Disabled** | Impacts performance if enabled | +| **TME-MT/TDX Key Split** | 1 (or higher) | Allocates keys for TDX | + +### Step 2: Intel TDX Settings + +Navigate to: **Advanced → CPU Configuration** (may be under Security submenu) + +| Setting | Value | Notes | +|---------|-------|-------| +| **Trust Domain Extension (TDX)** | Enabled | Main TDX enable | +| **TDX Secure Arbitration Mode Loader (SEAM Loader)** | Enabled | Required for TDX module | + +After enabling, you should see key allocation information: + +- **TME-MT Keys:** 31 (or similar) +- **TDX Keys:** 32 (or similar) + +### Step 3: Intel SGX Settings (REQUIRED for KMS) + +**SGX is required for KMS attestation**, even on TDX systems. The KMS uses SGX to generate attestation quotes that prove your platform is genuine Intel hardware registered with Intel's Provisioning Certification Service. 
+ +Navigate to: **Advanced → CPU Configuration → Software Guard Extension (SGX)** + +Enable these settings: + +| Setting | Value | Notes | +|---------|-------|-------| +| **SW Guard Extensions (SGX)** | Enabled | Main SGX enable | +| **SGX Auto MP Registration** | **Enabled** | **CRITICAL** - Registers platform with Intel | +| SGX Factory Reset | Disabled | Don't reset SGX keys | +| **SGX QoS** | Enabled | Quality of Service | +| **PRM Size for SGX** | Auto | Memory allocation (or specific size) | +| **Select Owner EPOCH Input Type** | SGX Owner EPOCH activated | | +| **SGXLEPUBKEYHASHx Write Enable** | Enabled | Allows launch enclave configuration | + +> **Why SGX Auto MP Registration is critical:** This setting enables automatic registration of your platform with Intel's Provisioning Certification Service (PCS). On first boot after enabling this setting, your system will register with Intel and obtain Platform Certification Keys (PCKs). Without this registration, the KMS cannot generate valid attestation quotes, and the local_key_provider will fail to bootstrap. + +### Step 4: Save and Exit + +1. Press **F4** (or navigate to Save & Exit) +2. Confirm save changes +3. System will reboot + +> **Having trouble?** See [Host Setup Troubleshooting](/tutorial/troubleshooting-host-setup#tdx-bios-configuration-issues) for common BIOS configuration issues like greyed-out options or settings not persisting. 
+ +## Next Steps + +After saving BIOS settings and rebooting, continue to: + +- [TDX Software Installation](/tutorial/tdx-software-installation) - Install the TDX kernel and software stack + +## Additional Resources + +- [Intel TDX Documentation](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html) +- [Intel SGX Documentation](https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html) +- [Canonical TDX Repository](https://github.com/canonical/tdx) diff --git a/docs/tutorials/tdx-hardware-verification.md b/docs/tutorials/tdx-hardware-verification.md new file mode 100644 index 00000000..dc16267a --- /dev/null +++ b/docs/tutorials/tdx-hardware-verification.md @@ -0,0 +1,128 @@ +--- +title: "TDX Hardware Verification" +description: "Verify your hardware supports Intel TDX and check memory configuration requirements" +section: "Host Setup" +stepNumber: 1 +totalSteps: 4 +lastUpdated: 2025-12-07 + +tags: + - "tdx" + - "hardware" + - "verification" + - "confidential-computing" +difficulty: "intermediate" +estimatedTime: "15 minutes" +--- + +# TDX Hardware Verification + +This tutorial walks you through verifying your hardware supports Intel Trust Domain Extensions (TDX). TDX is Intel's hardware-based confidential computing technology that allows you to run trusted execution environments (TEEs) for secure, isolated workloads. + +## What is Intel TDX? + +Intel TDX (Trust Domain Extensions) is a hardware-based technology that creates isolated virtual machine environments called Trust Domains (TDs). 
These TDs provide: + +- **Hardware-level isolation** - VMs are isolated from the hypervisor and other VMs +- **Memory encryption** - All TD memory is encrypted with per-TD keys +- **Remote attestation** - Cryptographic proof of TD integrity +- **Minimal TCB** - Reduced trusted computing base for better security + +## Prerequisites Check + +Before beginning, verify your hardware supports TDX: + +### Supported Processors + +Intel TDX is available on: +- **Intel Xeon Scalable (5th Gen)** - Emerald Rapids (2024+) +- **Intel Xeon Scalable (4th Gen)** - Sapphire Rapids (some SKUs) + +#### Verify TDX Support on Intel ARK + +Before beginning, **verify your specific processor model supports TDX** using Intel ARK: + +1. Visit **https://ark.intel.com** +2. Search for your processor model (e.g., "Xeon Gold 6530") +3. Scroll down to **Security & Reliability** section +4. Look for: **Intel® Trust Domain Extensions (Intel® TDX)** → **Yes** + +**Example for Intel Xeon Gold 6530:** +- TDX Support: **Yes** ✓ +- Generation: 5th Gen (Emerald Rapids) +- Release Date: Q1 2024 + +#### Check Your Current Processor + +Check your CPU model: + +```bash +grep "model name" /proc/cpuinfo | head -1 +``` + +**Example output:** +``` +model name : INTEL(R) XEON(R) GOLD 6530 +``` + +The Intel Xeon Gold 6530 is a 5th generation processor (Emerald Rapids), which **does support TDX**. + +**Note:** Not all Xeon processors support TDX. Always verify on Intel ARK before proceeding. + +### Supported Operating Systems + +This tutorial covers: +- **Ubuntu 24.04 LTS (Noble)** - Recommended +- Ubuntu 25.04 (Plucky) - Also supported + +**Note:** Ubuntu 24.10 (Oracular) and 23.10 (Mantic) are no longer supported by Canonical's TDX PPA. 
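
The supported-release list above can be checked mechanically. The sketch below is illustrative: it reads `VERSION_ID` from an os-release style file (`/etc/os-release` on Ubuntu) and compares it against the two releases this tutorial names as supported. The function name `check_ubuntu_release` and its optional file argument are assumptions added so the check can be tried against a sample file.

```bash
# Sketch: report whether the installed Ubuntu release is one this tutorial covers.
check_ubuntu_release() {
    local file="${1:-/etc/os-release}" version
    # Source the file in a subshell so its variables do not leak into the caller.
    version="$( . "$file" 2>/dev/null && echo "${VERSION_ID:-}" )"
    case "$version" in
        24.04|25.04) echo "supported: $version" ;;
        *)           echo "unsupported: ${version:-unknown}"; return 1 ;;
    esac
}
```

Called with no argument it inspects the running host; a non-zero exit status makes it easy to abort a setup script early on an unsupported release.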
+ +### Memory Configuration Requirements + +**CRITICAL:** Intel TDX has specific memory configuration requirements that must be met: + +#### Memory Channel Requirements + +According to Intel's TDX Enabling Guide, your server must have: + +- **Minimum:** Memory populated in at least **2 channels per socket** +- **Recommended:** Memory populated in **all available channels** for best performance +- **Configuration:** DIMMs should be identical (same capacity, speed, manufacturer) + +**Example valid configurations:** +- ✓ 2 DIMMs per socket (minimum) +- ✓ 4 DIMMs per socket (better) +- ✓ 8 DIMMs per socket (optimal for most systems) + +**Invalid configurations:** +- ✗ Single DIMM per socket +- ✗ Mixed DIMM capacities or speeds +- ✗ Asymmetric channel population + +#### Verify Your Memory Configuration + +Check your current memory configuration: + +```bash +sudo dmidecode -t memory | grep -E "Size:|Locator:|Speed:|Type:" +``` + +**For detailed memory requirements, refer to:** +https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/03/hardware_selection/ + +**Important:** If your memory configuration doesn't meet these requirements, TDX may fail to initialize even after proper BIOS configuration. Consult with your server vendor if you need to adjust memory configuration. + +## Next Steps + +Once you've verified your hardware meets all TDX requirements: +- Processor supports TDX (verified on Intel ARK) +- Ubuntu 24.04 LTS installed +- Memory configuration meets requirements (minimum 2 channels per socket) + +You're ready to proceed to [TDX & SGX BIOS Configuration](/tutorial/tdx-bios-configuration) where you'll configure BIOS settings for TDX and SGX. 
+ +## Additional Resources + +- **Intel ARK (Processor Verification):** https://ark.intel.com +- **Intel TDX Enabling Guide:** https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/ +- **Canonical TDX Documentation:** https://github.com/canonical/tdx diff --git a/docs/tutorials/tdx-sgx-verification.md b/docs/tutorials/tdx-sgx-verification.md new file mode 100644 index 00000000..2fc2ef4c --- /dev/null +++ b/docs/tutorials/tdx-sgx-verification.md @@ -0,0 +1,263 @@ +--- +title: "TDX & SGX Verification" +description: "Verify TDX and SGX are properly enabled and registered with Intel" +section: "Host Setup" +stepNumber: 4 +totalSteps: 4 +lastUpdated: 2025-12-07 +prerequisites: + - tdx-software-installation +tags: + - tdx + - sgx + - verification + - attestation +difficulty: "beginner" +estimatedTime: "15 minutes" +--- + +# TDX & SGX Verification + +This tutorial verifies that TDX and SGX are properly enabled after BIOS configuration and software installation. Both technologies must be working for dstack KMS attestation. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [TDX Software Installation](/tutorial/tdx-software-installation) +- Rebooted into the TDX-enabled kernel +- SSH access to the server + + +## Manual Verification + +If you prefer to verify manually, or need to troubleshoot specific issues, follow the steps below. + +## Part 1: Verify TDX Kernel + +First, confirm you're running the TDX-enabled kernel. + +### Check Kernel Version + +```bash +uname -r +``` + +**Expected output:** + +``` +6.8.0-1028-intel +``` + +The kernel version should contain `intel`. If you see `generic`, the system didn't boot into the TDX kernel - check GRUB configuration. + +## Part 2: Verify Memory Encryption (TME/MKTME) + +TDX requires Total Memory Encryption (TME) to be enabled. 
+ +### Check TME Status + +```bash +sudo dmesg | grep -i tme +``` + +**Expected output:** + +``` +[ 0.000000] x86/tme: enabled by BIOS +[ 0.000000] x86/mktme: enabled by BIOS +[ 0.000000] x86/mktme: 63 KeyIDs available +``` + +**What this means:** + +| Message | Meaning | +|---------|---------| +| `x86/tme: enabled by BIOS` | Base memory encryption is active | +| `x86/mktme: enabled by BIOS` | Multi-Key TME (TME-MT) is active | +| `63 KeyIDs available` | 63 encryption keys for Trust Domains | + +**If TME is not enabled:** + +``` +[ 0.000000] x86/tme: not enabled by BIOS +``` + +This means BIOS configuration is incomplete. Return to [TDX & SGX BIOS Configuration](/tutorial/tdx-bios-configuration). + +## Part 3: Verify TDX Module + +Check that the TDX module initialized successfully. + +### Check TDX Initialization + +```bash +sudo dmesg | grep -i tdx +``` + +**Expected output:** + +``` +[ 58.680744] virt/tdx: BIOS enabled: private KeyID range [32, 64) +[ 58.681739] virt/tdx: Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3. +[ 245.715035] virt/tdx: TDX module: attributes 0x0, vendor_id 0x8086, major_version 1, minor_version 5, build_date 20240725, build_num 784 +[ 245.715041] virt/tdx: CMR: [0x100000, 0x77800000) +[ 245.715044] virt/tdx: CMR: [0x100000000, 0x407a000000) +... +[ 249.751098] virt/tdx: 4202516 KB allocated for PAMT +[ 249.751110] virt/tdx: module initialized +``` + +**Key indicators:** + +| Message | Meaning | +|---------|---------| +| `BIOS enabled: private KeyID range` | TDX is enabled in BIOS | +| `TDX module: ... major_version 1` | TDX module loaded | +| `CMR: [...]` | Convertible Memory Regions configured | +| `PAMT allocated` | Physical Address Metadata Table ready | +| `module initialized` | TDX is fully operational | + +**If TDX output is empty:** BIOS configuration is incomplete or the kernel doesn't have TDX support. 
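
The key indicators above lend themselves to a scripted check. The following sketch reads kernel log text on standard input and reports whether the `virt/tdx: module initialized` line from the expected output is present. The function name `tdx_module_status` is hypothetical, and the exact log wording may differ across kernel versions.

```bash
# Sketch: classify TDX state from kernel log text supplied on stdin.
tdx_module_status() {
    local log
    log="$(cat)"
    if grep -q 'virt/tdx: module initialized' <<<"$log"; then
        echo "initialized"
    elif grep -q 'virt/tdx:' <<<"$log"; then
        echo "present-but-not-initialized"
    else
        echo "no-tdx-messages"      # nothing logged: BIOS or kernel lacks TDX support
    fi
}

# Usage on a real host: sudo dmesg | tdx_module_status
```

Reading from stdin keeps the function free of `sudo`, so it can also be run against a saved log file.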
+ +### Check KVM TDX Parameter + +```bash +cat /sys/module/kvm_intel/parameters/tdx +``` + +**Expected output:** + +``` +Y +``` + +- `Y` = TDX is enabled in KVM +- `N` = TDX is not enabled (BIOS or kernel issue) + +### Check TDX CPU Flags + +```bash +grep -o 'tdx[^ ]*' /proc/cpuinfo | sort -u +``` + +**Expected output:** + +``` +tdx_host_platform +tdx_pw_mce +``` + +**What the flags mean:** + +| Flag | Meaning | +|------|---------| +| `tdx_host_platform` | System is running as TDX host (correct!) | +| `tdx_pw_mce` | TDX Power Management and Machine Check support | + +> **Note:** The `tdx_guest` flag only appears inside TDX guest VMs, not on the host. + +## Part 4: Verify SGX + +SGX is required for KMS attestation. The KMS uses SGX to generate quotes proving your platform is genuine Intel hardware. + +### Check SGX Devices + +```bash +ls -la /dev/sgx* +``` + +**Expected output:** + +``` +crw-rw---- 1 root sgx 10, 125 Dec 7 10:30 /dev/sgx_enclave +crw------- 1 root root 10, 126 Dec 7 10:30 /dev/sgx_provision +crw-rw---- 1 root sgx 10, 124 Dec 7 10:30 /dev/sgx_vepc +``` + +**Device purposes:** + +| Device | Purpose | +|--------|---------| +| `/dev/sgx_enclave` | Create and run SGX enclaves | +| `/dev/sgx_provision` | Provision attestation keys | +| `/dev/sgx_vepc` | Virtual EPC for SGX VMs | + +**If devices are missing:** SGX is not enabled in BIOS. Return to [TDX & SGX BIOS Configuration](/tutorial/tdx-bios-configuration). 
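
A scripted version of the device check might look like the sketch below. It assumes the three device names listed above. The function name `check_sgx_devices` and its optional directory argument (default `/dev`) are illustrative and exist only so the check can be exercised against a scratch directory.

```bash
# Sketch: list any expected SGX device nodes that are absent.
check_sgx_devices() {
    local dir="${1:-/dev}" missing=0 name
    for name in sgx_enclave sgx_provision sgx_vepc; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "all SGX devices present"
    fi
    return "$missing"
}
```

The non-zero return value when a node is missing lets a larger verification script stop before attempting KMS setup.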
+ +### Check SGX CPU Flags + +```bash +grep -o 'sgx[^ ]*' /proc/cpuinfo | sort -u +``` + +**Expected output:** + +``` +sgx +sgx_lc +``` + +**Flag meanings:** + +| Flag | Meaning | +|------|---------| +| `sgx` | SGX is supported | +| `sgx_lc` | SGX Launch Control is available | + +### Check SGX Kernel Messages + +```bash +sudo dmesg | grep -i sgx +``` + +**Expected output:** + +``` +[ 0.428531] sgx: EPC section 0x1020c00000-0x107fffffff +[ 0.428535] sgx: EPC section 0x2020c00000-0x207fffffff +``` + +This shows SGX Enclave Page Cache (EPC) memory is allocated. + +## Verification Summary + +Run this command for a quick status check: + +```bash +echo "=== TDX & SGX Verification Summary ===" && \ +echo && \ +echo "Kernel: $(uname -r)" && \ +echo && \ +echo "TME Status:" && \ +sudo dmesg | grep -i "x86/tme" | head -1 && \ +echo && \ +echo "TDX Status:" && \ +([ "$(cat /sys/module/kvm_intel/parameters/tdx 2>/dev/null)" = "Y" ] && echo "Y (KVM TDX enabled)" || echo "N (KVM TDX not available)") && \ +echo && \ +echo "SGX Devices:" && \ +(ls /dev/sgx* 2>/dev/null || echo "Not found") +``` + +**All checks should pass before proceeding to dstack deployment.** + +## Troubleshooting + +For detailed solutions, see the [Host Setup Troubleshooting Guide](/tutorial/troubleshooting-host-setup#tdx--sgx-verification-issues): + +- [TDX not enabled (dmesg empty)](/tutorial/troubleshooting-host-setup#tdx-not-enabled-dmesg-empty) +- [SGX devices missing](/tutorial/troubleshooting-host-setup#sgx-devices-missing) +- [KVM TDX parameter is N](/tutorial/troubleshooting-host-setup#kvm-tdx-parameter-is-n) + +## Next Steps + +With TDX and SGX verified, you're ready to proceed with dstack deployment: + +- [System Baseline Dependencies](/tutorial/system-baseline-dependencies) - Install system dependencies +- [Rust Toolchain Installation](/tutorial/rust-toolchain-installation) - Install Rust for building dstack + +## Additional Resources + +- [Intel TDX 
Documentation](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html) +- [Intel SGX Documentation](https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html) +- [Canonical TDX Repository](https://github.com/canonical/tdx) diff --git a/docs/tutorials/tdx-software-installation.md b/docs/tutorials/tdx-software-installation.md new file mode 100644 index 00000000..0a84a000 --- /dev/null +++ b/docs/tutorials/tdx-software-installation.md @@ -0,0 +1,272 @@ +--- +title: "TDX Software Installation" +description: "Install Canonical's TDX software stack, kernel, and attestation components" +section: "Host Setup" +stepNumber: 3 +totalSteps: 4 +lastUpdated: 2025-12-07 +prerequisites: + - tdx-bios-configuration +tags: + - tdx + - software + - kernel + - installation + - attestation +difficulty: "intermediate" +estimatedTime: "20 minutes" +--- + +# TDX Software Installation + +This tutorial guides you through installing Canonical's TDX software stack, including the TDX-enabled kernel, QEMU, libvirt, and attestation components. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [TDX & SGX BIOS Configuration](/tutorial/tdx-bios-configuration) +- Ubuntu 24.04 LTS freshly installed +- Internet connectivity for package downloads + + +## Manual Installation + +If you prefer to install manually, or need to set up the ubuntu user first, follow the steps below. + +## Set Up Ubuntu User + +For this setup, you'll need an `ubuntu` user with passwordless sudo. 
If you're logged in as root or another user: + +```bash +# Create ubuntu user (skip if already exists) +sudo adduser ubuntu + +# Add to sudo group +sudo usermod -aG sudo ubuntu + +# Configure passwordless sudo +echo 'ubuntu ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/ubuntu +sudo chmod 0440 /etc/sudoers.d/ubuntu + +# Set up SSH access (copy your authorized_keys) +sudo mkdir -p /home/ubuntu/.ssh +sudo cp ~/.ssh/authorized_keys /home/ubuntu/.ssh/authorized_keys +sudo chown -R ubuntu:ubuntu /home/ubuntu/.ssh +sudo chmod 600 /home/ubuntu/.ssh/authorized_keys +``` + +From now on, SSH as the ubuntu user: + +```bash +ssh ubuntu@YOUR_SERVER_IP +``` + +## Clone Canonical TDX Repository + +Canonical provides official scripts and tools for TDX setup. + +```bash +cd ~ +git clone -b main https://github.com/canonical/tdx.git +cd tdx +``` + +Verify the repository contents: + +```bash +ls -la +``` + +You should see: + +- `setup-tdx-host.sh` - Main setup script for TDX host +- `setup-tdx-guest.sh` - Script for TDX guest VMs +- `setup-tdx-config` - Configuration file +- `setup-tdx-common` - Common functions +- `attestation/` - Attestation components +- `guest-tools/` - Tools for guest VMs + +## Configure TDX Settings + +Before running the setup, review and configure the settings. + +### View Current Configuration + +```bash +cat setup-tdx-config +``` + +**Key configuration options:** + +| Option | Default | Description | +|--------|---------|-------------| +| `TDX_PPA` | tdx-release | Which PPA to use | +| `TDX_SETUP_ATTESTATION` | 0 | Enable attestation components | +| `TDX_SETUP_NVIDIA_H100` | 0 | NVIDIA H100 GPU support | +| `TDX_SETUP_INTEL_KERNEL` | 0 | Intel-optimized guest kernel | + +### Enable Attestation (Required for dstack) + +**Attestation is required for dstack deployments.** It provides cryptographic proof that your workloads run in genuine Intel TDX Trust Domains. 
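
Before changing the flag, it can help to read its current value programmatically. The sketch below assumes `setup-tdx-config` consists of plain `KEY=VALUE` lines, as the `sed` command in this section implies. The helper name `tdx_config_get` is hypothetical.

```bash
# Sketch: print the value of KEY from a KEY=VALUE style config file.
tdx_config_get() {
    local key="$1" file="${2:-setup-tdx-config}"
    # Keep only the last uncommented assignment, mirroring shell override order.
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Usage (from the tdx checkout): tdx_config_get TDX_SETUP_ATTESTATION
```

A value of `0` means attestation components will be skipped by the host setup script; `1` means they will be installed.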
+ +Enable attestation: + +```bash +sed -i 's/TDX_SETUP_ATTESTATION=0/TDX_SETUP_ATTESTATION=1/' setup-tdx-config +``` + +Verify the change: + +```bash +grep TDX_SETUP_ATTESTATION setup-tdx-config +``` + +Expected output: `TDX_SETUP_ATTESTATION=1` + +## Run TDX Host Setup Script + +The setup script will: + +1. Add Canonical's TDX PPA (`ppa:kobuk-team/tdx-release`) +2. Install TDX-enabled kernel (`linux-image-intel`) +3. Install TDX-enabled QEMU, libvirt, and OVMF +4. Configure GRUB to boot the TDX kernel +5. Install attestation components (if enabled) +6. Add your user to the `kvm` group + +Run the setup: + +```bash +sudo ./setup-tdx-host.sh +``` + +**Expected output:** + +``` +Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease +... +Adding repository. +... +The following NEW packages will be installed: + linux-image-6.8.0-1028-intel linux-image-intel + qemu-system-x86 libvirt-daemon-system libvirt-clients + ovmf + ... +Need to get 304 MB of archives. +After this operation, 746 MB of additional disk space will be used. +... +``` + +### Installed Packages + +**Core TDX packages:** + +| Package | Version | Description | +|---------|---------|-------------| +| `linux-image-intel` | 6.8.0-1028+ | TDX-enabled kernel | +| `qemu-system-x86` | 8.2.2+tdx | TDX-enabled QEMU | +| `libvirt0` | 10.0.0+tdx | TDX-enabled libvirt | +| `ovmf` | 2024.02+tdx | TDX-enabled UEFI firmware | + +**Attestation packages (if enabled):** + +| Package | Version | Description | +|---------|---------|-------------| +| `tdx-qgs` | 1.21 | Quote Generation Service | +| `sgx-dcap-pccs` | 1.21 | Provisioning Certificate Caching Service | +| `libsgx-dcap-default-qpl` | 1.21 | Quote Provider Library | +| `sgx-ra-service` | 1.21 | Remote Attestation Service | + +### Setup Completion Message + +The script will complete with: + +``` +======================================================================== +The host OS setup has been done successfully. Now, please enable Intel TDX in the BIOS. 
+======================================================================== +``` + +> **Note:** You've already configured BIOS in the previous tutorial, so you can ignore the BIOS reminder. + +## Verify Kernel Installation + +Before rebooting, verify the TDX kernel was installed: + +```bash +ls -la /boot/vmlinuz* | grep intel +``` + +**Expected output:** + +``` +lrwxrwxrwx 1 root root 24 Dec 7 10:30 /boot/vmlinuz -> vmlinuz-6.8.0-1028-intel +-rw------- 1 root root 15006088 May 23 15:48 /boot/vmlinuz-6.8.0-1028-intel +``` + +Check GRUB is configured to boot the Intel kernel: + +```bash +cat /etc/default/grub.d/99-tdx-kernel.cfg +``` + +Check current kernel (should still be generic): + +```bash +uname -r +``` + +**Example output:** + +``` +6.8.0-88-generic +``` + +After reboot, you'll be running the Intel TDX kernel. + +## Reboot to TDX Kernel + +Reboot the server to load the TDX-enabled kernel: + +```bash +sudo reboot +``` + +**Note:** The server may take 2-3 minutes to reboot. This is normal as TDX initialization takes time during boot. + +## Post-Reboot Verification + +After reboot, SSH back in and verify the TDX kernel is running: + +```bash +ssh ubuntu@YOUR_SERVER_IP +uname -r +``` + +**Expected output:** + +``` +6.8.0-1028-intel +``` + +If you see `6.8.0-1028-intel` (or similar Intel kernel version), the TDX kernel is loaded. 
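
The post-reboot check can also be scripted. This sketch treats any release string ending in `-intel` as the TDX kernel, which matches the example output above but is an assumption about Canonical's naming. The function name `is_tdx_kernel` is illustrative.

```bash
# Sketch: decide from a kernel release string whether the Intel TDX kernel booted.
is_tdx_kernel() {
    local release="${1:-$(uname -r)}"
    case "$release" in
        *-intel) echo "TDX kernel: $release" ;;
        *)       echo "not a TDX kernel: $release (check GRUB configuration)"; return 1 ;;
    esac
}
```

With no argument the function inspects the running kernel, so `is_tdx_kernel || exit 1` can gate later automation on a successful reboot into the Intel kernel.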
+ +## Troubleshooting + +For detailed solutions, see the [Host Setup Troubleshooting Guide](/tutorial/troubleshooting-host-setup#tdx-software-installation-issues): + +- [Script fails with permission denied](/tutorial/troubleshooting-host-setup#script-fails-with-permission-denied) +- [PPA fails to add](/tutorial/troubleshooting-host-setup#ppa-fails-to-add) +- [Kernel doesn't change after reboot](/tutorial/troubleshooting-host-setup#kernel-doesnt-change-after-reboot) +- [Attestation services fail to start](/tutorial/troubleshooting-host-setup#attestation-services-fail-to-start) + +## Next Steps + +Continue to [TDX & SGX Verification](/tutorial/tdx-sgx-verification) to verify TDX and SGX are properly enabled. + +## Additional Resources + +- [Canonical TDX Repository](https://github.com/canonical/tdx) +- [Ubuntu TDX Documentation](https://github.com/canonical/tdx/blob/main/README.md) +- [Intel TDX Documentation](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html) diff --git a/docs/tutorials/troubleshooting-dstack-installation.md b/docs/tutorials/troubleshooting-dstack-installation.md new file mode 100644 index 00000000..2b5182ac --- /dev/null +++ b/docs/tutorials/troubleshooting-dstack-installation.md @@ -0,0 +1,373 @@ +--- +title: "Troubleshooting: dstack Installation" +description: "Solutions for common issues during system dependencies, Rust toolchain, VMM build, configuration, service, management interface, and guest image setup" +section: "Troubleshooting" +stepNumber: null +totalSteps: null +isAppendix: true +tags: + - troubleshooting + - dstack + - vmm + - rust + - installation +difficulty: intermediate +estimatedTime: "reference" +lastUpdated: 2026-03-06 +--- + +# Troubleshooting: dstack Installation + +This appendix consolidates troubleshooting content from the dstack Installation tutorials. For inline notes and warnings, see the individual tutorials. 
+ +--- + +## System Baseline Dependencies Issues + +### Package Installation Fails + +```bash +# Fix broken packages +sudo apt --fix-broken install + +# Clear apt cache and retry +sudo apt clean +sudo apt update +sudo apt install -y build-essential +``` + +### OpenMetal Grub Error + +On OpenMetal servers, you may see this error during package installation: + +``` +grub-install: error: diskfilter writes are not supported. +``` + +**This error does not affect dstack installation** - your packages are still installed correctly. To prevent this from blocking future apt operations: + +```bash +sudo apt-mark hold grub-pc grub-efi-amd64-signed +``` + +### Kernel Upgrade Prompts + +If prompted about kernel upgrades during `apt upgrade`: +1. Select "Keep the local version currently installed" if unsure +2. A reboot may be required after kernel updates + +```bash +# Check if reboot is required +cat /var/run/reboot-required 2>/dev/null || echo "No reboot required" +``` + +--- + +## Rust Toolchain Installation Issues + +### rustup command not found + +If `rustup` is not found after installation: + +```bash +# Manually add to PATH +export PATH="$HOME/.cargo/bin:$PATH" + +# Add to shell profile permanently +echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc +source ~/.bashrc +``` + +### Permission denied errors + +```bash +# Ensure cargo directory is owned by your user +sudo chown -R $USER:$USER ~/.cargo ~/.rustup +``` + +### Network timeout during installation + +```bash +# Increase timeout and retry +export CARGO_HTTP_TIMEOUT=300 +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +``` + +### Updating Rust + +To update to the latest stable version: + +```bash +rustup update stable +``` + +--- + +## Clone & Build dstack-vmm Issues + +### Network timeout downloading crates + +```bash +export CARGO_HTTP_TIMEOUT=300 +cargo build --release +``` + +### Linker errors + +Ensure build dependencies are installed: + +```bash +sudo apt install -y build-essential 
pkg-config libssl-dev +``` + +### Permission denied on install + +```bash +# Ensure you're using sudo +sudo cp ~/dstack/target/release/dstack-vmm /usr/local/bin/ + +# Or install to user directory +mkdir -p ~/.local/bin +cp ~/dstack/target/release/dstack-vmm ~/.local/bin/ +``` + +### Build cache issues + +```bash +cargo clean +cargo update +cargo build --release +``` + +--- + +## VMM Configuration Issues + +### Configuration file not found + +```bash +ls -la /etc/dstack/vmm.toml +``` + +### TOML syntax errors + +```bash +python3 -c "import tomllib; tomllib.load(open('/etc/dstack/vmm.toml', 'rb')); print('TOML syntax OK')" +``` + +If valid, prints "TOML syntax OK". If invalid, shows the error location. + +### Permission denied on socket + +```bash +sudo ls -la /var/run/dstack/ +sudo chmod 755 /var/run/dstack +``` + +### Resource limit errors + +Check current usage and adjust limits: + +```bash +ps aux --sort=-%mem | head +# Then reduce max_allocable_vcpu or max_allocable_memory_in_mb +``` + +--- + +## VMM Service Setup Issues + +### Service fails to start + +```bash +# Check logs for error details +sudo journalctl -u dstack-vmm -n 100 --no-pager + +# Check binary exists +which dstack-vmm +ls -la /usr/local/bin/dstack-vmm + +# Check config exists +ls -la /etc/dstack/vmm.toml +``` + +### Service keeps restarting + +```bash +# Check for crash loops +sudo journalctl -u dstack-vmm --since "10 minutes ago" | grep -i error + +# Check memory +free -h +``` + +### HTTP API not responding + +```bash +# Check VMM is listening on port 9080 +sudo ss -tlnp | grep 9080 + +# Check logs for binding errors +sudo journalctl -u dstack-vmm -n 50 | grep -i "endpoint\|bind\|error" + +# Restart service +sudo systemctl restart dstack-vmm +``` + +### Supervisor socket not created + +```bash +# Check directory exists +ls -la /var/run/dstack/ + +# Create if missing and restart +sudo mkdir -p /var/run/dstack +sudo chmod 755 /var/run/dstack +sudo systemctl restart dstack-vmm +``` + +### Permission 
denied errors + +```bash +# Ensure directories are writable +sudo chmod 755 /var/run/dstack /var/log/dstack /var/lib/dstack +``` + +--- + +## Management Interface Setup Issues + +### 502 Bad Gateway + +**Symptom:** HAProxy returns 502 error + +**Solution:** +```bash +# Check VMM is running +sudo systemctl status dstack-vmm + +# Check VMM is listening on 9080 +sudo ss -tlnp | grep 9080 + +# Start VMM if needed +sudo systemctl start dstack-vmm +``` + +### Connection Refused + +**Symptom:** Cannot connect to https://vmm.dstack.yourdomain.com + +**Solution:** +```bash +# Check HAProxy is running +sudo systemctl status haproxy + +# Check HAProxy is listening on 443 +sudo ss -tlnp | grep 443 + +# Check firewall allows 443 +sudo ufw status +``` + +### DNS Not Resolving + +**Symptom:** Browser shows DNS error + +**Solution:** +```bash +# Verify DNS resolves (wildcard should cover vmm.dstack.*) +dig +short vmm.dstack.yourdomain.com + +# Should return your server IP +# If not, check your wildcard DNS record in Cloudflare +``` + +### Authentication Failed + +**Symptom:** API returns 401 Unauthorized + +**Solution:** +1. Verify saved token matches vmm.toml: `cat ~/.dstack/secrets/vmm-auth-token` vs `sudo grep tokens /etc/dstack/vmm.toml` +2. Check `Authorization: Bearer TOKEN` header format +3. 
Re-save if needed: `sudo python3 -c "import tomllib; c=tomllib.load(open('/etc/dstack/vmm.toml','rb')); print(c['auth']['tokens'][0], end='')" > ~/.dstack/secrets/vmm-auth-token` + +### Backend Marked as DOWN + +**Symptom:** HAProxy stats show vmm_backend as DOWN + +**Solution:** +```bash +# Check HAProxy stats +curl -s http://127.0.0.1:8404/stats | grep vmm + +# Verify VMM responds to health check +curl -s http://127.0.0.1:9080/ + +# Check HAProxy logs +sudo journalctl -u haproxy --no-pager -n 20 +``` + +--- + +## Guest Image Setup Issues + +### Images not appearing in VMM + +Check the VMM logs for image loading errors: + +```bash +sudo journalctl -u dstack-vmm -n 100 --no-pager | grep -i image +``` + +Common issues: + +**Image directory not found:** +```bash +# Verify image directory exists and has correct permissions +ls -la /var/lib/dstack/images/ +``` + +**Metadata.json missing or invalid:** +```bash +# Check if metadata exists +cat /var/lib/dstack/images/dstack-*/metadata.json +``` + +**VMM not configured for correct path:** +```bash +# Check VMM configuration +grep image_path /etc/dstack/vmm.toml +``` + +### Image download fails + +Try alternative download methods: + +```bash +# Using curl instead of wget +curl -L -o dstack-${DSTACK_VERSION}.tar.gz \ + https://github.com/Dstack-TEE/meta-dstack/releases/download/v${DSTACK_VERSION}/dstack-${DSTACK_VERSION}.tar.gz +``` + +### Image metadata missing + +If metadata.json is missing, the image may be corrupted: + +```bash +# Re-download and extract +rm -rf /var/lib/dstack/images/dstack-${DSTACK_VERSION} +# Then repeat Steps 2-3 +``` + +### VMM service not running + +```bash +# Check service status +sudo systemctl status dstack-vmm + +# View recent logs +sudo journalctl -u dstack-vmm -n 50 + +# Restart if needed +sudo systemctl restart dstack-vmm +``` diff --git a/docs/tutorials/troubleshooting-first-application.md b/docs/tutorials/troubleshooting-first-application.md new file mode 100644 index 00000000..1a5dd8f6 
--- /dev/null +++ b/docs/tutorials/troubleshooting-first-application.md @@ -0,0 +1,144 @@ +--- +title: "Troubleshooting: First Application" +description: "Solutions for common issues during Hello World deployment and attestation verification" +section: "Troubleshooting" +stepNumber: null +totalSteps: null +isAppendix: true +tags: + - troubleshooting + - hello-world + - attestation + - deployment + - cvm +difficulty: intermediate +estimatedTime: "reference" +lastUpdated: 2026-03-06 +--- + +# Troubleshooting: First Application + +This appendix consolidates troubleshooting content from the First Application tutorials. For inline notes and warnings, see the individual tutorials. + +--- + +## Hello World App Issues + +### CVM fails to start + +Check VMM status and logs: + +```bash +systemctl status dstack-vmm +journalctl -u dstack-vmm -n 50 +``` + +Common causes: +- **Insufficient resources:** Reduce `--vcpu` or `--memory` +- **Image not found:** Verify `dstack-0.5.7` exists: `ls /var/lib/dstack/images/` +- **Compose hash not whitelisted:** See Step 5 + +### "OS image is not allowed" + +The OS image hash isn't whitelisted on the KMS contract. See [KMS CVM Deployment: OS image not allowed](/tutorial/troubleshooting-kms-deployment#os-image-is-not-allowed) for the solution. 
+ +### CVM boots but no gateway registration + +Check the CVM logs for gateway-related errors: + +```bash +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=200" | grep -i "gateway\|wireguard\|wg" +``` + +Common causes: +- **`--gateway` flag missing** from `vmm-cli.py compose` — regenerate `app-compose.json` with `--gateway` +- **`--gateway-url` missing** from `vmm-cli.py deploy` — redeploy with the correct URL +- **Gateway RPC unreachable** — verify `curl -sk https://gateway.dstack.yourdomain.com:9202/prpc/Status` works from the host +- **HAProxy missing `gateway_rpc_passthrough` rule** — see [HAProxy Setup](/tutorial/haproxy-setup) + +### Application not accessible via gateway + +1. Check if the app registered: `curl -sf http://127.0.0.1:9203/prpc/Status | jq '.hosts'` +2. Check if Let's Encrypt cert was issued (look for certbot logs in CVM output) +3. Try direct port access first: `curl http://YOUR_SERVER_IP:9300/` +4. Check the [Gateway Deployment Troubleshooting Guide](/tutorial/troubleshooting-gateway-deployment#gateway-cvm-deployment-issues) + +### Cannot pull Docker images + +The CVM needs internet access to pull images from Docker Hub. With user-mode networking (default), this should work automatically. 
If pulls fail: + +```bash +# Check CVM logs for pull errors +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=200" | grep -i "pull\|image\|error" +``` + +--- + +## Attestation Verification Issues + +### Attestation data retrieval fails + +If `/guest/Info` returns empty or errors, check that the CVM is running: + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +Verify the VM UUID and try the request manually: + +```bash +VM_UUID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json 2>/dev/null \ + | jq -r '.[] | select(.name=="hello-world") | .id') + +curl -s -u "admin:$DSTACK_VMM_AUTH_PASSWORD" \ + -X POST http://127.0.0.1:9080/guest/Info \ + -H "Content-Type: application/json" \ + -d "{\"id\": \"$VM_UUID\"}" | jq 'keys' +``` + +If the response is empty, check that tappd is running inside the CVM: + +```bash +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=$VM_UUID&follow=false&ansi=false&lines=100" | grep -i tappd +``` + +### Measurements don't match + +Common causes: + +**Different VM configuration:** +```bash +# Check actual vCPUs/RAM vs expected (shown in lsvm output) +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +**Different image version:** +```bash +# Verify image version matches (shown in lsvm output) +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +**Image was modified:** +```bash +# Verify image integrity +sha256sum /var/lib/dstack/images/dstack-0.5.7/* +``` + +### RA-TLS certificate issues + +If the `app_cert` field is empty or the certificate doesn't contain RA-TLS extensions: + +```bash +# Check if app_cert is present in the response +curl -s -u "admin:$DSTACK_VMM_AUTH_PASSWORD" \ + -X POST http://127.0.0.1:9080/guest/Info \ + -H "Content-Type: application/json" 
\ + -d "{\"id\": \"$VM_UUID\"}" | jq '.app_cert | length' +``` + +If the certificate is present but extensions are missing, the CVM may still be initializing. Wait for tappd to complete its boot sequence and try again. diff --git a/docs/tutorials/troubleshooting-gateway-deployment.md b/docs/tutorials/troubleshooting-gateway-deployment.md new file mode 100644 index 00000000..679ecfa5 --- /dev/null +++ b/docs/tutorials/troubleshooting-gateway-deployment.md @@ -0,0 +1,309 @@ +--- +title: "Troubleshooting: Gateway Deployment" +description: "Solutions for common issues during gateway build, configuration, and CVM deployment" +section: "Troubleshooting" +stepNumber: null +totalSteps: null +isAppendix: true +tags: + - troubleshooting + - gateway + - deployment + - cvm + - wireguard + - letsencrypt +difficulty: intermediate +estimatedTime: "reference" +lastUpdated: 2026-03-06 +--- + +# Troubleshooting: Gateway Deployment + +This appendix consolidates troubleshooting content from the Gateway Deployment tutorials. For inline notes and warnings, see the individual tutorials. + +--- + +## Gateway Build & Configuration Issues + +### Contract transaction reverts + +If `deployAndRegisterApp` reverts, check: + +1. **App implementation not set:** The KMS contract owner must call `setAppImplementation` before apps can be deployed + ```bash + cast call "$KMS_CONTRACT_ADDR" \ + "appImplementation()(address)" \ + --rpc-url "$ETH_RPC_URL" + ``` + Should return a non-zero address. + +2. **Insufficient funds:** Your wallet needs Sepolia ETH for gas + ```bash + cast balance $(cast wallet address --private-key $PRIVATE_KEY) --rpc-url "$ETH_RPC_URL" + ``` + +### Compose hash mismatch + +If deployment later fails with "compose hash not allowed": + +1. Regenerate app-compose.json and recalculate the hash +2. Whitelist the new hash on-chain (Step 8) +3. 
The hash changes whenever docker-compose.yaml or .app_env contents change + +### vmm-cli.py compose errors + +**"Connection refused"** — VMM is not running: +```bash +sudo systemctl restart dstack-vmm +``` + +**"Authentication required"** — Set the auth token: +```bash +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) +``` + +### KMS shows wrong gateway app ID + +**Symptom:** `curl -sk https://localhost:9100/prpc/KMS.GetMeta | jq '.gateway_app_id'` returns the wrong app ID, an empty string, or KMS is unreachable. + +**Cause:** The KMS auth-eth service queries the blockchain directly (via `eth_call`) — it does not cache state. If KMS was deployed with a different `ETH_RPC_URL` or the KMS CVM is having connectivity issues, it may fail to read on-chain changes. Alternatively, you may need to redeploy KMS after port binding changes from the KMS tutorial. + +**Solution:** Redeploy the KMS CVM: + +```bash +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) + +# Get KMS VM ID and remove it +KMS_ID=$(./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json | jq -r '.[] | select(.name=="kms") | .id') +./src/vmm-cli.py --url http://127.0.0.1:9080 stop --force "$KMS_ID" +./src/vmm-cli.py --url http://127.0.0.1:9080 remove "$KMS_ID" + +# Redeploy +./src/vmm-cli.py --url http://127.0.0.1:9080 deploy \ + --name kms \ + --image dstack-0.5.7 \ + --compose ~/kms-deploy/app-compose.json \ + --vcpu 2 \ + --memory 4096 \ + --disk 20 \ + --port tcp:0.0.0.0:9100:9100 +``` + +Wait for KMS to come back up: + +```bash +until curl -sk https://localhost:9100/prpc/KMS.GetMeta > /dev/null 2>&1; do + echo "Waiting for KMS..." 
+ sleep 5 +done +echo "KMS is ready" +``` + +Verify the gateway app ID is now correct: + +```bash +curl -sk https://localhost:9100/prpc/KMS.GetMeta | jq '.gateway_app_id' +``` + +--- + +## Gateway CVM Deployment Issues + +### "Port mapping is not allowed for udp:9202" + +The VMM's port mapping whitelist in `/etc/dstack/vmm.toml` doesn't include UDP ports. The gateway needs UDP for WireGuard. + +**Solution:** Add a UDP range to the port mapping configuration: + +```bash +sudo sed -i '/{ protocol = "tcp", from = 1, to = 20000 },/a\ { protocol = "udp", from = 1, to = 20000 },' /etc/dstack/vmm.toml +sudo systemctl restart dstack-vmm +``` + +See [Gateway CVM Preparation: Step 1](/tutorial/gateway-build-configuration#step-1-verify-prerequisites) for details. + +### "OS image is not allowed" + +**Symptom:** CVM reboots with `Boot denied: OS image is not allowed` in the logs. + +**Cause:** The OS image hash isn't whitelisted on the KMS contract. Each dstack guest image has a unique SHA256 digest (stored in `digest.txt`) that must be explicitly whitelisted. + +**Solution:** + +```bash +# Read the actual OS image digest +OS_IMAGE_HASH=$(cat /var/lib/dstack/images/dstack-0.5.7/digest.txt) +echo "OS image hash: 0x$OS_IMAGE_HASH" + +# Whitelist it on the KMS contract +export KMS_CONTRACT_ADDR=$(cat ~/.dstack/secrets/kms-contract-address) +cast send "$KMS_CONTRACT_ADDR" \ + "addOsImageHash(bytes32)" \ + "0x$OS_IMAGE_HASH" \ + --rpc-url "https://ethereum-sepolia-rpc.publicnode.com" \ + --private-key "$(cat ~/.dstack/secrets/sepolia-private-key)" +``` + +The CVM will retry automatically on its next reboot cycle. + +> **Common mistake:** Do not whitelist `bytes32(0)` (all zeros). The VMM reads the actual digest from the image's `digest.txt` file and passes it to KMS. You must whitelist that specific hash. 
+ +### CVM fails to start + +Check VMM status and logs: + +```bash +systemctl status dstack-vmm +journalctl -u dstack-vmm -n 50 +``` + +Common causes: +- **Insufficient resources:** The gateway requests 32 vCPUs and 32G RAM. Ensure the host has enough free resources. +- **Image not found:** Verify `dstack-0.5.7` exists in VMM images directory. + +### CVM exits immediately or reboots in a loop + +Same root cause as KMS CVM — the `dstack-prepare` service fails to fetch SGX quote collateral from PCCS. + +Check the CVM logs: + +```bash +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=500" | grep -A3 "Failed to get sealing key" +``` + +See [KMS CVM Deployment: CVM Exits Immediately](/tutorial/troubleshooting-kms-deployment#cvm-exits-immediately-or-reboots-in-a-loop) for the full solution. + +### Compose hash not allowed + +**Symptom:** CVM starts but the gateway container fails with an attestation error. + +**Cause:** The `app-compose.json` hash doesn't match what's whitelisted on-chain. + +**Solution:** Recalculate and whitelist the hash: + +```bash +COMPOSE_HASH=$(sha256sum ~/gateway-deploy/app-compose.json | cut -d' ' -f1) +echo "Hash: 0x$COMPOSE_HASH" + +# Check if it's already whitelisted +cast call "$(cat ~/.dstack/secrets/gateway-app-id)" \ + "allowedComposeHashes(bytes32)(bool)" \ + "0x$COMPOSE_HASH" \ + --rpc-url "https://ethereum-sepolia-rpc.publicnode.com" + +# If false, add it +cast send "$(cat ~/.dstack/secrets/gateway-app-id)" \ + "addComposeHash(bytes32)" \ + "0x$COMPOSE_HASH" \ + --rpc-url "https://ethereum-sepolia-rpc.publicnode.com" \ + --private-key "$(cat ~/.dstack/secrets/sepolia-private-key)" +``` + +### Admin API unreachable + +**Symptom:** `curl http://127.0.0.1:9203/prpc/Status` returns "Connection refused" + +1. **CVM not fully booted:** Wait 1-2 minutes and retry. Check logs for progress. +2. 
**Port mapping wrong:** Verify the deploy command included `--port tcp:127.0.0.1:9203:8001` +3. **Gateway crashed:** Check CVM logs for errors: + ```bash + curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=100" + ``` + +### Let's Encrypt rate limits + +**Symptom:** Certificate requests fail with ACME errors mentioning "too many certificates" or "rate limit". The gateway can't serve browser-trusted TLS traffic. + +**Root cause:** The gateway stores its Let's Encrypt certificates in WaveKV, a persistent key-value store inside the CVM. Here's what does and doesn't trigger a new certificate request: + +| Action | New cert request? | Why | +|--------|:-:|-----| +| Container restart (within running CVM) | No | Docker named volume preserves WaveKV data | +| CVM destroy + recreate | **Yes** | Docker volume is destroyed, WaveKV store is wiped, no cached cert exists | +| `SetCertbotConfig` with new ACME URL | **Yes** | Renewal loop detects the change and requests from the new CA | +| Normal renewal (cert approaching expiry) | Yes | Expected behavior, well within rate limits | + +Let's Encrypt production rate-limits repeated issuance: at the time of writing, its published limits include **5 certificates per exact set of names every 7 days** and 10 new account registrations per IP every 3 hours. During iterative testing where you destroy and recreate the CVM multiple times, each redeployment burns one request. A handful of redeployments can exhaust a limit. + +**How to check if you're rate-limited:** + +```bash +VM_ID=$(cd ~/dstack/vmm && ./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm --json | jq -r '.[] | select(.name=="dstack-gateway") | .id') +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=$VM_ID&follow=false&ansi=false&lines=200" | grep -i "rate\|too many\|acme.*error" +``` + +**Recovery:** + +1. **If already rate-limited on production:** You must wait for the rate-limit window to expire (hours to days, depending on which limit was hit). 
Switch to staging in the meantime so the gateway can still function (with browser-untrusted staging certs): + ```bash + curl -sf -X POST "http://127.0.0.1:9203/prpc/SetCertbotConfig" \ + -H "Content-Type: application/json" \ + -d '{ + "acme_url": "https://acme-staging-v02.api.letsencrypt.org/directory", + "renew_interval_secs": 3600, + "renew_before_expiration_secs": 864000, + "renew_timeout_secs": 300 + }' && echo "Switched to staging" + ``` + +2. **Avoid the problem entirely:** Follow the staging-first workflow in this tutorial. Use staging during [Step 4a](#4a-set-certbot-configuration), verify everything works, then switch to production once in [Step 6](#step-6-switch-to-production-certificates). This uses exactly one production cert request per stable deployment. + +3. **Minimize redeployments:** If you need to debug the gateway, restart the container inside the CVM rather than destroying and recreating the entire CVM. Container restarts preserve the WaveKV store and don't trigger new cert requests. + +### Certbot fails to issue certificates + +**Symptom:** Applications get TLS errors; certbot debug logs show failures. + +1. **DNS credential not set:** Verify with `curl -sf http://127.0.0.1:9203/prpc/ListDnsCredentials` +2. **Cloudflare token invalid:** Test the token directly: + ```bash + curl -s -H "Authorization: Bearer YOUR_CF_TOKEN" \ + "https://api.cloudflare.com/client/v4/user/tokens/verify" | jq . + ``` +3. **Rate limits:** See [Let's Encrypt rate limits](#lets-encrypt-rate-limits) above. + +### KMS connectivity issues + +**Symptom:** Gateway CVM logs show "Connection refused" errors to KMS, or the CVM reboots in a loop with KMS-related failures. + +**Common causes:** + +1. **Only one `--kms-url` was passed.** The CVM can't reach KMS at `127.0.0.1:9100` — that's the CVM's own localhost. You need a second `--kms-url` with the KMS domain name. + +2. 
**TLS certificate mismatch.** If you use an IP address (e.g., `10.0.2.2`) instead of the KMS domain name, TLS verification fails because the KMS cert is only valid for the domain set by `KMS_DOMAIN` in the KMS docker-compose. Use `https://kms.yourdomain.com:9100` instead. + +3. **KMS bound to localhost only.** If KMS was deployed with `--port tcp:127.0.0.1:9100:9100`, gateway CVMs cannot reach it. Redeploy KMS with `--port tcp:0.0.0.0:9100:9100` (see [KMS CVM Deployment](/tutorial/kms-cvm-deployment)). + +**Solution:** Redeploy with two `--kms-url` flags using the KMS domain name: + +```bash +--kms-url "https://127.0.0.1:9100" \ +--kms-url "https://kms.yourdomain.com:9100" \ +``` + +The first URL is for host-side encryption by `vmm-cli.py`. The second uses the KMS domain (matching its TLS cert) and is passed into the CVM for runtime KMS access. + +If KMS itself is not running: + +```bash +# Test KMS from host +curl -sk https://localhost:9100/prpc/KMS.GetMeta | jq '{chain_id}' + +# Verify KMS CVM is running +cd ~/dstack/vmm +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +If KMS is not running, redeploy it first. See [KMS CVM Deployment](/tutorial/kms-cvm-deployment). + +### WireGuard endpoint unreachable from app CVMs + +**Symptom:** App CVMs can't establish WireGuard tunnels to the gateway. + +1. **UDP port not forwarded:** Verify `sudo ss -ulnp | grep 9202` shows the port +2. **Firewall blocking UDP:** Check `sudo ufw status` or `sudo iptables -L -n` +3. 
**PUBLIC_IP wrong:** The WG_ENDPOINT in `.app_env` must be your host's actual public IP diff --git a/docs/tutorials/troubleshooting-host-setup.md b/docs/tutorials/troubleshooting-host-setup.md new file mode 100644 index 00000000..e6434d67 --- /dev/null +++ b/docs/tutorials/troubleshooting-host-setup.md @@ -0,0 +1,279 @@ +--- +title: "Troubleshooting: Host Setup" +description: "Solutions for common issues during TDX BIOS configuration, software installation, and SGX verification" +section: "Troubleshooting" +stepNumber: null +totalSteps: null +isAppendix: true +tags: + - troubleshooting + - tdx + - sgx + - host-setup + - next-steps + - resources +difficulty: intermediate +estimatedTime: "reference" +lastUpdated: 2026-03-07 +--- + +# Troubleshooting: Host Setup + +This appendix consolidates troubleshooting content from the Host Setup tutorials. For inline notes and warnings, see the individual tutorials. + +--- + +## TDX BIOS Configuration Issues + +Before troubleshooting, verify your current TDX status: + +```bash +# Check TDX parameter +cat /sys/module/kvm_intel/parameters/tdx + +# Check TME status +sudo dmesg | grep -i tme + +# Check TDX initialization +sudo dmesg | grep -i tdx +``` + +### TDX Still Shows "N" After BIOS Config + +**Possible causes:** +1. BIOS settings not saved properly +2. TME not enabled +3. Secure Boot interfering (try disabling) +4. 
SEAM loader not enabled + +**Solution:** +- Re-enter BIOS and verify all settings +- Ensure TME and TME-MT are both enabled +- Check that SEAM Loader is enabled +- Try disabling Secure Boot temporarily + +### "x86/tme: not enabled by BIOS" + +**Cause:** TME not enabled in BIOS + +**Solution:** +- Enter BIOS +- Navigate to CPU Configuration → Memory Encryption +- Enable TME and TME-MT +- Save and reboot + +### TME-MT Option Greyed Out/Disabled + +**Cause:** CPU Physical Address Limit is enabled (restricts to 46-bit addressing) + +**Why this happens:** Intel MKTME (Multi-Key Total Memory Encryption), which TME-MT uses, requires upper address bits for encryption key IDs. The 46-bit physical address limit reserves these bits, preventing TME-MT from functioning. Many server BIOS configurations enable this by default for older OS/hypervisor compatibility. + +**Solution:** +1. Enter BIOS +2. Navigate to: **Advanced → CPU Configuration** (or **Processor Configuration**) +3. Find: **"Limit CPU Physical Address to 46 bits"** or **"Physical Address Limit"** + - May also be labeled: "Hyper-V Physical Address Limit" or "Address Width Limit" +4. **Disable** this setting +5. Save and reboot +6. Re-enter BIOS - TME-MT should now be selectable +7. Enable TME-MT and continue with TDX setup + +**Note:** This is documented in Dell, ASUS, and other server vendor documentation. Enabling the 46-bit limit automatically disables TME-MT capabilities. + +### No SEAM Firmware After Enabling TDX + +**What this actually means:** + +If you run the TDX status checks above and do not see `virt/tdx: module initialized` in dmesg, or you see TDX-related errors during boot, the SEAM (Secure Arbitration Mode) firmware module failed to load. + +**Symptoms:** +- `dmesg | grep -i tdx` shows errors or no "module initialized" message +- `cat /sys/module/kvm_intel/parameters/tdx` returns `N` after enabling TDX in BIOS +- TDX-related error messages in dmesg + +**Possible causes:** +1. 
Server firmware/BIOS needs update +2. Intel TDX SEAM module not installed in firmware +3. BIOS TDX settings not properly saved + +**Solution:** +- Update server BIOS/firmware to latest version +- Check with server vendor for TDX support +- Verify BIOS settings were saved and applied (re-enter BIOS to confirm) +- Some early TDX-capable CPUs may need firmware updates + +### Kernel Panic After Enabling TDX + +**Cause:** Incompatible BIOS settings or outdated firmware + +**Solution:** +- Boot into previous kernel from GRUB menu +- Update server BIOS/firmware +- Check Intel and server vendor documentation for specific TDX requirements + +### TDX Option Not Visible in BIOS + +**Cause:** TME-MT must be enabled first, or CPU doesn't support TDX. + +**Solution:** +1. Ensure TME and TME-MT are both enabled first +2. Verify your CPU supports TDX (check [TDX Hardware Verification](/tutorial/tdx-hardware-verification)) +3. Update BIOS firmware if TDX should be supported + +### SGX Auto MP Registration Not Available + +**Cause:** SGX must be enabled first before the registration option appears. + +**Solution:** +1. Enable "SW Guard Extensions (SGX)" first +2. Save and reboot if necessary +3. Return to BIOS - the Auto MP Registration option should now appear + +### BIOS Settings Don't Persist After Reboot + +**Cause:** BIOS battery issue, settings not saved properly, or BIOS reset. + +**Solution:** +1. Ensure you're pressing F4 or explicitly selecting "Save & Exit" +2. Check for BIOS firmware updates +3. If settings keep resetting, the CMOS battery may need replacement + +--- + +## TDX Software Installation Issues + +### Script fails with permission denied + +```bash +sudo chmod +x setup-tdx-host.sh +sudo ./setup-tdx-host.sh +``` + +### PPA fails to add + +Check internet connectivity: + +```bash +ping -c 3 ppa.launchpad.net +``` + +If behind a proxy, configure apt proxy settings. 
+ +### Kernel doesn't change after reboot + +Verify GRUB configuration: + +```bash +grep -r intel /etc/default/grub.d/ +``` + +Manually select kernel in GRUB menu if needed (hold Shift during boot). + +### Attestation services fail to start + +This is normal before BIOS is configured. Services will start properly after full TDX enablement. + +--- + +## TDX & SGX Verification Issues + +### TDX not enabled (dmesg empty) + +1. Verify BIOS settings are saved (re-enter BIOS and check) +2. Ensure TME-MT is enabled (prerequisite for TDX) +3. Check that TDX SEAM Loader is enabled + +### SGX devices missing + +1. Verify SGX is enabled in BIOS +2. Check that SGX Auto MP Registration is enabled +3. Try a cold boot (full power off, not just reboot) + +### KVM TDX parameter is N + +1. Ensure you're running the Intel kernel (`uname -r` shows `intel`) +2. Check dmesg for TDX initialization errors +3. Verify BIOS TDX settings + +--- + +## Next Steps After TDX Is Enabled + +Now that TDX is enabled on your host, you can: + +### 1. Create TDX Guest VMs + +- Use QEMU/libvirt to launch Trust Domains +- Configure TD guest images with TDX support + +### 2. Test TDX Functionality + +- Run Canonical's test suite: `cd tests && ./test-tdx.sh` +- Verify TD attestation + +### 3. Test TDX Attestation + +- Verify attestation quote generation +- Test remote attestation flow +- Validate DCAP configuration + +### 4. 
Deploy dstack + +- Install dstack SDK +- Deploy confidential applications to TDX VMs +- Use attestation API for runtime verification + +--- + +## System Requirements Reference + +### Hardware + +- Intel Xeon Scalable (5th Gen Emerald Rapids or 4th Gen Sapphire Rapids with TDX) + - Verify TDX support at https://ark.intel.com +- Memory: At least 2 channels populated per socket (identical DIMMs recommended) +- BIOS with TDX support + +### Software + +- Ubuntu 24.04 LTS (Noble) +- linux-image-intel 6.8.0-1028 or later +- QEMU 8.2.2+tdx1.1 or later +- libvirt 10.0.0+tdx1.2 or later +- OVMF 2024.02+tdx1.0 or later + +### BIOS Settings + +- TME enabled +- TME-MT enabled +- TDX enabled +- SEAM Loader enabled +- SGX enabled (required for KMS attestation) +- SGX Auto MP Registration enabled (required for KMS) +- **Physical Address Limit: DISABLED** (critical for TME-MT) + +--- + +## Additional Resources + +### Official Documentation + +- **Intel ARK (Processor Verification):** https://ark.intel.com +- **Intel TDX Enabling Guide:** https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/ +- **Canonical TDX Documentation:** https://github.com/canonical/tdx +- **Intel TDX Overview:** https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html +- **Ubuntu TDX Wiki:** https://discourse.ubuntu.com/t/intel-tdx-trust-domain-extensions/ + +### dstack Resources + +- **dstack Documentation:** https://docs.phala.com/dstack/overview +- **dstack GitHub:** https://github.com/Phala-Network/dstack + +### Getting Help + +If you encounter issues not covered in this troubleshooting guide: + +1. Check the [Canonical TDX GitHub Issues](https://github.com/canonical/tdx/issues) +2. Review Intel's [TDX Enabling Guide](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/) +3. Consult your server vendor's documentation for TDX-specific guidance +4. 
Visit the [Ubuntu Discourse TDX Forum](https://discourse.ubuntu.com/t/intel-tdx-trust-domain-extensions/) diff --git a/docs/tutorials/troubleshooting-kms-deployment.md b/docs/tutorials/troubleshooting-kms-deployment.md new file mode 100644 index 00000000..74a39631 --- /dev/null +++ b/docs/tutorials/troubleshooting-kms-deployment.md @@ -0,0 +1,388 @@ +--- +title: "Troubleshooting: KMS Deployment" +description: "Solutions for common issues during contract deployment, KMS build, and KMS CVM deployment" +section: "Troubleshooting" +stepNumber: null +totalSteps: null +isAppendix: true +tags: + - troubleshooting + - kms + - contracts + - deployment + - cvm +difficulty: intermediate +estimatedTime: "reference" +lastUpdated: 2026-03-06 +--- + +# Troubleshooting: KMS Deployment + +This appendix consolidates troubleshooting content from the KMS Deployment tutorials. For inline notes and warnings, see the individual tutorials. + +--- + +## Contract Deployment Issues + +### Artifact not found + +``` +Error HH700: Artifact for contract "DstackApp" not found. +``` + +Contracts must be compiled before deployment. Run: + +```bash +npx hardhat compile +``` + +### Insufficient funds + +``` +Error: insufficient funds for gas +``` + +Your wallet needs Sepolia ETH for gas. Fund it from a Sepolia faucet, then retry. + +### Transaction underpriced + +``` +Error: replacement transaction underpriced +``` + +Wait for pending transactions to complete, then retry. + +### Nonce too low + +``` +Error: nonce too low +``` + +A transaction with this nonce has already been submitted. Wait for it to confirm, then retry. + +### Connection failed + +``` +Error: could not detect network +``` + +Check that your RPC endpoint is reachable: + +```bash +curl -s -X POST "https://ethereum-sepolia-rpc.publicnode.com" \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' +``` + +The response should contain a hex block number in `result`, not an `error` object. 
+ +--- + +## KMS Build & Configuration Issues + +### Build fails with missing dependencies + +``` +Error: linker `cc` not found +``` + +Install build dependencies: + +```bash +sudo apt install -y build-essential pkg-config libssl-dev +``` + +### Configuration file not found + +``` +Error: Could not find configuration file +``` + +Verify the file exists and has correct permissions: + +```bash +ls -la /etc/kms/kms.toml +``` + +### Auth-eth npm install fails + +``` +Error: EACCES permission denied +``` + +Fix npm permissions: + +```bash +mkdir -p ~/.npm-global +npm config set prefix '~/.npm-global' +export PATH=~/.npm-global/bin:$PATH +npm install +``` + +### Invalid TOML syntax + +``` +Error: invalid TOML +``` + +Validate your configuration: + +```bash +cat /etc/kms/kms.toml | python3 -c "import sys, tomllib; tomllib.load(sys.stdin.buffer)" +``` + +### RPC connection failed + +``` +Error: could not connect to RPC +``` + +Check network connectivity: + +```bash +curl -s -X POST "https://ethereum-sepolia-rpc.publicnode.com" \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' +``` + +### Contract address not set + +``` +Error: KMS_CONTRACT_ADDR not set +``` + +Ensure you've completed [Contract Deployment](/tutorial/contract-deployment) and the contract address is saved: + +```bash +cat ~/.dstack/secrets/kms-contract-address +``` + +--- + +## KMS CVM Deployment Issues + +### CVM fails to start + +``` +Error: Failed to create CVM +``` + +Check VMM status and logs: + +```bash +systemctl status dstack-vmm +journalctl -u dstack-vmm -n 50 +``` + +Ensure VMM has TDX enabled and sufficient resources. + +### CVM Exits Immediately or Reboots in a Loop + +**Symptom:** The CVM shows status `exited` after only 15-20 seconds, or keeps restarting if `auto_restart` is enabled. + +**Root Cause:** The `dstack-prepare` service fails to fetch SGX quote collateral from PCCS during early boot, which prevents sealing key generation. 
The service has `FailureAction=reboot`, so it reboots the CVM on failure. + +Check the CVM logs (replace `VM_ID` with actual ID from `lsvm`): + +```bash +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=500" | grep -A3 "Failed to get sealing key" +``` + +If you see `Failed to get sealing key` → `Failed to get quote collateral` → `Network is unreachable` or `Connection refused`, the CVM cannot reach PCCS. + +**Solution:** Verify these settings: + +1. **VMM networking mode must be `user`** — see [VMM Configuration: Networking Modes](/tutorial/vmm-configuration#networking-modes) for why +2. **`pccs_url` must be set** in `/etc/dstack/vmm.toml`: + ```toml + pccs_url = "https://pccs.phala.network/sgx/certification/v4" + ``` +3. **The CVM must have internet access** to reach `pccs.phala.network` — user-mode networking provides this automatically. + +After fixing, restart VMM (`sudo systemctl restart dstack-vmm`) and redeploy. + +### Bootstrap hangs + +``` +Waiting for bootstrap to complete... +``` + +Check if guest-agent is running inside the CVM. Use the VMM web console to view the instance details, or check the logs (replace `VM_ID` with actual ID from `lsvm`): + +```bash +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=100" +``` + +The `/var/run/dstack.sock` socket must exist inside the CVM for TDX quote generation. 
+ +### Port 9100 not accessible + +``` +Connection refused +``` + +Check CVM network configuration: + +```bash +# Verify port mapping in docker-compose.yaml +grep -A2 ports ~/kms-deploy/docker-compose.yaml + +# Check CVM status via vmm-cli.py +cd ~/dstack/vmm +export DSTACK_VMM_AUTH_PASSWORD=$(cat ~/.dstack/secrets/vmm-auth-token) +./src/vmm-cli.py --url http://127.0.0.1:9080 lsvm +``` + +### TDX quote not generated + +``` +"quote": null +``` + +This indicates that `quote_enabled` is false, the guest-agent is not running, or **SGX is not properly configured**: + +```bash +# Check CVM logs for TDX-related errors (replace VM_ID with actual ID from lsvm) +curl -s -H "Authorization: Bearer $(cat ~/.dstack/secrets/vmm-auth-token)" \ + "http://127.0.0.1:9080/logs?id=VM_ID&follow=false&ansi=false&lines=100" | grep -i "quote\|tdx\|sgx" +``` + +**Common causes:** + +1. **SGX not enabled in BIOS** - Verify SGX devices exist on host: + ```bash + ls -la /dev/sgx* + ``` + If missing, configure SGX in BIOS. See [TDX & SGX BIOS Configuration](/tutorial/tdx-bios-configuration). + +2. **SGX Auto MP Registration not enabled** - Without this BIOS setting, your platform isn't registered with Intel's PCS, and attestation quotes cannot be verified. Re-enter BIOS and enable "SGX Auto MP Registration". + +3. **quote_enabled is false** - Verify your `kms.toml` has `quote_enabled = true` in the `[core.onboard]` section. + +4. **Guest-agent not running** - The `/var/run/dstack.sock` socket must exist inside the CVM. + +### CVM Fails with "QGS error code: 0x12001" + +**Symptom:** CVM exits after ~13 seconds with: +``` +Error: Failed to request app keys + 0: Failed to get sealing key + 1: Failed to get quote + 2: quote failure: QGS error code: 0x12001 +``` + +**Root Cause:** The host's Quote Generation Service (QGS) cannot fetch PCK certificates from PCCS. This is a **host-side** issue, not a CVM issue. 
Check QGS logs: + +```bash +sudo journalctl -u qgsd -n 20 +``` + +If you see `[QPL] No certificate data for this platform` or `Intel PCS server returns error(401)`, the host QCNL is misconfigured. + +**Solution:** Update `/etc/sgx_default_qcnl.conf` to use a working PCCS: + +```bash +# Check current config +grep pccs_url /etc/sgx_default_qcnl.conf + +# Update to Phala's public PCCS +sudo tee /etc/sgx_default_qcnl.conf > /dev/null << 'EOF' +{ + "pccs_url": "https://pccs.phala.network/sgx/certification/v4/", + "use_secure_cert": false, + "retry_times": 6, + "retry_delay": 10 +} +EOF + +sudo systemctl restart qgsd +``` + +> **Note:** The host QCNL controls TDX quote **generation**. The CVM's `pccs_url` in vmm.toml controls quote **verification**. Both must point to a working PCCS. See [VMM Configuration: Configure Host QCNL](/tutorial/vmm-configuration#step-6-configure-host-qcnl-for-quote-generation). + +### GetMeta Returns "Connection refused" on Port 9200 + +**Symptom:** KMS responds to `GetTempCaCert` but GetMeta returns: +```json +{"error": "error sending request for url (http://127.0.0.1:9200/): ...Connection refused (os error 111)"} +``` + +**Root Cause:** auth-eth defaults to port 8000, but kms.toml expects the webhook at port 9200. 
+ +**Solution:** Ensure your docker-compose.yaml includes `PORT=9200` in the environment section: + +```yaml +environment: + - PORT=9200 # Must match kms.toml webhook URL port +``` + +Then regenerate the app-compose.json and redeploy: + +```bash +./src/vmm-cli.py --url http://127.0.0.1:9080 compose \ + --name kms \ + --docker-compose ~/kms-deploy/docker-compose.yaml \ + --local-key-provider \ + --output ~/kms-deploy/app-compose.json +``` + +### GetMeta Returns "missing field `status`" + +**Symptom:** KMS responds but GetMeta returns: +```json +{"error": "error decoding response body: missing field `status` at line 1 column ..."} +``` + +**Root Cause:** auth-eth is running and reachable (port 9200 is correct), but it cannot connect to Ethereum RPC. Without `ETH_RPC_URL` and `KMS_CONTRACT_ADDR`, auth-eth defaults to `http://localhost:8545` (nothing there) and returns a Fastify error instead of the expected `{status: 'ok', ...}` response. + +**Solution:** Ensure your docker-compose.yaml includes both Ethereum configuration variables: + +```yaml +environment: + - ETH_RPC_URL=https://ethereum-sepolia-rpc.publicnode.com + - KMS_CONTRACT_ADDR=YOUR_CONTRACT_ADDRESS +``` + +Get your contract address from `~/.dstack/secrets/kms-contract-address`. See Step 3 for the complete docker-compose.yaml template. + +### GetMeta Hangs or Times Out + +**Symptom:** `curl` to GetMeta hangs indefinitely or times out after 30+ seconds + +**Root Cause:** The auth-eth service is using an unreachable or rate-limited Ethereum RPC endpoint. 
+ +**Solution:** Verify your `ETH_RPC_URL` environment variable points to a working Sepolia RPC: + +```bash +# Check what ETH_RPC_URL is set in your deployment +grep ETH_RPC_URL ~/kms-deploy/docker-compose.yaml + +# Test the endpoint directly +curl -s -X POST YOUR_ETH_RPC_URL \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' +``` + +Verify the URL matches `https://ethereum-sepolia-rpc.publicnode.com` (or your preferred Sepolia RPC provider). + +### CVM Hangs at "Waiting for time to be synchronized" + +**Symptom:** CVM boot log shows "Waiting for the system time to be synchronized" and never proceeds + +**Root Cause:** The `--secure-time` flag was used during deployment + +**Solution:** Redeploy without the `--secure-time` flag: + +```bash +./src/vmm-cli.py --url http://127.0.0.1:9080 deploy \ + --name kms \ + --image dstack-0.5.7 \ + --compose ~/kms-deploy/app-compose.json \ + --vcpu 2 \ + --memory 4096 \ + --disk 20 \ + --port tcp:127.0.0.1:9100:9100 + # Note: NO --secure-time flag +``` diff --git a/docs/tutorials/troubleshooting-prerequisites.md b/docs/tutorials/troubleshooting-prerequisites.md new file mode 100644 index 00000000..16e75294 --- /dev/null +++ b/docs/tutorials/troubleshooting-prerequisites.md @@ -0,0 +1,475 @@ +--- +title: "Troubleshooting: Prerequisites" +description: "Solutions for common issues during DNS, SSL, Docker, HAProxy, key provider, blockchain, and registry setup" +section: "Troubleshooting" +stepNumber: null +totalSteps: null +isAppendix: true +tags: + - troubleshooting + - dns + - ssl + - docker + - haproxy + - registry + - blockchain + - prerequisites +difficulty: intermediate +estimatedTime: "reference" +lastUpdated: 2026-03-06 +--- + +# Troubleshooting: Prerequisites + +This appendix consolidates troubleshooting content from the Prerequisites tutorials. For inline notes and warnings, see the individual tutorials. 
+ +--- + +## DNS Configuration Issues + +### DNS Not Resolving + +**Issue:** `dig` returns `NXDOMAIN` or no answer. + +**Solutions:** +1. Wait for DNS propagation (can take up to 48 hours) +2. Check nameservers are set correctly at registrar +3. Verify Cloudflare shows domain as "Active" +4. Ensure DNS records saved correctly in Cloudflare dashboard + +### Wildcard Not Working + +**Issue:** Base domain resolves, but `*.dstack.yourdomain.com` doesn't. + +**Solutions:** +1. Verify wildcard record uses `*.dstack` not `*` +2. Check wildcard record has same IP as base record +3. Confirm proxy status is "DNS only" (gray cloud) +4. Wait for DNS cache to expire (TTL) + +### API Token Permission Denied + +**Issue:** `curl` test returns `"success": false` or permission errors. + +**Solutions:** +1. Verify token has "Zone → DNS → Edit" permission +2. Ensure token is scoped to correct zone (your domain) +3. Check token hasn't expired (if TTL was set) +4. Regenerate token if compromised + +### Propagation Taking Too Long + +**Issue:** DNS changes not visible after several hours. + +**Solutions:** +1. Check nameservers at registrar match Cloudflare's +2. Use `dig @1.1.1.1` to query Cloudflare DNS directly (bypasses local cache) +3. Clear local DNS cache: `sudo resolvectl flush-caches` (Linux with systemd-resolved; older releases ship the same command as `systemd-resolve --flush-caches`) or `sudo dscacheutil -flushcache` (macOS) +4. Test from external DNS checker: https://www.whatsmydns.net/ + +--- + +## SSL Certificate Setup Issues + +### Challenge Failed: Could not connect + +**Symptom:** Certbot fails with connection error + +**Solution:** +1. Verify port 80 is open: `sudo ss -tlnp | grep :80` +2. Stop any services using port 80 +3. Check firewall allows port 80: `sudo ufw status` +4. Verify DNS resolves to your server: `dig +short registry.yourdomain.com` + +### Rate Limit Exceeded + +**Symptom:** Let's Encrypt returns rate limit error + +**Solution:** +1. Wait 1 hour and retry +2. 
Check https://letsencrypt.org/docs/rate-limits/ + +### DNS Resolution Failed + +**Symptom:** Certbot can't verify domain ownership + +**Solution:** +1. Check DNS record exists: `dig +short registry.yourdomain.com` +2. Wait for DNS propagation (up to 48 hours for new records) +3. Verify record points to correct IP + +### HAProxy Can't Read Certificates + +**Symptom:** HAProxy fails to start with certificate permission error + +**Solution:** +```bash +# Check certificate permissions +sudo ls -la /etc/letsencrypt/live/registry.yourdomain.com/ + +# Certificates are symlinks - check actual files +sudo ls -la /etc/letsencrypt/archive/registry.yourdomain.com/ + +# Check HAProxy combined PEM files +sudo ls -la /etc/haproxy/certs/ + +# Regenerate combined PEM if needed +sudo cat /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/registry.yourdomain.com/privkey.pem \ + | sudo tee /etc/haproxy/certs/registry.pem > /dev/null +sudo chmod 600 /etc/haproxy/certs/registry.pem + +# If issues persist, check HAProxy logs +sudo journalctl -u haproxy --no-pager -n 20 +``` + +--- + +## Docker Setup Issues + +### Permission Denied + +**Symptom:** `Got permission denied while trying to connect to the Docker daemon socket` + +**Solution:** +1. Ensure user is in docker group: `groups` +2. If not listed, add user: `sudo usermod -aG docker $USER` +3. 
Log out and back in, or run: `newgrp docker` + +### Docker Service Not Starting + +**Symptom:** `systemctl status docker` shows failed + +**Solution:** +```bash +# Check logs +sudo journalctl -u docker -n 50 + +# Common fix: restart containerd first +sudo systemctl restart containerd +sudo systemctl restart docker +``` + +### Repository Not Found + +**Symptom:** `apt update` fails with Docker repository error + +**Solution:** +```bash +# Verify the repository file +cat /etc/apt/sources.list.d/docker.list + +# Should contain a valid URL for your Ubuntu version +# If incorrect, recreate it as described in Step 4 of the Docker Setup tutorial +``` + +--- + +## HAProxy Setup Issues + +### Port 443 Already in Use + +**Symptom:** HAProxy fails to start with "Address already in use" + +**Solution:** +```bash +# Find what's using port 443 +sudo ss -tlnp | grep :443 + +# Common culprits: nginx, apache, docker +sudo systemctl stop nginx 2>/dev/null +sudo systemctl stop apache2 2>/dev/null + +# Check for Docker containers on 443 +docker ps --format '{{.Names}} {{.Ports}}' | grep 443 +``` + +### Configuration Test Fails + +**Symptom:** `haproxy -c` shows errors + +**Solution:** +```bash +# Check the specific error message +sudo haproxy -c -f /etc/haproxy/haproxy.cfg + +# Common issues: +# - Certificate file not found: check /etc/haproxy/certs/ +# - Invalid ACL syntax: check domain patterns +# - Backend server unreachable: check service is running +``` + +### Certificate Errors + +**Symptom:** TLS handshake failures + +**Solution:** +```bash +# Check certificate files exist +sudo ls -la /etc/haproxy/certs/ + +# Verify certificate format (should have both cert and key) +# The combined PEM is root-only (chmod 600), so sudo is needed to read it +sudo openssl x509 -in /etc/haproxy/certs/registry.pem -noout -subject +# pkey accepts both RSA and ECDSA keys; `openssl rsa` fails on ECDSA keys +sudo openssl pkey -in /etc/haproxy/certs/registry.pem -check -noout + +# Regenerate combined PEM if needed +sudo cat /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem \ + /etc/letsencrypt/live/registry.yourdomain.com/privkey.pem \ + | sudo tee /etc/haproxy/certs/registry.pem > /dev/null 
+``` + +### Backend Health Check Failing + +**Symptom:** Backend marked as DOWN in stats + +**Solution:** +```bash +# Check if backend service is running +sudo ss -tlnp | grep 5000 # Registry +sudo ss -tlnp | grep 9080 # VMM +sudo ss -tlnp | grep 9204 # Gateway proxy +sudo ss -tlnp | grep 9202 # Gateway RPC + +# Test backend directly +curl -s http://127.0.0.1:5000/v2/ # Registry +curl -s http://127.0.0.1:9080/ # VMM +``` + +### Gateway Not Receiving Traffic + +**Symptom:** Requests to *.dstack.* domains fail + +**Solution:** +```bash +# Check gateway proxy is listening on 9204 +sudo ss -tlnp | grep 9204 + +# Check gateway RPC is listening on 9202 +sudo ss -tlnp | grep 9202 + +# Check HAProxy routing (enable debug) +sudo haproxy -d -f /etc/haproxy/haproxy.cfg + +# Verify SNI pattern in config matches your domain +grep "dstack" /etc/haproxy/haproxy.cfg +``` + +--- + +## Gramine Key Provider Issues + +### Container fails to start: SGX devices not found + +**Symptom:** Container exits immediately with device error + +**Solution:** +1. Verify SGX devices exist: `ls -la /dev/sgx*` +2. If missing, check BIOS SGX settings +3. Ensure SGX kernel module is loaded: `lsmod | grep sgx` + +### Error: AESM service not ready + +**Symptom:** Key provider fails with AESM connection error + +**Solution:** +```bash +# Restart aesmd first +docker restart aesmd +sleep 5 +docker restart gramine-sealing-key-provider + +# Check aesmd logs +docker logs aesmd 2>&1 | tail -30 +``` + +### Quote verification failures + +**Symptom:** Logs show "quote verification failed" + +**Solution:** +1. Verify QCNL configuration points to `https://pccs.phala.network/sgx/certification/v4/` +2. Check network connectivity: `curl -sk https://pccs.phala.network/sgx/certification/v4/rootcacrl` +3. 
Verify the QCNL config file exists at `~/dstack/key-provider-build/sgx_default_qcnl.conf` + +### Empty response from curl test + +**Symptom:** `curl -sk https://127.0.0.1:3443/` returns nothing + +**Explanation:** This is normal. The key provider doesn't serve a root endpoint - it only responds to specific API calls from CVMs. An empty response means the TLS handshake succeeded, confirming the service is running. + +If you get `curl: (7) Failed to connect`, the service is not running - check container logs with `docker logs gramine-sealing-key-provider`. + +### Port 3443 already in use + +**Symptom:** Container fails to bind to port + +**Solution:** +```bash +# Find what's using the port +sudo ss -tlnp | grep 3443 + +# Kill the process or change port in docker-compose.yml +``` + +### SGX enclave initialization timeout + +**Symptom:** Container starts but enclave never initializes + +**Solution:** +1. Check SGX is enabled in BIOS +2. Verify SGX Auto MP Registration is enabled +3. Check PCCS is reachable: `curl -sk https://pccs.phala.network/sgx/certification/v4/rootcacrl` + +--- + +## Blockchain Wallet Setup Issues + +### Problem: Faucet not sending ETH + +**Solutions:** + +- Try different faucet from the list above +- Check wallet address is correct +- Wait 5-10 minutes (sometimes delayed) +- Check block explorer: + ```bash + open "https://sepolia.etherscan.io/address/$(cat ~/.dstack/secrets/sepolia-address)" + ``` + +### Problem: RPC endpoint timing out + +**Solutions:** + +- Check internet connection +- Verify RPC URL is correct (should be `https://ethereum-sepolia-rpc.publicnode.com`) +- Try a different public RPC endpoint from [chainlist.org](https://chainlist.org/chain/11155111) + +### Problem: "Connection refused" error + +**Solutions:** + +- Ensure using `https://` not `http://` +- Try alternative RPC endpoint +- Check firewall not blocking outbound connections + +### Problem: Can't see balance in cast + +**Solutions:** + +- Wait for testnet ETH to arrive 
(check block explorer) +- Verify RPC URL is correct +- Try different RPC endpoint +- Ensure wallet address is correct + +--- + +## Local Docker Registry Issues + +### Certificate Verification Failed + +**Symptom:** `curl` returns SSL certificate error + +**Solution:** +```bash +# Check certificate dates +openssl x509 -in /etc/letsencrypt/live/registry.yourdomain.com/fullchain.pem -dates -noout + +# If expired, renew +sudo certbot renew --force-renewal + +# Reload HAProxy to pick up new certs +sudo systemctl reload haproxy +``` + +### 503 Service Unavailable from HAProxy + +**Symptom:** `curl https://registry.yourdomain.com/v2/` returns `503 Service Unavailable`, but `curl http://127.0.0.1:5000/v2/` works fine locally. + +**Root Cause:** HAProxy's health check has marked the registry backend as DOWN. This typically happens when HAProxy started before the registry container was running. + +**Solution:** +```bash +# Check backend health status in HAProxy stats +curl -s http://127.0.0.1:8404/stats | grep registry + +# Or check via the stats page (via SSH tunnel) +ssh -L 8404:127.0.0.1:8404 user@your-server +# Then open http://localhost:8404/stats in browser +``` + +The fix is simply to reload HAProxy so it re-checks the backend: + +```bash +sudo systemctl reload haproxy +``` + +After reloading, HAProxy will re-run its health check (`GET /v2/`), see the registry is healthy, and start routing traffic again. Verify: + +```bash +curl -s https://registry.yourdomain.com/v2/ +``` + +> **Tip:** If you start services in the order: HAProxy first, then registry, HAProxy will mark the registry backend as DOWN until the next health check interval. Reloading HAProxy forces an immediate re-check. 
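The `grep registry` check in this section prints raw HTML from the stats page, which is hard to read. HAProxy can also export the same data as CSV when `;csv` is appended to the stats URI. The sketch below is an illustration rather than part of the tutorial: it parses that CSV and lists every entry whose status is `DOWN`. The function name `down_entries` and the sample proxy names are hypothetical, and the URL in the docstring assumes the stats listener on port 8404 used in this section.

```python
import csv
import io


def down_entries(stats_csv):
    """Return (proxy, server) pairs that HAProxy reports as DOWN.

    `stats_csv` is the text returned by the stats URI with ';csv' appended,
    for example http://127.0.0.1:8404/stats;csv (assumed from this guide).
    """
    # The header line starts with '# ', which is not part of the first column name.
    text = stats_csv.lstrip("# ")
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["pxname"], row["svname"])
        for row in reader
        if (row.get("status") or "").startswith("DOWN")
    ]


if __name__ == "__main__":
    # Hypothetical sample; real output has many more columns.
    sample = (
        "# pxname,svname,status\n"
        "registry_backend,registry,DOWN\n"
        "vmm_backend,vmm,UP\n"
    )
    print(down_entries(sample))
```

After `sudo systemctl reload haproxy`, an empty list from `down_entries` means no backend or server is currently marked DOWN.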
+ +### 502 Bad Gateway from HAProxy + +**Symptom:** External requests return 502 error + +**Solution:** +```bash +# Check registry container is running +docker ps | grep registry + +# Check registry is listening on localhost:5000 +curl -s http://127.0.0.1:5000/v2/ + +# If not running, start it +docker start registry + +# Check container logs for errors +docker logs registry +``` + +### DNS Not Resolving (Docker Registry) + +**Symptom:** `curl` to registry fails with "Could not resolve host" + +**Solution:** +1. Verify DNS record exists: `dig +short registry.yourdomain.com` +2. Wait for DNS propagation (up to 48 hours for new records) +3. Check Cloudflare/DNS provider dashboard + +### Registry Container Not Starting + +**Symptom:** Container won't start or immediately exits + +**Solution:** +```bash +# Check container logs +docker logs registry + +# Remove and recreate if needed +docker rm -f registry +docker run -d \ + --name registry \ + --restart always \ + -p 127.0.0.1:5000:5000 \ + -v /var/lib/registry:/var/lib/registry \ + registry:2 +``` + +### HAProxy Configuration Error + +**Symptom:** HAProxy won't start or reload + +**Solution:** +```bash +# Test configuration +sudo haproxy -c -f /etc/haproxy/haproxy.cfg + +# Check HAProxy logs +sudo journalctl -u haproxy --no-pager -n 20 + +# Verify certificates exist +ls -la /etc/haproxy/certs/ +``` diff --git a/docs/tutorials/vmm-configuration.md b/docs/tutorials/vmm-configuration.md new file mode 100644 index 00000000..4faf40cb --- /dev/null +++ b/docs/tutorials/vmm-configuration.md @@ -0,0 +1,320 @@ +--- +title: "VMM Configuration" +description: "Configure the dstack Virtual Machine Monitor for your environment" +section: "dstack Installation" +stepNumber: 4 +totalSteps: 8 +lastUpdated: 2025-12-07 +prerequisites: + - clone-build-dstack-vmm + - dns-configuration +tags: + - dstack + - vmm + - configuration + - toml +difficulty: "intermediate" +estimatedTime: "15 minutes" +--- + +# VMM Configuration + +This tutorial 
guides you through configuring the dstack Virtual Machine Monitor (VMM) for **production use**. The VMM uses a TOML configuration file to define server settings, VM resource limits, networking, authentication, and service endpoints. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [Clone & Build dstack-vmm](/tutorial/clone-build-dstack-vmm) +- SSH access to your TDX-enabled server +- Root or sudo privileges +- Your gateway domain configured (e.g., `dstack.yourdomain.com`) + + +## Configuration + +### Step 1: Connect to Your Server + +```bash +ssh ubuntu@YOUR_SERVER_IP +``` + +### Step 2: Check Server Resources + +```bash +# Check CPU cores +nproc + +# Check total memory in MB +free -m | awk '/^Mem:/{print $2}' +``` + +Calculate your resource limits: +- **Max vCPUs**: Total cores - 4 (reserve for host) +- **Max Memory**: Total MB - 16384 (reserve 16GB for host) +- **Workers**: Total cores / 8 (minimum 4, maximum 32) + +For example, on a 128-core, 1TB RAM server: +- Max vCPUs: 128 - 4 = **124** +- Max Memory: 1,007,000 - 16,384 = **990,616 MB** +- Workers: 128 / 8 = **16** + +### Step 3: Generate an Auth Token + +```bash +# Generate a secure random token and save it +AUTH_TOKEN=$(openssl rand -hex 32) +mkdir -p ~/.dstack/secrets +echo -n "$AUTH_TOKEN" > ~/.dstack/secrets/vmm-auth-token +chmod 600 ~/.dstack/secrets/vmm-auth-token +echo "Auth token saved to ~/.dstack/secrets/vmm-auth-token" +``` + +### Step 4: Create Configuration Directory + +```bash +sudo mkdir -p /etc/dstack +``` + +### Step 5: Create VMM Configuration File + +Replace the placeholder values with your actual settings: + +```bash +AUTH_TOKEN=$(cat ~/.dstack/secrets/vmm-auth-token) +sudo tee /etc/dstack/vmm.toml > /dev/null < **Important:** There are two independent PCCS configurations: +> +> | Config | File | Used By | Purpose | +> |--------|------|---------|---------| +> | Host QCNL | `/etc/sgx_default_qcnl.conf` | QGS | PCK certs for quote **generation** | +> | CVM pccs_url | 
`/etc/dstack/vmm.toml` | dstack-util inside CVM | Collateral for quote **verification** | +> +> Both must point to a working PCCS. If the host QCNL is misconfigured, CVMs will fail during boot with `QGS error code: 0x12001`. + +Update the host QCNL to use Phala Network's public PCCS: + +```bash +sudo tee /etc/sgx_default_qcnl.conf > /dev/null << 'EOF' +{ + "pccs_url": "https://pccs.phala.network/sgx/certification/v4/", + "use_secure_cert": false, + "retry_times": 6, + "retry_delay": 10, + "pck_cache_expire_hours": 168, + "verify_collateral_cache_expire_hours": 168, + "local_cache_only": false +} +EOF +``` + +Restart QGS to pick up the new configuration: + +```bash +sudo systemctl restart qgsd +``` + +Verify QGS is running: + +```bash +systemctl status qgsd +``` + +### Step 7: Create Runtime Directories + +```bash +sudo mkdir -p /var/run/dstack +sudo mkdir -p /var/log/dstack +sudo mkdir -p /var/lib/dstack +sudo chmod 755 /var/run/dstack /var/log/dstack /var/lib/dstack +``` + +### Step 8: Verify Configuration + +```bash +# Check config file exists +cat /etc/dstack/vmm.toml + +# Verify TOML syntax (prints "TOML syntax OK" if valid, a traceback if not; tomllib requires Python 3.11+) +python3 -c "import tomllib; tomllib.load(open('/etc/dstack/vmm.toml', 'rb')); print('TOML syntax OK')" +``` + +--- + +## Configuration Reference + +### Networking Modes + +| Mode | Performance | Isolation | Setup | Recommended For | +|------|-------------|-----------|-------|-----------------| +| `user` | Good | Good | None | **Recommended**: reliable internet access from CVM boot | +| `host` | Best | None | None | Special cases only | + +**User Mode (Recommended):** + +QEMU user-mode networking creates a virtual NAT network inside the QEMU process. Internet connectivity is available **immediately** when the CVM boots, before external network routes are established. 
This is critical because the CVM's `dstack-prepare` service needs to reach the public PCCS (`pccs.phala.network`) during early boot to fetch SGX quote collateral for sealing key verification. + +```toml +[cvm.networking] +mode = "user" +net = "10.0.2.0/24" +dhcp_start = "10.0.2.10" +restrict = false +``` + +With user-mode networking, CVMs have internet access through QEMU's built-in NAT. The PCCS at `https://pccs.phala.network` is reachable immediately, and host services are accessible at `10.0.2.2`. + +### Authentication + +For production, always enable authentication: + +```toml +[auth] +enabled = true +tokens = ["your-secure-token-here"] +``` + +You can specify multiple tokens for different clients: + +```toml +[auth] +enabled = true +tokens = [ + "token-for-admin", + "token-for-ci-cd", + "token-for-monitoring" +] +``` + +### GPU Passthrough + +To enable GPU passthrough for AI/ML workloads: + +```toml +[cvm.gpu] +enabled = true +listing = ["10de:2335"] # NVIDIA GPU product IDs +allow_attach_all = true +``` + +**Requirements:** +- IOMMU enabled in BIOS +- VFIO driver configured +- GPU not in use by host + +--- + +## Troubleshooting + +For detailed solutions, see the [dstack Installation Troubleshooting Guide](/tutorial/troubleshooting-dstack-installation#vmm-configuration-issues): + +- [Configuration file not found](/tutorial/troubleshooting-dstack-installation#configuration-file-not-found) +- [TOML syntax errors](/tutorial/troubleshooting-dstack-installation#toml-syntax-errors) +- [Permission denied on socket](/tutorial/troubleshooting-dstack-installation#permission-denied-on-socket) +- [Resource limit errors](/tutorial/troubleshooting-dstack-installation#resource-limit-errors) + +--- + +## Next Steps + +With VMM configured, proceed to set up the systemd service: + +- [VMM Service Setup](/tutorial/vmm-service-setup) - Create and start the VMM service + +## Additional Resources + +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) +- [TOML 
Specification](https://toml.io/en/) diff --git a/docs/tutorials/vmm-service-setup.md b/docs/tutorials/vmm-service-setup.md new file mode 100644 index 00000000..eb21cd1f --- /dev/null +++ b/docs/tutorials/vmm-service-setup.md @@ -0,0 +1,188 @@ +--- +title: "VMM Service Setup" +description: "Configure dstack VMM to run as a systemd service with automatic startup" +section: "dstack Installation" +stepNumber: 5 +totalSteps: 8 +lastUpdated: 2025-12-07 +prerequisites: + - vmm-configuration +tags: + - dstack + - vmm + - systemd + - service +difficulty: "intermediate" +estimatedTime: "10 minutes" +--- + +# VMM Service Setup + +This tutorial guides you through setting up the dstack Virtual Machine Monitor (VMM) as a systemd service. Running VMM as a service ensures it starts automatically on boot and restarts if it crashes. + +## Prerequisites + +Before starting, ensure you have: + +- Completed [VMM Configuration](/tutorial/vmm-configuration) +- SSH access to your TDX-enabled server +- Root or sudo privileges + + +## Service Management Commands + +| Command | Description | +|---------|-------------| +| `sudo systemctl start dstack-vmm` | Start the service | +| `sudo systemctl stop dstack-vmm` | Stop the service | +| `sudo systemctl restart dstack-vmm` | Restart the service | +| `sudo systemctl status dstack-vmm` | Check service status | +| `sudo systemctl enable dstack-vmm` | Enable start on boot | +| `sudo systemctl disable dstack-vmm` | Disable start on boot | + +### View Logs + +| Command | Description | +|---------|-------------| +| `journalctl -u dstack-vmm` | View all logs | +| `journalctl -u dstack-vmm -n 100` | View last 100 lines | +| `journalctl -u dstack-vmm -f` | Follow logs in real-time | +| `journalctl -u dstack-vmm --since "1 hour ago"` | Logs from last hour | +| `journalctl -u dstack-vmm -p err` | Show only errors | + +--- + +## Manual Setup + +If you prefer to set up the service manually, follow these steps. 
+ +### Step 1: Create the Systemd Service File + +```bash +sudo tee /etc/systemd/system/dstack-vmm.service > /dev/null <<'EOF' +[Unit] +Description=dstack Virtual Machine Monitor +Documentation=https://dstack.org +After=network.target + +[Service] +Type=simple +User=root +ExecStart=/usr/local/bin/dstack-vmm --config /etc/dstack/vmm.toml serve +Restart=always +RestartSec=5 +StandardOutput=journal +StandardError=journal + +# Resource limits for handling many concurrent VMs +LimitNOFILE=65536 +LimitNPROC=4096 + +# Security hardening +NoNewPrivileges=false +ProtectSystem=strict +RuntimeDirectory=dstack +ReadWritePaths=/var/run/dstack /var/log/dstack /var/lib/dstack /tmp + +[Install] +WantedBy=multi-user.target +EOF +``` + +### Step 2: Reload Systemd and Enable Service + +```bash +sudo systemctl daemon-reload +sudo systemctl enable dstack-vmm +``` + +### Step 3: Start the Service + +```bash +sudo systemctl start dstack-vmm +``` + +### Step 4: Verify Service Status + +```bash +sudo systemctl status dstack-vmm +``` + +Expected output: +``` +● dstack-vmm.service - dstack Virtual Machine Monitor + Loaded: loaded (/etc/systemd/system/dstack-vmm.service; enabled) + Active: active (running) since ... 
+``` + +### Step 5: Verify VMM is Working + +Check that the HTTP API is responding: + +```bash +curl -s http://127.0.0.1:9080/ | head -5 +``` + +Check that the supervisor socket exists: + +```bash +ls -la /var/run/dstack/supervisor.sock +``` + +--- + +## Service Configuration + +### Service File Explained + +| Setting | Description | +|---------|-------------| +| `Type=simple` | Service runs as a foreground process | +| `User=root` | VMM requires root for VM management | +| `Restart=always` | Restart the service whenever it exits, whether cleanly or on failure | +| `RestartSec=5` | Wait 5 seconds before restarting | +| `LimitNOFILE=65536` | Max open file descriptors (for many concurrent VMs) | +| `LimitNPROC=4096` | Max processes/threads | +| `ProtectSystem=strict` | Read-only access to system directories | +| `RuntimeDirectory=dstack` | Creates `/run/dstack` automatically each time the service starts | +| `ReadWritePaths` | Directories VMM can write to | + +### Environment Variables + +To enable debug logging: + +```bash +# Create the drop-in directory first; tee fails if it does not exist +sudo mkdir -p /etc/systemd/system/dstack-vmm.service.d +sudo tee /etc/systemd/system/dstack-vmm.service.d/environment.conf > /dev/null <<'EOF' +[Service] +Environment="RUST_LOG=debug" +Environment="RUST_BACKTRACE=1" +EOF +sudo systemctl daemon-reload +sudo systemctl restart dstack-vmm +``` + +--- + +## Troubleshooting + +For detailed solutions, see the [dstack Installation Troubleshooting Guide](/tutorial/troubleshooting-dstack-installation#vmm-service-setup-issues): + +- [Service fails to start](/tutorial/troubleshooting-dstack-installation#service-fails-to-start) +- [Service keeps restarting](/tutorial/troubleshooting-dstack-installation#service-keeps-restarting) +- [HTTP API not responding](/tutorial/troubleshooting-dstack-installation#http-api-not-responding) +- [Supervisor socket not created](/tutorial/troubleshooting-dstack-installation#supervisor-socket-not-created) +- [Permission denied errors](/tutorial/troubleshooting-dstack-installation#permission-denied-errors) + +--- + +## Next Steps + +With VMM running as a service, proceed to deploy the 
Key Management Service: + +- [Contract Deployment](/tutorial/contract-deployment) - Deploy KMS contracts to Sepolia + +## Additional Resources + +- [systemd Documentation](https://www.freedesktop.org/software/systemd/man/systemd.service.html) +- [journalctl Manual](https://www.freedesktop.org/software/systemd/man/journalctl.html) +- [dstack GitHub Repository](https://github.com/Dstack-TEE/dstack) From 25638904811d85316d477efd53013435a52feb5b Mon Sep 17 00:00:00 2001 From: LSDan Date: Thu, 12 Mar 2026 15:29:02 +0000 Subject: [PATCH 2/2] docs: add tutorial README with ordered table of contents Co-Authored-By: Claude Opus 4.6 --- docs/tutorials/README.md | 79 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) create mode 100644 docs/tutorials/README.md diff --git a/docs/tutorials/README.md b/docs/tutorials/README.md new file mode 100644 index 00000000..04d3481a --- /dev/null +++ b/docs/tutorials/README.md @@ -0,0 +1,79 @@ +# Self-Host Tutorials + +Step-by-step guides for deploying dstack on your own TDX hardware. These +tutorials walk through the entire process from bare-metal host setup to +running your first confidential application. + +## Tutorial Order + +### 1. Host Setup + +| Step | Tutorial | File | +|------|----------|------| +| 1 | TDX Hardware Verification | [tdx-hardware-verification.md](tdx-hardware-verification.md) | +| 2 | TDX & SGX BIOS Configuration | [tdx-bios-configuration.md](tdx-bios-configuration.md) | +| 3 | TDX Software Installation | [tdx-software-installation.md](tdx-software-installation.md) | +| 4 | TDX & SGX Verification | [tdx-sgx-verification.md](tdx-sgx-verification.md) | + +### 2. 
Prerequisites + +| Step | Tutorial | File | +|------|----------|------| +| 1 | DNS Configuration | [dns-configuration.md](dns-configuration.md) | +| 2 | SSL Certificate Setup | [ssl-certificate-setup.md](ssl-certificate-setup.md) | +| 3 | Docker Setup | [docker-setup.md](docker-setup.md) | +| 3 | HAProxy Setup | [haproxy-setup.md](haproxy-setup.md) | +| 4 | Gramine Key Provider | [gramine-key-provider.md](gramine-key-provider.md) | +| 4 | Local Docker Registry | [local-docker-registry.md](local-docker-registry.md) | +| 5 | Blockchain Wallet Setup | [blockchain-setup.md](blockchain-setup.md) | + +> Steps 3–4 contain parallel tracks. Docker Setup and HAProxy Setup are +> both step 3; Gramine Key Provider and Local Docker Registry are both +> step 4. Complete both within each step number. + +### 3. dstack Installation + +| Step | Tutorial | File | +|------|----------|------| +| 1 | System Baseline & Dependencies | [system-baseline-dependencies.md](system-baseline-dependencies.md) | +| 2 | Rust Toolchain Installation | [rust-toolchain-installation.md](rust-toolchain-installation.md) | +| 3 | Clone & Build dstack-vmm | [clone-build-dstack-vmm.md](clone-build-dstack-vmm.md) | +| 4 | VMM Configuration | [vmm-configuration.md](vmm-configuration.md) | +| 5 | VMM Service Setup | [vmm-service-setup.md](vmm-service-setup.md) | +| 6 | Management Interface Setup | [management-interface-setup.md](management-interface-setup.md) | +| 7 | Guest OS Image Setup | [guest-image-setup.md](guest-image-setup.md) | + +### 4. KMS Deployment + +| Step | Tutorial | File | +|------|----------|------| +| 1 | Contract Deployment | [contract-deployment.md](contract-deployment.md) | +| 2 | KMS Build & Configuration | [kms-build-configuration.md](kms-build-configuration.md) | +| 3 | KMS CVM Deployment | [kms-cvm-deployment.md](kms-cvm-deployment.md) | + +### 5. 
Gateway Deployment + +| Step | Tutorial | File | +|------|----------|------| +| 1 | Gateway CVM Preparation | [gateway-build-configuration.md](gateway-build-configuration.md) | +| 2 | Gateway CVM Deployment | [gateway-service-setup.md](gateway-service-setup.md) | + +### 6. First Application + +| Step | Tutorial | File | +|------|----------|------| +| 1 | Hello World Application | [hello-world-app.md](hello-world-app.md) | +| 2 | Attestation Verification | [attestation-verification.md](attestation-verification.md) | + +### Troubleshooting + +These guides are not part of the main flow. Refer to them as needed. + +| Tutorial | File | +|----------|------| +| Troubleshooting: Prerequisites | [troubleshooting-prerequisites.md](troubleshooting-prerequisites.md) | +| Troubleshooting: Host Setup | [troubleshooting-host-setup.md](troubleshooting-host-setup.md) | +| Troubleshooting: dstack Installation | [troubleshooting-dstack-installation.md](troubleshooting-dstack-installation.md) | +| Troubleshooting: KMS Deployment | [troubleshooting-kms-deployment.md](troubleshooting-kms-deployment.md) | +| Troubleshooting: Gateway Deployment | [troubleshooting-gateway-deployment.md](troubleshooting-gateway-deployment.md) | +| Troubleshooting: First Application | [troubleshooting-first-application.md](troubleshooting-first-application.md) |