6 changes: 6 additions & 0 deletions README.md
@@ -6,6 +6,12 @@ This project implements Yocto layer and the overall build scripts for dstack Bas

See https://github.com/Phala-Network/dstack-cloud for more details.

## CoCo/Kata Kubernetes smoke test

After building the dstack guest rootfs with CoCo guest components, see
[`docs/coco-k8s-testing.md`](docs/coco-k8s-testing.md) for a Kata TDX/Kubernetes
smoke-test workflow.

## Reproducible Build The Guest Image

### Pre-requisites
287 changes: 287 additions & 0 deletions docs/coco-k8s-testing.md
@@ -0,0 +1,287 @@
# Testing the dstack CoCo guest image on Kubernetes

This guide smoke-tests the dstack rootfs as a Kata/CoCo CVM guest image on a
Kubernetes node. It is intended for the first-cut CoCo integration where the
native dstack services and the CoCo guest components run independently in the
same image.

The commands below assume a local Yocto build tree and a single-node test host.
Adjust paths and the `RuntimeClass` name if your Kata installation differs.

## Prerequisites

- A host with Kubernetes and Kata Containers installed.
- A confidential Kata runtime class, for example `kata-qemu-tdx`.
- The node is labelled for Kata scheduling. To list the runtime classes and
  confirm the label:

```bash
kubectl get runtimeclass
kubectl get nodes --show-labels | grep katacontainers.io/kata-runtime=true
```

- The host can write to the Kata image/config directories, typically:

```text
/opt/kata/share/kata-containers/
/opt/kata/share/defaults/kata-containers/configuration-qemu-tdx.toml
```

> **Warning:** the test edits the host Kata runtime config. Use a dedicated test
> machine or keep a backup and restore it when finished.

## Build the guest rootfs

```bash
source openembedded-core/oe-init-build-env bb-build >/dev/null
bitbake dstack-rootfs
```

The build should produce:

```text
bb-build/tmp/deploy/images/dstack/dstack-rootfs-dstack.cpio
bb-build/tmp/deploy/images/dstack/bzImage
```

Before building a disk image, verify that the CoCo/Kata pieces are present in
the rootfs work directory:

```bash
ROOT=bb-build/tmp/work/dstack-poky-linux/dstack-rootfs/1.0/rootfs

test -x "$ROOT/usr/bin/kata-agent"
test -f "$ROOT/etc/kata-opa/default-policy.rego"
test -f "$ROOT/etc/ocicrypt_config.json"
test -x "$ROOT/pause_bundle/rootfs/pause"
```

The default policy file is required because the `agent-policy` feature
initializes OPA before initdata is parsed. The pause bundle is required by
Kata's confidential/guest-pull sandbox path.
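
The `test` commands above exit silently, so a missing file is easy to overlook when they are pasted into an interactive shell. A minimal sketch that reports each missing path is shown below. The function name `check_coco_rootfs` is hypothetical, and the path list simply mirrors the four checks above.

```bash
# Hypothetical helper: report every required CoCo/Kata path missing from a rootfs.
# Usage: check_coco_rootfs "$ROOT"
check_coco_rootfs() {
  local root="$1" missing=0 entry kind path
  # Each entry is "<test flag>:<path relative to the rootfs>".
  for entry in \
    "x:usr/bin/kata-agent" \
    "f:etc/kata-opa/default-policy.rego" \
    "f:etc/ocicrypt_config.json" \
    "x:pause_bundle/rootfs/pause"; do
    kind=${entry%%:*}
    path=${entry#*:}
    if ! test "-$kind" "$root/$path"; then
      echo "missing: /$path" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one path was missing.
  return "$missing"
}
```

Running `check_coco_rootfs "$ROOT" && echo "rootfs looks complete"` prints one `missing:` line per absent file and only prints the success message when all four are present.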

## Create a Kata disk image from the built cpio

```bash
IMG=/opt/kata/share/kata-containers/dstack-coco-mvp.ext4
KERNEL=/opt/kata/share/kata-containers/vmlinuz-dstack-coco-mvp
CPIO=$(readlink -f bb-build/tmp/deploy/images/dstack/dstack-rootfs-dstack.cpio)
BZIMAGE=$(readlink -f bb-build/tmp/deploy/images/dstack/bzImage)
MNT=/tmp/dstack-coco-mvp-root

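# Start from a fresh 3 GiB sparse file with a single bootable Linux partition.
sudo rm -f "${IMG}.tmp"
sudo truncate -s 3G "${IMG}.tmp"
printf 'label: dos\nunit: sectors\n\nstart=6144, type=83, bootable\n' | sudo sfdisk "${IMG}.tmp"

# Attach the image with partition scanning and wait briefly for the partition node.
LOOP=$(sudo losetup --find --show --partscan "${IMG}.tmp")
sleep 1
sudo mkfs.ext4 -F "${LOOP}p1"
sudo mkdir -p "$MNT"
sudo mount "${LOOP}p1" "$MNT"

# Unpack the rootfs cpio archive into the new filesystem, then detach everything.
sudo bash -c "cd '$MNT' && cpio -idmu --no-absolute-filenames < '$CPIO'"
sudo sync
sudo umount "$MNT"
sudo losetup -d "$LOOP"

# Install the image and kernel under the names the Kata config will reference.
sudo mv -f "${IMG}.tmp" "$IMG"
sudo cp -a "$BZIMAGE" "$KERNEL"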
```

If your system does not create `${LOOP}p1`, detach the loop device and use an
explicit offset mount/mkfs flow instead.
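
One way to do the explicit-offset flow is to attach the loop device at the partition's byte offset rather than relying on partition scanning. The sketch below assumes 512-byte sectors and the `start=6144` value from the `sfdisk` line above. `partition_offset_bytes` and `attach_partition` are hypothetical helper names, and the flow has not been validated on a real Kata host.

```bash
# Convert a partition start sector into a byte offset (512-byte sectors by default).
partition_offset_bytes() {
  local start_sector="$1" sector_size="${2:-512}"
  echo $((start_sector * sector_size))
}

# Attach only the partition area of an image as a loop device and print the device.
# Requires root; defined here but not called automatically.
attach_partition() {
  local img="$1" offset
  offset=$(partition_offset_bytes 6144)
  sudo losetup --find --show --offset "$offset" "$img"
}

# Example (replaces the --partscan / ${LOOP}p1 steps):
#   LOOP=$(attach_partition "${IMG}.tmp")
#   sudo mkfs.ext4 -F "$LOOP"
#   sudo mount "$LOOP" "$MNT"
```

Because the single partition runs to the end of the image, no size limit is passed to `losetup`; the loop device then covers exactly the partition area.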

## Point Kata TDX at the dstack image

Back up the current config first:

```bash
KATA_CFG=/opt/kata/share/defaults/kata-containers/configuration-qemu-tdx.toml
sudo cp -a "$KATA_CFG" "${KATA_CFG}.bak.$(date +%Y%m%d%H%M%S)"
```

Set the kernel/image/rootfs and make sure the agent starts the CoCo guest
components:

```bash
sudo sed -i \
  -e 's#^kernel = ".*"#kernel = "/opt/kata/share/kata-containers/vmlinuz-dstack-coco-mvp"#' \
  -e 's#^image = ".*"#image = "/opt/kata/share/kata-containers/dstack-coco-mvp.ext4"#' \
  -e 's#^rootfs_type=.*#rootfs_type="ext4"#' \
  -e 's#^default_memory = .*#default_memory = 4096#' \
  "$KATA_CFG"

# Add these to kernel_params if they are not already present:
# cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1
# systemd.unit=kata-containers.target
# agent.log=debug
# agent.guest_components_procs=api-server-rest
# agent.guest_components_rest_api=all
```

Kata normally reads this config when a new sandbox starts. If the runtime has a
stale config, restart containerd/kubelet on the test node.
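
To see which of the parameters listed in the comments above still need to be added, a small check such as the following may help. It is a sketch that assumes the config holds a single-line `kernel_params = "..."` entry; the function name `missing_kernel_params` is hypothetical.

```bash
# Hypothetical helper: print each required parameter that is absent from the
# first kernel_params line of a Kata configuration file.
missing_kernel_params() {
  local cfg="$1" line param
  # Take the first kernel_params assignment and turn its quotes into spaces
  # so every parameter is delimited by whitespace.
  line=$(grep -E '^kernel_params[[:space:]]*=' "$cfg" | head -n1 | tr '"' ' ')
  for param in \
    cgroup_no_v1=all \
    systemd.unified_cgroup_hierarchy=1 \
    systemd.unit=kata-containers.target \
    agent.log=debug \
    agent.guest_components_procs=api-server-rest \
    agent.guest_components_rest_api=all; do
    case " $line " in
      *" $param "*) ;;        # already present
      *) echo "$param" ;;     # needs to be added
    esac
  done
}

# Example:
#   missing_kernel_params "$KATA_CFG"
```

An empty result means all six parameters are already present in the host config.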

## Create initdata

For a smoke test, use an allow-all policy and an offline CDH config:

```bash
cat >/tmp/dstack-coco-mvp-initdata.toml <<'EOF_INITDATA'
version = "0.1.0"
algorithm = "sha256"

[data]
"aa.toml" = '''
[eventlog_config]
init_pcr = 17
enable_eventlog = false

[log]
level = "debug"
'''

"cdh.toml" = '''
socket = "unix:///run/confidential-containers/cdh.sock"

[kbc]
name = "offline_fs_kbc"
url = ""

[log]
level = "debug"
'''

"policy.rego" = '''
package agent_policy

default AddARPNeighborsRequest := true
default AddSwapRequest := true
default CloseStdinRequest := true
default CopyFileRequest := true
default CreateContainerRequest := true
default CreateSandboxRequest := true
default DestroySandboxRequest := true
default ExecProcessRequest := true
default GetDiagnosticDataRequest := true
default GetMetricsRequest := true
default GetOOMEventRequest := true
default GuestDetailsRequest := true
default ListInterfacesRequest := true
default ListRoutesRequest := true
default MemAgentCompactConfig := true
default MemAgentMemcgConfig := true
default MemHotplugByProbeRequest := true
default OnlineCPUMemRequest := true
default PauseContainerRequest := true
default PullImageRequest := true
default ReadStreamRequest := true
default RemoveContainerRequest := true
default RemoveStaleVirtiofsShareMountsRequest := true
default ReseedRandomDevRequest := true
default ResumeContainerRequest := true
default SetGuestDateTimeRequest := true
default SetPolicyRequest := true
default SignalProcessRequest := true
default StartContainerRequest := true
default StartTracingRequest := true
default StatsContainerRequest := true
default StopTracingRequest := true
default TtyWinResizeRequest := true
default UpdateContainerRequest := true
default UpdateEphemeralMountsRequest := true
default UpdateInterfaceRequest := true
default UpdateRoutesRequest := true
default WaitProcessRequest := true
default WriteStreamRequest := true
'''
EOF_INITDATA

INITDATA_B64=$(gzip -c /tmp/dstack-coco-mvp-initdata.toml | base64 -w0)
```
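
Before embedding the value in a Pod annotation, the gzip-plus-base64 encoding can be sanity-checked by decoding it and comparing the result with the source file. This is a minimal sketch; `initdata_roundtrip_ok` is a hypothetical name, and it relies on the same GNU `base64` used above.

```bash
# Hypothetical helper: succeed if the base64+gzip payload decodes back to the
# exact contents of the original initdata file.
initdata_roundtrip_ok() {
  local file="$1" b64="$2"
  printf '%s' "$b64" | base64 -d | gunzip -c | cmp -s - "$file"
}

# Example:
#   initdata_roundtrip_ok /tmp/dstack-coco-mvp-initdata.toml "$INITDATA_B64" \
#     && echo "initdata encoding OK"
```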

For a KBS-backed run, change `cdh.toml` to select `cc_kbc` and set the KBS URL,
then regenerate `INITDATA_B64`.

## Deploy the test Pod

```bash
cat >/tmp/dstack-coco-mvp-test.yaml <<EOF_POD
apiVersion: v1
kind: Pod
metadata:
  name: dstack-coco-mvp-test
  annotations:
    io.katacontainers.config.hypervisor.cc_init_data: "${INITDATA_B64}"
    io.katacontainers.config.hypervisor.default_memory: "4096"
    io.katacontainers.config.hypervisor.kernel_params: >-
      cgroup_no_v1=all
      systemd.unified_cgroup_hierarchy=1
      systemd.unit=kata-containers.target
      agent.log=debug
      agent.guest_components_procs=api-server-rest
      agent.guest_components_rest_api=all
spec:
  runtimeClassName: kata-qemu-tdx
  restartPolicy: Never
  containers:
    - name: test
      image: docker.io/library/busybox:latest
      imagePullPolicy: IfNotPresent
      command: ["sh", "-c", "echo hello-from-dstack-coco; uname -a; sleep 300"]
EOF_POD

kubectl apply -f /tmp/dstack-coco-mvp-test.yaml
kubectl get pod dstack-coco-mvp-test -w
```

A successful run reaches `Running`, and the container log shows the dstack guest
kernel:

```bash
kubectl logs dstack-coco-mvp-test
# hello-from-dstack-coco
# Linux dstack-coco-mvp-test 6.18.24-dstack ... x86_64 GNU/Linux
```

You can also confirm the QEMU command line uses the dstack image:

```bash
pgrep -af 'qemu-system.*sandbox' | grep dstack-coco-mvp
```
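
The log check described above can also be scripted. The sketch below reads the container log from standard input and succeeds only if it contains the greeting line and a `uname -a` line whose kernel release ends in `-dstack`. That suffix is an assumption taken from the sample output, and `logs_show_dstack_guest` is a hypothetical name.

```bash
# Hypothetical helper: succeed if the log on stdin shows the greeting and a
# kernel release ending in "-dstack" (assumed from the sample output above).
logs_show_dstack_guest() {
  local logs
  logs=$(cat)
  grep -qx 'hello-from-dstack-coco' <<<"$logs" &&
    grep -Eq '^Linux [^ ]+ [^ ]+-dstack ' <<<"$logs"
}

# Example:
#   kubectl logs dstack-coco-mvp-test | logs_show_dstack_guest && echo "guest kernel OK"
```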

## Troubleshooting

- `timed out connecting to vsock ...:1024`: check the guest console and make
  sure `/etc/kata-opa/default-policy.rego` exists in the image.
- `Pause image not present in rootfs`: check that `/pause_bundle/config.json`
  and `/pause_bundle/rootfs/pause` exist in the image.
- `Creating watcher returned error too many open files`: the test host may have
  too many stale shims or a low inotify limit. On a dedicated test node:

```bash
sudo sysctl -w fs.inotify.max_user_instances=1024
sudo sysctl -w fs.inotify.max_user_watches=1048576
```

- To remove a stuck test sandbox, first delete the Pod, then check for stale
  Kata shims/QEMU processes before killing anything:

```bash
kubectl delete pod dstack-coco-mvp-test --force --grace-period=0 --ignore-not-found
pgrep -af 'containerd-shim-kata|qemu-system.*sandbox'
```

## Cleanup

```bash
kubectl delete pod dstack-coco-mvp-test --force --grace-period=0 --ignore-not-found
```

Restore the backed-up Kata config when done:

```bash
sudo cp -a /path/to/configuration-qemu-tdx.toml.bak.YYYYmmddHHMMSS \
  /opt/kata/share/defaults/kata-containers/configuration-qemu-tdx.toml
sudo systemctl restart containerd
sudo systemctl restart kubelet
```
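
If several backups have accumulated, the newest one can be located by its timestamp suffix, which sorts lexically in chronological order. This is a sketch only: `latest_kata_backup` is a hypothetical name, and it assumes the backups were created with the `.bak.YYYYmmddHHMMSS` suffix used earlier in this guide.

```bash
# Hypothetical helper: print the most recent "<config>.bak.<timestamp>" file,
# or nothing if no backup exists.
latest_kata_backup() {
  local cfg="$1"
  # The YYYYmmddHHMMSS suffix sorts lexically in chronological order.
  ls -1 "$cfg".bak.* 2>/dev/null | sort | tail -n1
}

# Example:
#   KATA_CFG=/opt/kata/share/defaults/kata-containers/configuration-qemu-tdx.toml
#   BACKUP=$(latest_kata_backup "$KATA_CFG")
#   [ -n "$BACKUP" ] && sudo cp -a "$BACKUP" "$KATA_CFG"
```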
2 changes: 1 addition & 1 deletion dstack
Submodule dstack updated 99 files
+2 −0 .github/workflows/prek-check.yml
+62 −0 Cargo.lock
+8 −0 Cargo.toml
+26 −16 README.md
+21 −0 SECURITY.md
+16 −0 cached-cell/Cargo.toml
+263 −0 cached-cell/src/lib.rs
+70 −0 cc-eventlog/src/tdx.rs
+20 −0 crates/dstack-auth/Cargo.toml
+320 −0 crates/dstack-auth/src/main.rs
+21 −0 crates/dstack-cli-core/Cargo.toml
+37 −0 crates/dstack-cli-core/src/compose.rs
+509 −0 crates/dstack-cli-core/src/config.rs
+102 −0 crates/dstack-cli-core/src/fsutil.rs
+277 −0 crates/dstack-cli-core/src/host.rs
+237 −0 crates/dstack-cli-core/src/layout.rs
+26 −0 crates/dstack-cli-core/src/lib.rs
+93 −0 crates/dstack-cli-core/src/ports.rs
+121 −0 crates/dstack-cli-core/src/vmm.rs
+22 −0 crates/dstack-cli/Cargo.toml
+508 −0 crates/dstack-cli/src/main.rs
+26 −0 crates/dstackup/Cargo.toml
+84 −0 crates/dstackup/src/cid.rs
+235 −0 crates/dstackup/src/cli.rs
+178 −0 crates/dstackup/src/destroy.rs
+718 −0 crates/dstackup/src/image.rs
+1,192 −0 crates/dstackup/src/install.rs
+116 −0 crates/dstackup/src/main.rs
+69 −0 crates/dstackup/src/state.rs
+123 −0 crates/dstackup/src/systemd.rs
+2 −2 docs/amd-sev-snp-review-readiness.md
+7 −76 docs/deployment.md
+3 −3 docs/design-and-hardening-decisions.md
+61 −0 docs/hardware-enablement.md
+147 −0 docs/native-tee-interfaces.md
+305 −0 docs/onboarding.md
+9 −11 docs/security/README.md
+75 −0 docs/security/public-security-reports.md
+17 −0 docs/security/security-best-practices.md
+83 −1 docs/security/security-model.md
+10 −2 docs/tutorials/kms-build-configuration.md
+6 −2 docs/tutorials/kms-cvm-deployment.md
+6 −4 docs/usage.md
+160 −30 dstack-attest/src/attestation.rs
+117 −12 dstack-attest/src/v1.rs
+87 −32 dstack-attest/tests/sev_snp_verify.rs
+2 −3 dstack-mr/cli/src/main.rs
+80 −5 dstack-mr/src/kernel.rs
+17 −34 dstack-mr/src/lib.rs
+1 −1 dstack-mr/src/machine.rs
+98 −6 dstack-mr/src/main.rs
+60 −0 dstack-mr/src/measurement.rs
+242 −106 dstack-mr/src/sev.rs
+209 −148 dstack-mr/src/tdvf.rs
+612 −0 dstack-mr/src/tdx.rs
+0 −135 dstack-mr/src/uefi_var.rs
+2 −0 dstack-types/Cargo.toml
+549 −37 dstack-types/src/lib.rs
+9 −0 examples/hello-nginx/docker-compose.yaml
+1 −0 gateway/Cargo.toml
+7 −0 gateway/src/config.rs
+65 −44 gateway/src/main_service.rs
+134 −0 gateway/src/main_service/handshakes.rs
+1 −0 gateway/src/proxy.rs
+86 −43 gateway/src/proxy/tls_passthough.rs
+191 −0 gateway/test-run/TESTING.md
+14 −8 gateway/test-run/test_suite.sh
+4 −0 guest-agent/rpc/build.rs
+2 −3 guest-agent/rpc/proto/agent_rpc.proto
+11 −7 guest-agent/src/backend.rs
+3 −5 guest-agent/src/rpc_service.rs
+34 −1 http-client/src/lib.rs
+1 −1 kms/auth-eth-bun/package.json
+1 −1 kms/auth-mock/package.json
+1 −1 kms/auth-simple/package.json
+39 −21 kms/src/main_service.rs
+102 −50 kms/src/main_service/amd_attest.rs
+28 −8 kms/src/onboard_service.rs
+281 −0 scripts/install.sh
+6 −4 sdk/curl/api.md
+80 −62 sdk/go/README.md
+17 −23 sdk/js/README.md
+5 −12 sdk/python/README.md
+2 −0 sdk/rust/README.md
+3 −1 sev-snp-qvl/Cargo.toml
+206 −90 sev-snp-qvl/src/lib.rs
+1 −0 verifier/Cargo.toml
+3 −3 verifier/README.md
+3 −0 verifier/fixtures/sev-snp-attestation.json
+37 −0 verifier/fixtures/sev-snp.README.md
+3 −0 verifier/fixtures/tdx-lite-attestation.json
+6 −0 verifier/fixtures/tdx-lite-getquote.json
+65 −0 verifier/fixtures/tdx-lite.README.md
+342 −27 verifier/src/verification.rs
+233 −128 vmm/src/app.rs
+51 −9 vmm/src/app/image.rs
+79 −0 vmm/src/config.rs
+31 −5 vmm/ui/package-lock.json
+6 −0 vmm/vmm.toml