Generalize deploy runbook: remove Hephaestus-specific dual-daemon policy #2

Open
pdfinn wants to merge 2 commits into main from claude/remove-sensitive-readme-config-TZE7t

@pdfinn (Member) commented May 15, 2026

Summary

Refactor the SGLang deployment documentation to be hardware-agnostic and remove references to Hephaestus-specific operational constraints. The runbook now targets any Jetson Orin AGX with a single Docker daemon, making it accessible to new contributors and field deployments without requiring knowledge of internal dual-daemon infrastructure.

Key Changes

  • Renamed runbooks/hephaestus-deploy.md → runbooks/deploy.md to reflect that it now documents the standard field-deployment shape, not a specific internal host
  • Removed §0.1 (dev-daemon setup) — the entire section describing Hephaestus's dual-daemon policy (docker.service + docker-dev.service) is deleted; users now follow a single-daemon path
  • Removed disk-policy constraints — eliminated references to /mnt/orin-ssd as a load-bearing requirement; documentation now uses generic paths like ${HF_CACHE:-/var/lib/huggingface} and ${LOGS:-/var/log/sglang}
  • Simplified path references throughout:
    • Removed Hephaestus-specific mount points (/mnt/orin-ssd/huggingface, /mnt/orin-ssd/pdfinn/)
    • Replaced with environment-variable-based defaults that work on any host
    • Updated all docker --host unix:///run/docker-dev.sock invocations to plain docker
  • Updated SGLANG-ADOPTION-NOTES.md to remove Hephaestus references and use generic "dev box" / "deploy host" terminology
  • Updated CLAUDE.md to clarify that dual-daemon policy is Hephaestus-specific operational detail, not part of the public build/deploy contract
  • Updated README.md and build documentation to remove assumptions about dual daemons and dev-daemon sockets
  • Updated validation script (validate-on-hardware.sh) to use environment variables and generic socket guidance
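
The environment-variable defaults quoted in the list above rely on POSIX parameter expansion: `${VAR:-default}` yields the caller's value when `VAR` is set and non-empty, and the default otherwise. A minimal sketch of how a script might resolve them, assuming a helper named `resolve_paths` (hypothetical, not taken from the runbook or from validate-on-hardware.sh):

```shell
#!/bin/sh
# Sketch only: resolve_paths is an illustrative name, not part of the repo.
# The two defaults are the ones quoted in the PR description.
resolve_paths() {
    # Keep the caller's value if set and non-empty; otherwise use the default.
    HF_CACHE="${HF_CACHE:-/var/lib/huggingface}"
    LOGS="${LOGS:-/var/log/sglang}"
}

resolve_paths
printf 'HF cache: %s\n' "$HF_CACHE"
printf 'Logs:     %s\n' "$LOGS"
```

A host with a different layout can override either value at call time, for example `HF_CACHE=/data/hf ./script.sh`. In the same spirit, plain `docker` talks to the default daemon socket, so a host that needs a non-default socket can set the standard `DOCKER_HOST` variable in its own environment instead of having the socket path written into the docs.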

Notable Details

  • The acceptance gate remains unchanged: "a new contributor can bring SGLang up on a clean Orin AGX in under 15 minutes"
  • All technical content (model launch flags, memory tuning, troubleshooting) is preserved; only host-specific operational details are removed
  • References to internal workloads (TAK, NERVA) are replaced with generic "other resident workloads" language
  • The runbook now explicitly states it covers the "field-deployment shape" as the primary path, making the scope clear upfront
  • Disk-space verification steps remain but no longer reference specific internal partitions
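
The runbook's actual disk-space steps are not shown in this description. As a rough sketch of a partition-agnostic check, the free space can be read for whichever filesystem holds a given path, rather than for a named mount point. The function names and the `MIN_FREE_GB` threshold below are assumptions for illustration only:

```shell
#!/bin/sh
# Hypothetical threshold; the runbook's real requirement is not quoted in this PR.
MIN_FREE_GB="${MIN_FREE_GB:-50}"

# Print the free space, in KiB, of the filesystem that holds the given path.
# df -P forces the portable one-line-per-filesystem format; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the path's filesystem has at least MIN_FREE_GB free.
has_room() {
    need_kb=$((MIN_FREE_GB * 1024 * 1024))
    [ "$(free_kb "$1")" -ge "$need_kb" ]
}

# Usage: has_room "${HF_CACHE:-/var/lib/huggingface}" || echo "low disk" >&2
```

Because the check takes a path argument, it works unchanged whether the cache lives on the root filesystem or on a separate disk.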

This change aligns with the repo's public-facing nature (per CLAUDE.md) and removes internal hostname/workload references that shouldn't appear in public documentation.

https://claude.ai/code/session_01QTQBMe39u2tE33QriAMLgo

claude added 2 commits May 15, 2026 08:17
The README named the on-hardware validation box, its `/mnt/orin-ssd`
layout, and the confidential workloads colocated on it. This repo is
public — none of that belongs here. Removes the "Hephaestus disk
policy" section in full and genericizes the CI / working-notes
mentions.
…ublic tree

This repo is public. Several files named an internal dev-host hostname,
its filesystem layout, and confidential workload names colocated on it.
None of that belongs in a public-facing serving repo intended for
external users.

Changes:
- Rename runbooks/hephaestus-deploy.md -> runbooks/deploy.md and rewrite
  as a generic Jetson Orin AGX deployment guide. Drop the entire
  dual-Docker-daemon section (that's per-environment internal policy).
- CLAUDE.md: drop the "dual-purpose model" section and table that named
  confidential workloads; genericize remaining host references.
- docs/SGLANG-ADOPTION-NOTES.md: replace dev-box hostname and
  /mnt/orin-ssd paths with generic placeholders; replace the
  "honouring the disk policy" subsection that named confidential
  workloads with a non-specific "daemonless extraction" rationale.
- runbooks/lucibridge-routing.md, sglang/orin/{README.md,Dockerfile,
  bake-tokenizers.sh,config.py,validate-on-hardware.sh},
  sglang/thor/Dockerfile: replace remaining hostname / SSD-path /
  internal-runbook references with generic equivalents.

Verified: `git ls-files | xargs grep -iE 'NERVA|\bTAK\b|hephaestus|
orin-ssd'` returns no matches.

Note: git history still contains all of these references in earlier
commits. A history rewrite is a separate operation.
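
The verification command in the commit message above could be wrapped so that a script or CI job fails when a match reappears. A minimal sketch, assuming a function named `no_internal_refs` (hypothetical); the pattern is copied from the commit message. It inspects grep's output rather than the exit status of `xargs`, since `xargs` can report failure when only some of its batches match:

```shell
#!/bin/sh
# Pattern copied from the commit message; \b is a word-boundary extension
# supported by GNU and BSD grep.
PATTERN='NERVA|\bTAK\b|hephaestus|orin-ssd'

# Reads NUL-delimited file names on stdin.
# Succeeds only when none of the files contains a match.
no_internal_refs() {
    # /dev/null keeps grep printing file names and avoids reading stdin
    # when the file list is empty.
    matches=$(xargs -0 grep -iEn -e "$PATTERN" /dev/null)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches" >&2
        return 1
    fi
}

# Usage inside a checkout:
#   git ls-files -z | no_internal_refs
```

Like the original command, this only scans the files tracked at the current commit; it says nothing about earlier commits in the history.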