docs: progressive learning structure with per-executor guides and architecture reference#467

Open
ko3n1g wants to merge 1 commit into main from docs/progressive-learning-structure
Conversation

@ko3n1g ko3n1g commented Mar 16, 2026

Summary

  • Adds quickstart.md — a 5-minute working example using run.Script + LocalExecutor, no cluster or SSH required
  • Adds executors/ directory with seven per-executor guides (local, docker, slurm, skypilot, dgxcloud, lepton, kuberay), each following a consistent structure: prerequisites → annotated config → e2e workflow → advanced options
  • Adds architecture.md — internals reference covering the Experiment call chain, Executor→TorchX scheduler mapping, metadata storage layout, and contributor steps for adding a new executor
  • Refactors execution.md — removes the per-executor sections (now in executors/); keeps packager/launcher reference; adds links to executors/ and architecture.md
  • Extends management.md — adds a "Putting it all together" e2e section at the end that ties config + remote executor + Experiment.from_id together
  • Improves ray.md — adds a "When to use Ray vs. standard execution" decision table, per-backend prerequisites, and cross-links to the relevant executor guide before each quick-start section; marks CustomJobDetails as an advanced pattern
  • Reorders index.md toctree to match the progressive learning path: quickstart → configuration → execution → executors/ → management → cli → ray → architecture
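The Executor→TorchX scheduler mapping that architecture.md documents can be pictured as a simple lookup registry. The sketch below is illustrative only: the class and scheduler names are placeholders, not NeMo-Run's actual identifiers or internal structure.

```python
# Hypothetical sketch of an Executor -> TorchX scheduler registry, in the
# spirit of the mapping architecture.md describes. Names are placeholders,
# not NeMo-Run's real identifiers.
EXECUTOR_TO_SCHEDULER = {
    "LocalExecutor": "local_cwd",
    "DockerExecutor": "docker",
    "SlurmExecutor": "slurm",
}


def scheduler_for(executor_name: str) -> str:
    """Resolve the TorchX scheduler backing a given executor (sketch)."""
    try:
        return EXECUTOR_TO_SCHEDULER[executor_name]
    except KeyError:
        raise ValueError(f"no scheduler registered for {executor_name!r}")


print(scheduler_for("SlurmExecutor"))  # slurm
```

A registry like this is also what the "adding a new executor" contributor steps would extend: a new executor class plus one new entry in the mapping.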

Guide ordering rationale

Each level deliberately avoids introducing concepts from the next:

  • quickstart — LocalExecutor only; no packager or SSH
  • executors/ — ordered by setup cost (local → docker → slurm → skypilot → dgxcloud → lepton → kuberay)
  • management "putting it all together" references a remote executor so it lands after executors/
  • architecture is last — internals make sense only once the user-facing model is understood
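Under that ordering, the index.md toctree would look roughly like the fragment below. This is a sketch: the exact entry names (e.g. executors/index for the directory) are assumptions, not copied from the PR diff.

```rst
.. toctree::
   :maxdepth: 2

   quickstart
   configuration
   execution
   executors/index
   management
   cli
   ray
   architecture
```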

Test plan

  • sphinx-build docs/ _build/html — all toctree references resolve with no warnings
  • Walk each new e2e snippet to confirm it is syntactically correct and runnable
  • Check cross-links between executors/, ray.md, and architecture.md are consistent
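For the first test-plan item, Sphinx can be told to fail the build on any warning, which turns "all toctree references resolve with no warnings" into a hard check. A sketch of the invocation (the exact paths and builder depend on the project's docs setup):

```shell
# -W promotes warnings (e.g. unresolved toctree references) to errors;
# --keep-going reports all warnings instead of stopping at the first one.
sphinx-build -W --keep-going -b html docs/ _build/html
```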

🤖 Generated with Claude Code

…hitecture reference

Restructures the guides into a layered learning path so users get
something working first and deepen understanding step by step:

- Add quickstart.md: 5-minute local run using run.Script + LocalExecutor
- Add executors/ directory with per-executor guides (local, docker, slurm,
  skypilot, dgxcloud, lepton, kuberay), each with prerequisites, annotated
  config, and an end-to-end workflow
- Add architecture.md: Experiment call chain, Executor→TorchX scheduler
  mapping, metadata layout, and contributor steps for adding a new executor
- Update execution.md: remove per-executor sections (now in executors/);
  add links to executors/ and architecture.md
- Update management.md: add "Putting it all together" e2e section
- Update ray.md: add "When to use Ray" decision table, per-backend
  prerequisites, and cross-links to executor guides before each quick-start
- Update index.md: reorder toctree to match the learning path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
