docs: progressive learning structure with per-executor guides and architecture reference#467

Open
ko3n1g wants to merge 1 commit into main from docs/progressive-learning-structure
Conversation

@ko3n1g ko3n1g commented Mar 16, 2026

Summary

  • Adds quickstart.md — a 5-minute working example using run.Script + LocalExecutor, no cluster or SSH required
  • Adds executors/ directory with seven per-executor guides (local, docker, slurm, skypilot, dgxcloud, lepton, kuberay), each following a consistent structure: prerequisites → annotated config → e2e workflow → advanced options
  • Adds architecture.md — internals reference covering the Experiment call chain, Executor→TorchX scheduler mapping, metadata storage layout, and contributor steps for adding a new executor
  • Refactors execution.md — removes the per-executor sections (now in executors/); keeps packager/launcher reference; adds links to executors/ and architecture.md
  • Extends management.md — adds a "Putting it all together" e2e section at the end that ties config + remote executor + Experiment.from_id together
  • Improves ray.md — adds a "When to use Ray vs. standard execution" decision table, per-backend prerequisites, and cross-links to the relevant executor guide before each quick-start section; marks CustomJobDetails as an advanced pattern
  • Reorders index.md toctree to match the progressive learning path: quickstart → configuration → execution → executors/ → management → cli → ray → architecture
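The Executor→TorchX scheduler mapping that architecture.md documents can be pictured as a simple lookup registry. The sketch below is illustrative only: the class and scheduler names are placeholders, not NeMo-Run's actual identifiers or internal structure.

```python
# Hypothetical sketch of an Executor -> TorchX scheduler registry, in the
# spirit of the mapping architecture.md describes. Names are placeholders,
# not NeMo-Run's real identifiers.
EXECUTOR_TO_SCHEDULER = {
    "LocalExecutor": "local_cwd",
    "DockerExecutor": "docker",
    "SlurmExecutor": "slurm",
}


def scheduler_for(executor_name: str) -> str:
    """Resolve the TorchX scheduler backing a given executor (sketch)."""
    try:
        return EXECUTOR_TO_SCHEDULER[executor_name]
    except KeyError:
        raise ValueError(f"no scheduler registered for {executor_name!r}")


print(scheduler_for("SlurmExecutor"))  # slurm
```

A registry like this is also what the "adding a new executor" contributor steps would extend: a new executor class plus one new entry in the mapping.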

Guide ordering rationale

Each level deliberately avoids introducing concepts from the next:

  • quickstart — LocalExecutor only; no packager or SSH
  • executors/ — ordered by setup cost (local → docker → slurm → skypilot → dgxcloud → lepton → kuberay)
  • management "putting it all together" references a remote executor so it lands after executors/
  • architecture is last — internals make sense only once the user-facing model is understood
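Under that ordering, the index.md toctree would look roughly like the fragment below. This is a sketch: the exact entry names (e.g. executors/index for the directory) are assumptions, not copied from the PR diff.

```rst
.. toctree::
   :maxdepth: 2

   quickstart
   configuration
   execution
   executors/index
   management
   cli
   ray
   architecture
```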

Test plan

  • sphinx-build docs/ _build/html — all toctree references resolve with no warnings
  • Walk each new e2e snippet to confirm it is syntactically correct and runnable
  • Check cross-links between executors/, ray.md, and architecture.md are consistent
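For the first test-plan item, Sphinx can be told to fail the build on any warning, which turns "all toctree references resolve with no warnings" into a hard check. A sketch of the invocation (the exact paths and builder depend on the project's docs setup):

```shell
# -W promotes warnings (e.g. unresolved toctree references) to errors;
# --keep-going reports all warnings instead of stopping at the first one.
sphinx-build -W --keep-going -b html docs/ _build/html
```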

🤖 Generated with Claude Code

…hitecture reference

Restructures the guides into a layered learning path so users get
something working first and deepen understanding step by step:

- Add quickstart.md: 5-minute local run using run.Script + LocalExecutor
- Add executors/ directory with per-executor guides (local, docker, slurm,
  skypilot, dgxcloud, lepton, kuberay), each with prerequisites, annotated
  config, and an end-to-end workflow
- Add architecture.md: Experiment call chain, Executor→TorchX scheduler
  mapping, metadata layout, and contributor steps for adding a new executor
- Update execution.md: remove per-executor sections (now in executors/);
  add links to executors/ and architecture.md
- Update management.md: add "Putting it all together" e2e section
- Update ray.md: add "When to use Ray" decision table, per-backend
  prerequisites, and cross-links to executor guides before each quick-start
- Update index.md: reorder toctree to match the learning path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
