JobGroup.SUPPORTED_EXECUTORS is closed to downstream Executor subclasses

## Use case

nemo-skills' `RayExecutor` is a downstream `Executor` subclass that needs to be passable to `JobGroup`. Today `JobGroup.__post_init__` (around `nemo_run/run/job.py:265`) asserts `isinstance(self.executor, JobGroup.SUPPORTED_EXECUTORS)` where `SUPPORTED_EXECUTORS = [SlurmExecutor, DockerExecutor, LocalExecutor]`. Any downstream package adding a new Executor type — Ray, Kubernetes, etc. — hits this assertion and cannot construct a JobGroup.
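The failure mode can be reproduced without NeMo-Run installed. The following is a minimal sketch: every class in it is a stand-in written for this issue, not the real `nemo_run` code. It assumes the check behaves as described above, and it stores the allow-list as a tuple so it can be passed directly to `isinstance`.

```python
from dataclasses import dataclass


class Executor:
    """Stand-in for the nemo_run Executor base class."""


class SlurmExecutor(Executor):
    """Stand-in for a built-in executor."""


class DockerExecutor(Executor):
    """Stand-in for a built-in executor."""


class LocalExecutor(Executor):
    """Stand-in for a built-in executor."""


@dataclass
class JobGroup:
    """Stand-in that reproduces only the executor check described above."""

    executor: Executor
    # Closed allow-list (unannotated, so it is a class attribute, not a field).
    SUPPORTED_EXECUTORS = (SlurmExecutor, DockerExecutor, LocalExecutor)

    def __post_init__(self):
        assert isinstance(self.executor, JobGroup.SUPPORTED_EXECUTORS), (
            "Unsupported executor type."
        )


class RayExecutor(Executor):
    """Stand-in for a downstream subclass such as nemo-skills' RayExecutor."""


def can_construct(executor: Executor) -> bool:
    """Return True if a JobGroup can be built around this executor."""
    try:
        JobGroup(executor=executor)
        return True
    except AssertionError:
        return False
```

In this model `can_construct(LocalExecutor())` succeeds, while `can_construct(RayExecutor())` fails even though `RayExecutor` is a valid `Executor` subclass.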

## Why this matters now

nemo-skills now ships `RayExecutor` and uses `JobGroup` for multi-script eval-generation flows (vLLM + sandbox + client co-located). The Ray multi-script path is a separate architectural concern (single Ray submission = single container, vs. the heterogeneous group semantics JobGroup was designed for on Slurm) — but the immediate question is whether downstream Executor subclasses are even a supported extension point.

## Current workaround

(Marking this clearly as a workaround, not a proposed PR.) A downstream project patches the assertion at runtime via a class-name string sniff (`type(self.executor).__name__ == "RayExecutor"`) to avoid a circular import. The sniff is intentionally narrow, but a class-name string match is not idiomatic for upstream.
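For readers who want to see the shape of such a patch, here is a rough sketch. It assumes the patch wraps `__post_init__`, which is one plausible way to do it and not necessarily what the downstream project does. `JobGroup`, `Executor`, and `LocalExecutor` are stand-ins defined here so the snippet runs on its own.

```python
from dataclasses import dataclass


class Executor:
    """Stand-in for the nemo_run Executor base class."""


class LocalExecutor(Executor):
    """Stand-in for a built-in executor."""


@dataclass
class JobGroup:
    """Stand-in that models only the executor assertion."""

    executor: Executor
    SUPPORTED_EXECUTORS = (LocalExecutor,)

    def __post_init__(self):
        assert isinstance(self.executor, JobGroup.SUPPORTED_EXECUTORS), (
            "Unsupported executor type."
        )


_original_post_init = JobGroup.__post_init__


def _patched_post_init(self):
    # Match on the class name so the patch never has to import RayExecutor.
    if type(self.executor).__name__ == "RayExecutor":
        return  # skip the upstream check for this one class name only
    _original_post_init(self)


# The dataclass-generated __init__ looks up __post_init__ at call time,
# so replacing the class attribute is enough to take effect.
JobGroup.__post_init__ = _patched_post_init
```

Note that in this stand-in `__post_init__` contains nothing but the assertion. If the real method does other setup, a patch that returns early would skip that too, which is another reason the workaround is fragile.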

## Proposed designs

Please indicate preference before we send a PR.

### 1. `SUPPORTED_EXECUTORS` extension hook

Downstream packages register their Executor subclass at import time:

```python
from nemo_run.run.job import JobGroup
# RayExecutor is the downstream Executor subclass, defined in the registering package.
JobGroup.SUPPORTED_EXECUTORS = (*JobGroup.SUPPORTED_EXECUTORS, RayExecutor)
```

- **Pro**: explicit, discoverable.
- **Con**: requires downstream packages to mutate a class attribute, which feels brittle.

### 2. Sentinel attribute on the Executor subclass

Downstream marks compatibility:

```python
from nemo_run.core.execution.base import Executor

class RayExecutor(Executor):
    _jobgroup_compatible = True
```

…and `__post_init__` checks `getattr(executor, "_jobgroup_compatible", False)` in addition to `isinstance(executor, SUPPORTED_EXECUTORS)`.

- **Pro**: no mutation of upstream state.
- **Con**: requires JobGroup to know about the sentinel.
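A sketch of what the combined check in option 2 might look like. The helper name `is_allowed_in_job_group` and all classes here are hypothetical stand-ins for illustration; the real check would live inside `JobGroup.__post_init__`.

```python
class Executor:
    """Stand-in for the nemo_run Executor base class."""


class LocalExecutor(Executor):
    """Stand-in for one of the built-in supported executors."""


class RayExecutor(Executor):
    """Stand-in for a downstream executor that opts in via the sentinel."""

    _jobgroup_compatible = True


class UnmarkedExecutor(Executor):
    """Downstream executor that does not opt in."""


SUPPORTED_EXECUTORS = (LocalExecutor,)


def is_allowed_in_job_group(executor: Executor) -> bool:
    # Built-in executors pass through the existing allow-list ...
    if isinstance(executor, SUPPORTED_EXECUTORS):
        return True
    # ... and anything else must carry the opt-in sentinel.
    return bool(getattr(executor, "_jobgroup_compatible", False))
```

The built-in list keeps working unchanged, and a downstream class that does not set the sentinel is still rejected, so the default stays conservative.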

### 3. `Executor.supports_job_group()` classmethod on the base

Defaults to `False`, overridable downstream. Same shape as option 2 but more discoverable in IDEs.
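As a sketch of option 3, assuming the classmethod is consulted in addition to the existing `isinstance` check (as in option 2). The classes and the `check_executor` helper are stand-ins written for this issue; `supports_job_group` does not exist upstream today.

```python
class Executor:
    """Stand-in for the nemo_run Executor base class."""

    @classmethod
    def supports_job_group(cls) -> bool:
        # Default: a subclass is not assumed to work inside a JobGroup.
        return False


class LocalExecutor(Executor):
    """Stand-in for a built-in executor covered by the allow-list."""


class RayExecutor(Executor):
    """Stand-in for a downstream executor that opts in."""

    @classmethod
    def supports_job_group(cls) -> bool:
        return True


SUPPORTED_EXECUTORS = (LocalExecutor,)


def check_executor(executor: Executor) -> None:
    """Model of the relaxed assertion that JobGroup would run."""
    assert (
        isinstance(executor, SUPPORTED_EXECUTORS) or executor.supports_job_group()
    ), "Unsupported executor type."
```

Because the method is declared on the base class, it shows up in autocomplete and docs for anyone subclassing `Executor`, which is the discoverability advantage over a private sentinel attribute.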

## Note on JobGroup.launch path

Even with the assertion relaxed, `JobGroup.launch` calls `nemo_run.run.torchx_backend.launcher.launch(executor=...)`, which routes through `EXECUTOR_MAPPING` in `torchx_backend/schedulers/api.py:30`. That mapping has no Ray entry, so `get_executor_str(RayExecutor)` raises `KeyError`. Relaxing the assertion is therefore necessary but not sufficient for a Ray multi-script JobGroup to launch end-to-end. The pragmatic answer for the multi-script case is a multi-pool architecture (pre-host components in separate Ray submissions; collapse multi-script to single-script), but the assertion remains too strict in principle for any downstream Executor subclass.
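The second failure can be modeled in a few lines. This is a simplified sketch, not the upstream implementation: it assumes the lookup is keyed on the executor's exact class, and the scheduler name strings are placeholders, not the real values in `EXECUTOR_MAPPING`.

```python
class Executor:
    """Stand-in for the nemo_run Executor base class."""


class SlurmExecutor(Executor):
    """Stand-in for a built-in executor with a scheduler entry."""


class RayExecutor(Executor):
    """Stand-in for a downstream executor with no scheduler entry."""


# Placeholder scheduler name; the real mapping's values are not reproduced here.
EXECUTOR_MAPPING = {SlurmExecutor: "slurm-scheduler-placeholder"}


def get_executor_str(executor: Executor) -> str:
    # Assumed lookup by exact class: an unregistered class raises KeyError.
    return EXECUTOR_MAPPING[type(executor)]


def has_scheduler(executor: Executor) -> bool:
    """Return True if the executor resolves to a scheduler name."""
    try:
        get_executor_str(executor)
        return True
    except KeyError:
        return False
```

Under these assumptions `has_scheduler(RayExecutor())` is `False`, so a `RayExecutor` that got past the `JobGroup` assertion would still fail at launch time.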

## Reference

Prior PR #410 was the last touch on `nemo_run/run/job.py`. Searching closed issues for "Unsupported executor type" returned 0 hits, so this is a fresh report.

## Ask

Which of the three designs (or a fourth) would you accept as a PR? Happy to send code once direction is confirmed.

JobGroup.SUPPORTED_EXECUTORS is closed to downstream Executor subclasses #537
