Skip to content

Add support for setting NIXL num threads for vLLM CLI#809

Merged
amaslenn merged 8 commits intomainfrom
am/vllm-nthreads
Feb 17, 2026
Merged

Add support for setting NIXL num threads for vLLM CLI#809
amaslenn merged 8 commits intomainfrom
am/vllm-nthreads

Conversation

@amaslenn
Copy link
Contributor

Summary

New field nixl_threads was added to pass the number to --kv-transfer-config '{"kv_connector": "NixlConnector", "kv_role": "...", "kv_connector_extra_config": {"num_threads": <VALUE>}}'.

Additionally, added VllmArgs class to documentation with correct rendering for Pydantic model.

Test Plan

  1. CI
  2. Manual doc review.

Additional Notes

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 13, 2026

Warning

Rate limit exceeded

@amaslenn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 22 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Added Pydantic autodoc to docs, enabled new autodoc_pydantic config flags, introduced nixl_threads to VllmArgs, refactored vLLM Slurm command generation to produce per-branch commands with kv-transfer-config (including optional threading), and reorganized/expanded vLLM tests.

Changes

Cohort / File(s) Summary
Documentation & build deps
doc/conf.py, pyproject.toml, doc/workloads/vllm.rst
Enabled sphinxcontrib.autodoc_pydantic in Sphinx config, removed "undoc-members" from autodoc_default_options, added three autodoc_pydantic_model_* = False flags, added autodoc-pydantic~=2.2 to extras, and inserted an API docs subsection referencing VllmArgs.
Vllm argument model
src/cloudai/workloads/vllm/vllm.py
Added `nixl_threads: int
vLLM Slurm command generation
src/cloudai/workloads/vllm/slurm_command_gen_strategy.py
Added helper _to_json_str_arg(config: dict) -> str; refactored get_vllm_serve_commands to build commands via a loop for prefill/decode branches, unify kv_transfer_config construction, inject per-branch kv_role, optionally set kv_connector_extra_config.num_threads when nixl_threads provided, and return accumulated command list.
Tests — vLLM command strategy
tests/workloads/vllm/test_command_gen_strategy_slurm.py
Reorganized tests into classes (TestVllmServeCommand, TestVllmBenchCommand, TestVllmAggregatedMode, etc.), added tests for container mounts and sweep detection, introduced parameterized tests for nixl_threads behavior, and updated expectations to reflect two serve commands when threading is configured.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble configs, tidy every thread,
I teach the docs what Pydantic said,
Loops hum where hardcoded lines once lay,
Tests hop in order, checking the way,
A tiny rabbit cheers the build today!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title accurately summarizes the main change: adding support for setting NIXL num threads for the vLLM CLI, which is the core functionality introduced across all modified files.
Description check ✅ Passed The description clearly relates to the changeset by explaining the new nixl_threads field implementation and documentation additions for VllmArgs, matching the actual changes made.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch am/vllm-nthreads

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 13, 2026

Greptile Overview

Greptile Summary

Added nixl_threads field to VllmArgs to configure the number of threads for NIXL connector in vLLM disaggregated prefill/decode mode. The field is properly excluded from CLI arguments and instead embedded into the --kv-transfer-config JSON structure as kv_connector_extra_config.num_threads. Additionally, added Pydantic model documentation rendering support using autodoc-pydantic.

Key changes:

  • New nixl_threads field in VllmArgs can be set independently for prefill and decode instances
  • Refactored command generation to build kv_transfer_config fresh for each iteration (fixes potential state mutation bug)
  • Added helper method _to_json_str_arg() for consistent JSON serialization
  • Comprehensive test coverage with parametrized tests verifying all combinations of nixl_threads settings
  • Tests verify that --nixl-threads never appears as a CLI flag (only in JSON config)
  • Documentation improvements with autodoc-pydantic extension for better Pydantic model rendering

Confidence Score: 5/5

  • This PR is safe to merge with no issues found
  • All previously identified issues have been addressed. The code correctly handles the nixl_threads parameter, builds config dictionaries fresh for each iteration, includes comprehensive test coverage with multiple scenarios, and the documentation changes are straightforward. The implementation is clean and follows best practices.
  • No files require special attention

Important Files Changed

Filename Overview
src/cloudai/workloads/vllm/vllm.py Added nixl_threads field to VllmArgs with proper exclusion from serve_args property, includes documentation
src/cloudai/workloads/vllm/slurm_command_gen_strategy.py Refactored to build kv_transfer_config per iteration, correctly handles nixl_threads parameter, includes helper method _to_json_str_arg
tests/workloads/vllm/test_command_gen_strategy_slurm.py Added comprehensive parametrized tests for nixl_threads, verifies CLI argument exclusion and JSON config inclusion, reorganized test classes

Sequence Diagram

sequenceDiagram
    participant User
    participant VllmTestDefinition
    participant VllmSlurmCommandGenStrategy
    participant vLLM Prefill
    participant vLLM Decode

    User->>VllmTestDefinition: Configure nixl_threads for prefill/decode
    User->>VllmSlurmCommandGenStrategy: Call get_vllm_serve_commands()
    VllmSlurmCommandGenStrategy->>VllmSlurmCommandGenStrategy: Loop: prefill, decode
    VllmSlurmCommandGenStrategy->>VllmSlurmCommandGenStrategy: Build kv_transfer_config dict
    alt nixl_threads is set
        VllmSlurmCommandGenStrategy->>VllmSlurmCommandGenStrategy: Add kv_connector_extra_config with num_threads
    end
    VllmSlurmCommandGenStrategy->>VllmSlurmCommandGenStrategy: Serialize config to JSON string
    VllmSlurmCommandGenStrategy->>vLLM Prefill: vllm serve --kv-transfer-config '{...,"num_threads":N}'
    VllmSlurmCommandGenStrategy->>vLLM Decode: vllm serve --kv-transfer-config '{...,"num_threads":M}'
    vLLM Prefill-->>User: Start with N threads
    vLLM Decode-->>User: Start with M threads
Loading

Last reviewed commit: dcd70bd

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 13, 2026

Additional Comments (1)

src/cloudai/workloads/vllm/vllm.py
nixl_threads should be excluded from serve_args since it's used only to construct the --kv-transfer-config JSON in the command generator, not as a separate CLI argument

        for k, v in self.model_dump(exclude={"gpu_ids", "nixl_threads"}, exclude_none=True).items():

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/workloads/vllm/vllm.py (1)

47-53: ⚠️ Potential issue | 🔴 Critical

Bug confirmed: nixl_threads must be excluded from serve_args.

When nixl_threads is set, serve_args will emit --nixl-threads <value> — an invalid vLLM CLI argument. The field is only meant for internal use in kv_connector_extra_config.num_threads (handled separately in slurm_command_gen_strategy.py). It must be excluded from model_dump() in the same way gpu_ids is.

Existing tests check that kv_connector_extra_config is properly constructed but do not verify that --nixl-threads does not appear as a spurious CLI flag, leaving this bug undetected.

🐛 Proposed fix
-        for k, v in self.model_dump(exclude={"gpu_ids"}, exclude_none=True).items():
+        for k, v in self.model_dump(exclude={"gpu_ids", "nixl_threads"}, exclude_none=True).items():
🤖 Fix all issues with AI agents
In `@doc/workloads/vllm.rst`:
- Around line 135-139: The section heading contains a typo — "vLLm Serve
Arguments" uses a lowercase "m"; update the heading text to "vLLM Serve
Arguments" so it matches the project's capitalization, ensuring consistency with
the surrounding content (the heading string to change is "vLLm Serve Arguments"
above the autopydantic_model block for cloudai.workloads.vllm.vllm.VllmArgs).

In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 76-97: The shared dict kv_transfer_config is mutated inside the
loop so prefill's nixl_threads (via setdefault and kv_connector_extra_config)
leaks into the subsequent decode iteration; fix by constructing a fresh
per-iteration config (e.g., copy or create new dict) before adding kv_role and
potential kv_connector_extra_config, then pass that per-iteration dict into
self._to_json_str_arg when building commands in the loop that appends to
commands for prefill/decode using args.nixl_threads and args.serve_args.

In `@tests/workloads/vllm/test_command_gen_strategy_slurm.py`:
- Around line 183-217: Add a test that ensures prefill's nixl_threads doesn't
leak into the decode command: create a new test (e.g.,
test_prefill_nixl_threads_does_not_leak_to_decode) that uses
VllmSlurmCommandGenStrategy and VllmArgs to set
tdef.cmd_args.prefill.nixl_threads=8 and tdef.cmd_args.decode.nixl_threads=None,
call VllmSlurmCommandGenStrategy.get_vllm_serve_commands(), assert two commands
are returned, assert the prefill command string (commands[0]) contains
'"num_threads":8', and assert the decode command string (commands[1]) does NOT
contain "num_threads" or "kv_connector_extra_config" to catch shared-state
mutation between prefill and decode.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 13, 2026

Additional Comments (1)

src/cloudai/workloads/vllm/vllm.py
nixl_threads needs to be excluded from serve_args since it's handled specially in the kv_transfer_config. Without excluding it, the field will be converted to --nixl-threads CLI arg AND included in the config JSON, causing duplicate/incorrect arguments.

        for k, v in self.model_dump(exclude={"gpu_ids", "nixl_threads"}, exclude_none=True).items():

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 81-84: kv_transfer_config is newly created each iteration so
calling kv_transfer_config.setdefault("kv_connector_extra_config", {}) is
redundant; instead, when args.nixl_threads is not None, directly assign the
extra config dict (e.g., kv_transfer_config["kv_connector_extra_config"] =
{"num_threads": cast(int, args.nixl_threads)}) so you remove the unnecessary
setdefault and make intent explicit in slurm_command_gen_strategy.py around the
kv_transfer_config construction.

In `@tests/workloads/vllm/test_command_gen_strategy_slurm.py`:
- Around line 193-219: The test function test_decode_nixl_threads is misnamed
because it asserts on both prefill and decode nixl_threads; rename the test
(e.g., to test_nixl_threads or test_serve_commands_nixl_threads) to reflect that
it validates both commands, update the function name where it's referenced
(including any test parametrization or fixtures that mention
test_decode_nixl_threads) and adjust any test identifiers in decorators or test
runners to use the new name so discovery and reporting reflect the correct
intent; leave the body and assertions (references to VllmSlurmCommandGenStrategy
and VllmTestDefinition) unchanged.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 13, 2026

Additional Comments (1)

src/cloudai/workloads/vllm/vllm.py
nixl_threads should be excluded from the model dump just like gpu_ids, otherwise it will be incorrectly passed as --nixl-threads CLI argument in addition to being included in the --kv-transfer-config JSON

        for k, v in self.model_dump(exclude={"gpu_ids", "nixl_threads"}, exclude_none=True).items():

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@amaslenn amaslenn requested a review from podkidyshev February 13, 2026 16:06
@amaslenn amaslenn merged commit fba16bc into main Feb 17, 2026
5 checks passed
@amaslenn amaslenn deleted the am/vllm-nthreads branch February 17, 2026 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants