Bump vllm from 0.6.3.post1 to 0.22.0 in /experiments/agentcompany/openhands by dependabot[bot] · Pull Request #34 · scaleapi/lhaw

dependabot · 2026-06-10T17:15:47Z

Bumps vllm from 0.6.3.post1 to 0.22.0.

Release notes

v0.22.0

Highlights

This release features 459 commits from 230 contributors (63 new)!

DeepSeek V4 maturity: DeepSeek V4 received a major hardening pass this cycle — the model was reorganized into a dedicated vllm/models/deepseek_v4/ package (#43004, #43039, #43073, #43077, #43149), gained NVFP4 fused MoE support (#42209), full + piecewise CUDA graph (#42604), and MTP speculative decoding (#43385). A large set of fused kernels (MegaMoE, mhc, Q-norm, indexer, sparse MLA) and ROCm parity fixes landed alongside accuracy fixes (#42810, #43710).

Model Runner V2 advances toward default: MRv2 is now default for Qwen3 dense models. vLLM will fall back to MRv1 for features that aren't yet supported in MRv2 (#39337). MRv2 also gained sleep-mode weight reload (#42673), update_config (#42783), and shared KV-cache layers (#35045), plus many correctness fixes.

Experimental Rust frontend: A new Rust front-end integration landed (#40848), with the implementation moved into the tree (#43283) and a DP Supervisor for data-parallel serving (#40841).

Batch invariance, faster: Batch-invariant inference gained Cutlass FP8 support for a 28.9% end-to-end latency improvement (#40408), compile-mode support on SM80 (#42456), and an NVFP4 Cutlass linear path (#39912).

Multi-tier KV cache offloading: A new multi-tier KV cache offloading framework (#40020) with a Python filesystem secondary tier (#41735), DSv4 support (#43142), and Mooncake disk offloading (#42689) extends offloading beyond CPU memory.

Model Support

New architectures: MiniCPM-V 4.6 (#41254), InternS2 Preview (#42705), OpenVLA (#42654), MolmoWeb hf_overrides docs (#42163); EXAONE-4.5 aligned with Transformers update (#42246).

Speculative decoding: custom callable proposer backend (#39487), post-norm EAGLE-3 speculators (#42764), peagle speculators (#41826), hybrid-attention models in extract_hidden_states (#39949), non-MTP speculation for NemotronH (#43130), shared MTP weights in MRv2 (#42538).

DeepSeek V4: NVFP4 MoE (#42209), CUDA graph full/piecewise (#42604), MTP (#43385), model package refactor (#43004, #43039, #43073, #43077), sparse MLA + compressor refactor (#43149, #43710), MegaMoE input-prep kernel move (#43632).

Qwen3.5/3.6: GDN output-projection flatten (#42311), GatedDeltaNet Marlin TP≥2 fix (#36329), ViT full CUDA graph (#42151), runai-streamer weight loading for Qwen3.5/MTP/Qwen3-VL (#42521, #42716), KDA chunk-prefill exp2 semantics (#43195).

Gemma3/Gemma4: mixed-resolution image co-batching crash fix (#42217), MoE routing closure fix (#42250), tool-parser float-corruption fix (#42128), batched vision encoder for image/video (#43169), multi-GPU fix (#42630).

Kimi-K2.5: skip vision-tower dtype conversion under quantization (#42869), mm_projector dtype fix (#42081).

Cohere: enable Cohere MoE (#43143), pipeline parallelism for Cohere vision (#42819).

Tool calling: Apertus tool parser (#41154), Qwen3Coder anyOf/oneOf/$ref resolution re-land (#37831), shared coerce_to_schema_type across MiniMax-M2 / DeepSeek-V3.2 / Seed-OSS parsers (#43006, #43019, #43140).

ViT CUDA graph: Qwen2-VL (#41736), Step3-VL encoder (#42224), Qwen3.5 (#42151), FlashInfer metadata for Qwen2.5-VL vision attention (#42787).

Engine Core

Model Runner V2: Qwen3-dense-by-default oracle (#39337), sleep-mode reload weights (#42673), update_config (#42783), shared KV-cache layers (#35045), FP32 gumbel sampling (#41775), auto-fallback to MRv1 with connectors (#42955), logprob_token_ids correctness (#43125, #41761), prompt-logprobs size fix (#42778).

KV offloading: multi-tier framework (#40020), Python filesystem secondary tier (#41735), DSv4 support (#43142), tier-offload follow-up (#42529), prefer HND layout (#41928), reset_cache() (#41956), per-request tracking (#42507), store-deferral fix (#41945).

MoE refactor: ExpertMapManager (#41046), experts moved to experts/ (#42334), RoutedExperts alias for FusedMoE (#40735), EPLB refactoring for FusedMoE (#41055).

Mamba: attention module refactor (#41126), Mamba2 SSD kernel warmup (#39822), bf16 SSM cache (#41680), GPU-side state postprocessing fused kernel (#40172), run single-token extends as decodes (#42430).

KV events: emit KV cache metadata (#40984).

Allocator: manual cumem allocator enable (#33648), stream-aware free callback (#43020).

elastic-EP: stage/commit MoE quant method on reconfigure (#40881).

Hardware & Performance

NVIDIA Blackwell / SM12x: FlashInfer b12x MoE + FP4 GEMM for SM120/121 (#40082), per-tensor FP8 CUTLASS on SM12.1 (#41215), head_dim=512 for FlashInfer TRTLLM attention (#38822), FlashInfer Blackwell GDN prefill (#40717), GDN prefill kernel for SM100 (#43273).

Performance: batch-invariant Cutlass FP8 (+28.9% E2E) (#40408), CutlassFP8 padding pre-processing (+13.5% TTFT) (#42651), padded NVFP4 quant kernel (+2.4–5.7% E2E) (#42774), GPU<->CPU sync elimination 1/n (#41429) and 4/n (#42347), fused RoPE+KVCache+q_concat for MLA (#40392), MLA compute_prefill_context / _v_up_proj optimizations (#42460, #42561), penalties Triton kernel (#40657), do_not_specialize in fused FP8 RoPE (#42849), FULL CUDA graph capture for TRITON_MLA decode (#42885).

AMD ROCm: DSV4 functionality + accuracy fixes (#42810, #43679 Tilelang MHC), flash sparse MLA Triton kernels (#41812), gluon paged MQA logits on gfx950/MI355X (#42062), RMSNorm+Quant fusion for gfx950 (#41825), AITER FA backend cleanup (#41942), XGMI backend for MoRI connector (#41753), QuickReduce min-size override (#41675), DSV4 MTP (#43385).

CPU / RISC-V: RVV-optimized attention kernels for RISC-V Vector Extension (#40119) with VLEN=256 (#42943), fused GDN for AMX CPU (#42707), MXFP4 W4A16 MoE (#41922), experimental Triton + MRv2 on CPU (#43225), improved CPU thread utilization (#42666), --cpu-distributed-timeout-seconds (#42968).

Intel XPU: GPTQ int4 support (#37844), mxfp8 MoE (#41918), FP8 block-scaled quantization (#42952), custom-op collective behavior (#41354), multiple sparse-attention kernels (#37888), MoE topk routing + MXFP4 fallback (#42951), CT W4A4 MXFP4 path (#38896), reduced XPU MoE host overhead (#42915).

Kernel ABI: continued migration to libtorch stable ABI — 5/n (#42339), 6/n (#42663), 7/n (#43209).

Experimental: breakable CUDA graph (#42304).

Large Scale Serving

Disaggregated serving (NIXL): lease-renewal TTL for KV blocks on P (#41383), handshake-failure policy honoring (#40364), GDN support for PD with NIXL (#41869), multi-node TP>8 fix (#39907), side-channel host-selection fix (#41806).

Mooncake: disk offloading in MooncakeStoreConnector (#42689), HMA support for DSV4 (#42828), operation metrics (#43392), load-failure propagation (#42788), block-aligned full hits (#43494), finish-after-preemption handling (#43281).

Data parallel: DP Supervisor (#40841), publish request counts at engine-step start (#41626), forward X-data-parallel-rank header (#42330).

EPLB: change default EPLB communicator (#43110), VLM-wrapper init fix (#39805), remove dead torch.accelerator.synchronize() (#40733).

LoRA: one-shot Triton kernel for MoE LoRA (#42290), simultaneous 2D & 3D MoE LoRA adapters (#42242), reduced 2D-weight memory under EP (#42737), MoE LoRA align-kernel grid fix (#40131).

Quantization

MXFP4: linear layers + compressed-tensors integration (#41664), CPU W4A16 MoE (#41922), XPU mxfp8 MoE (#41918).

NVFP4: DeepSeek V4 fused MoE (#42209), ModelOpt W4A16 NVFP4 fused MoE + mixed-precision dispatch (#42566), batch-invariant NVFP4 Cutlass linear (#39912), FlashInfer TRTLLM NvFP4 monolithic MoE routing fix (#43223), TRTLLM NVFP4 MoE chunking fix (#43599).

... (truncated)

Commits

0b3ba88 Revert "[CPU] Experimentally enable Triton and MRV2 (#43225)"
799c3af [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)
64e2523 [Bugfix] Pass routed_scaling_factor to FlashInfer TRTLLM BF16 MoE (#43769)
a147dd0 [ROCm][DSV4] Enable Tilelang MHC replacing torch/triton mhc (#43679)
0759293 [Bugfix][Kernel] TRTLLM NVFP4 MoE chunking (#43599)
a930f5a Fix RunAI streamer tensor buffer reuse during weight loading (#43464)
40cf020 Fix early CUDA init (#43791)
8c40613 [misc] Bump cutedsl version to 4.5.2 (#43745)
5ebdf47 [Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43...
a94cd6d [MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the Security Alerts page.

Greptile Summary

This dependabot PR bumps vllm from 0.6.3.post1 to 0.22.0 in the fully-pinned experiments/agentcompany/openhands/requirements.txt. The jump spans ~15 minor vllm versions, and vllm's own release notes call out that v0.20.0 made a PyTorch 2.11 upgrade a breaking environment dependency — this file still pins torch==2.4.0+cu121.

torch==2.4.0+cu121, xformers==0.0.27.post2, and triton==3.0.0 are all built against PyTorch 2.4 and are incompatible with vllm 0.22.0's wheel, which requires PyTorch 2.11.
Several other pinned packages (compressed-tensors==0.6.0, transformers==4.46.2, huggingface-hub==0.23.4) are likely too old to be compatible with a vllm 0.22.0 install and would need coordinated updates.
This PR should be treated as a starting point for a broader dependency audit rather than a drop-in bump.
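The audit suggested above can be sketched as a small script that reads the pinned file and flags pins below a known floor. This is a minimal sketch, not part of the PR: the only floor taken from the review is torch 2.11; the helper names (`release_tuple`, `parse_pins`, `stale_pins`) and the `MINIMUMS` table are hypothetical, and floors for other packages would have to be confirmed against vllm 0.22.0's own metadata before being added.

```python
import re

# Floors implied by the review. Only the torch floor (2.11) comes from the
# summary above; other entries are deliberately left out until confirmed.
MINIMUMS = {"torch": (2, 11)}

# Matches simple "name==version" pins; anything else is ignored.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s;#]+)")


def release_tuple(version: str) -> tuple:
    """Turn '2.4.0+cu121' or '0.6.3.post1' into a tuple of leading integers."""
    public = version.split("+", 1)[0]  # drop a local tag such as '+cu121'
    parts = []
    for piece in public.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1'
        parts.append(int(piece))
    return tuple(parts)


def parse_pins(text: str) -> dict:
    """Map normalized package name to its pinned version string."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            name = match.group(1).lower().replace("_", "-")
            pins[name] = match.group(2)
    return pins


def stale_pins(text: str, minimums: dict = MINIMUMS) -> list:
    """Return (name, pinned_version) pairs whose pin is below its floor."""
    pins = parse_pins(text)
    return [
        (name, pins[name])
        for name, floor in minimums.items()
        if name in pins and release_tuple(pins[name]) < floor
    ]


sample = """\
torch==2.4.0+cu121
triton==3.0.0
vllm==0.22.0
xformers==0.0.27.post2
"""
print(stale_pins(sample))  # [('torch', '2.4.0+cu121')]
```

The comparison uses only the leading numeric components, so it is a coarse check; a real audit would more likely rely on pip's resolver or the `packaging` library.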

Confidence Score: 3/5

Not safe to merge as-is; the vllm wheel will be incompatible with the pinned torch, xformers, and triton versions, breaking installation in this environment.

vllm 0.22.0 ships with CUDA kernels compiled against PyTorch 2.11, but the file still pins torch==2.4.0+cu121 and xformers==0.0.27.post2 (built for torch 2.4). This means the environment will either fail to install or crash at import. The PR changes only one line but requires a coordinated multi-package update to be usable.

experiments/agentcompany/openhands/requirements.txt — the torch, xformers, triton, compressed-tensors, and transformers pins all need to be updated alongside the vllm bump.
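One possible way to start the coordinated update, offered as an assumption rather than something the review prescribes, is to drop the exact pins for the coupled packages, let the resolver pick versions in a scratch environment, and re-pin from the result. The sketch below only does the first step; `relax_pins` and `COUPLED` are hypothetical names, and the function discards anything after the version on a relaxed line (markers, comments).

```python
import re

# Packages the review lists as needing to move together with vllm.
COUPLED = {"torch", "xformers", "triton", "compressed-tensors", "transformers"}


def relax_pins(text: str, names: set = COUPLED) -> str:
    """Replace 'name==version' with bare 'name' for the selected packages."""
    out = []
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z0-9._-]+)==\S+", line)
        if match and match.group(1).lower().replace("_", "-") in names:
            out.append(match.group(1))  # keep the name, drop the exact pin
        else:
            out.append(line)  # leave every other line untouched
    return "\n".join(out) + "\n"


print(relax_pins("torch==2.4.0+cu121\nuvicorn==0.32.0\nvllm==0.22.0\n"), end="")
```

Unrelated pins such as `uvicorn` stay fixed, so the resolver only has freedom where the review says the conflict is.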

Important Files Changed

Filename	Overview
experiments/agentcompany/openhands/requirements.txt	Bumps vllm from 0.6.3.post1 to 0.22.0 in a fully-pinned requirements file; the pinned torch==2.4.0+cu121 and xformers==0.0.27.post2 are incompatible with vllm 0.22.0's PyTorch 2.11 requirement, almost certainly breaking installation or runtime.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["vllm==0.22.0\n(requires torch 2.11+)"] -->|depends on| B["torch 2.11+"]
    A -->|depends on| C["xformers compatible\nwith torch 2.11"]
    A -->|depends on| D["triton compatible\nwith torch 2.11"]
    A -->|depends on| E["compressed-tensors\n(new version)"]
    A -->|depends on| F["transformers\n(newer version)"]

    G["requirements.txt pins"] --> H["torch==2.4.0+cu121\n❌ incompatible"]
    G --> I["xformers==0.0.27.post2\n❌ built for torch 2.4"]
    G --> J["triton==3.0.0\n⚠️ likely incompatible"]
    G --> K["compressed-tensors==0.6.0\n⚠️ likely too old"]
    G --> L["transformers==4.46.2\n⚠️ likely too old"]

    H -->|conflicts with| B
    I -->|conflicts with| C
    J -->|may conflict with| D

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
experiments/agentcompany/openhands/requirements.txt:192
**PyTorch version incompatibility with vllm 0.22.0**

`vllm==0.22.0` ships against PyTorch 2.11 (that upgrade landed in vllm v0.20.0 as an explicit breaking change). This file still pins `torch==2.4.0+cu121` (line 177) and `xformers==0.0.27.post2` (line 202), which are built against torch 2.4. Pip will either refuse to install the new vllm wheel because of the torch version mismatch, or the CUDA kernels compiled into the vllm wheel will fail at import or at runtime. All three pins (`torch`, `xformers`, and likely `triton==3.0.0`) need to be updated together to match what vllm 0.22.0 requires. Beyond torch, `compressed-tensors==0.6.0` and `transformers==4.46.2` are also likely too old for a bump that spans roughly 15 minor versions of vllm.

Reviews (1): Last reviewed commit: "Bump vllm in /experiments/agentcompany/o..."

Greptile also left 1 inline comment on this PR.

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.6.3.post1 to 0.22.0.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.6.3.post1...v0.22.0)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.22.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

socket-security · 2026-06-10T17:16:48Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	vllm@0.6.3.post1 ⏵ 0.22.0	^-8	⁺⁷⁵

View full report

socket-security · 2026-06-10T17:16:51Z

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action	Severity	Alert (click "▶" to expand/collapse)
Warn		Potential vulnerability: pypi `vllm` with risk level "medium" Location: Package overview From: experiments/agentcompany/openhands/requirements.txt → `pypi/vllm@0.22.0` ℹ Read more on: This package \| This alert \| Navigating potential vulnerabilities Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at `support@socket.dev`. Suggestion: It is advisable to proceed with caution. Engage in a review of the package's security aspects and consider reaching out to the package maintainer for the latest information or patches. Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment `@SocketSecurity ignore pypi/vllm@0.22.0`. You can also ignore all packages with `@SocketSecurity ignore-all`. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

greptile-apps · 2026-06-10T17:18:32Z

 uvicorn==0.32.0
 uvloop==0.21.0
-vllm==0.6.3.post1
+vllm==0.22.0


PyTorch version incompatibility with vllm 0.22.0

vllm==0.22.0 ships against PyTorch 2.11 (that upgrade landed in vllm v0.20.0 as an explicit breaking change). This file still pins torch==2.4.0+cu121 (line 177) and xformers==0.0.27.post2 (line 202), which are built against torch 2.4. Pip will either refuse to install the new vllm wheel because of the torch version mismatch, or the CUDA kernels compiled into the vllm wheel will fail at import or at runtime. All three pins (torch, xformers, and likely triton==3.0.0) need to be updated together to match what vllm 0.22.0 requires. Beyond torch, compressed-tensors==0.6.0 and transformers==4.46.2 are also likely too old for a bump that spans roughly 15 minor versions of vllm.
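If the install does go through, the mismatch described in this comment can be checked in the resulting environment before vllm is imported. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper names `leading_ints` and `check_floor` are hypothetical, and the 2.11 floor is taken from the comment above rather than read from vllm's package metadata.

```python
from importlib import metadata


def leading_ints(version: str) -> tuple:
    """Turn '2.11.0+cu128' into (2, 11, 0), ignoring local and post suffixes."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_floor(package: str, floor: tuple) -> str:
    """Describe whether the installed package meets the given version floor."""
    try:
        found = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    status = "ok" if leading_ints(found) >= floor else "too old"
    return f"{package}: {found} ({status}, floor {'.'.join(map(str, floor))})"


# Floor of 2.11 is the figure quoted in the review comment.
print(check_floor("torch", (2, 11)))
```

Reading the version from package metadata avoids importing torch itself, which matters when the binary mismatch is the thing being diagnosed.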


dependabot Bot added the dependencies and python labels on Jun 10, 2026

dependabot Bot mentioned this pull request Jun 10, 2026

Bump vllm from 0.6.3.post1 to 0.20.0 in /experiments/agentcompany/openhands #24

Closed

greptile-apps Bot reviewed Jun 10, 2026

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/experiments/agentcompany/openhands/vllm-0.22.0
