Vllm 0.20 #49

Merged
JuhaoLiang1997 merged 3 commits into main from vllm-0.20
May 19, 2026

@JuhaoLiang1997
Collaborator

Summary

Type of change

  • New platform support
  • Bug fix (runner, validator, leaderboard, or tooling)
  • Suite definition change
  • Schema change
  • Leaderboard / UI improvement
  • Documentation
  • Other:

Testing

# Commands used to verify

Checklist

  • I have read CONTRIBUTING.md
  • My change does not break existing result.json files (or I have explained the migration path)
  • If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
  • If changing the schema: validate_submission.py updated and all existing results still validate
  • If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
  • I have updated relevant documentation

Related issues

@github-actions

github-actions Bot commented May 18, 2026

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

JuhaoLiang1997 and others added 3 commits May 19, 2026 00:53
Adds the AccelMark runner for the upcoming vLLM 0.20.x major release on
NVIDIA GPUs. It is the successor to nvidia_vllm_47f5d58e (vLLM
0.7.3) and tracks the 2026 stack: torch 2.11, CUDA 13.0 (12.8 still
supported), HuggingFace Transformers v5, FlashAttention 4 MLA prefill,
Model Runner V2, and TurboQuant 2-bit KV cache.

What is included:

* runners/nvidia_vllm_0c1710bd/ — runner.py, meta.json (with
  supersedes_chain pointing at the 0.7.3 runner and suite_support
  self-declaration), requirements.txt, README.md
* configs/runner_configs/runner_nvidia_vllm_0c1710bd.yaml.example

The README platforms matrix updates automatically from meta.json — no
shared file is touched. The 0.7.3 runner remains in the matrix until
a successful smoke result on 0.20 lands, at which point its meta.json
will gain deprecated_by in a follow-up PR.
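
The metadata hand-off described above can be sketched roughly as follows.
Only the key names supersedes_chain, suite_support and deprecated_by come
from this PR; the "runner_id" key, the suite name, and the mark_deprecated
helper are hypothetical placeholders, and the real meta.json layout may
differ.

```python
import json

# Hypothetical meta.json content for the new runner. The value shapes are
# assumptions made for illustration, not the repository's actual schema.
new_meta = {
    "runner_id": "nvidia_vllm_0c1710bd",            # assumed key name
    "supersedes_chain": ["nvidia_vllm_47f5d58e"],   # points at the 0.7.3 runner
    "suite_support": {"example_suite": "pending"},  # placeholder suite name
}


def mark_deprecated(old_meta: dict, successor_id: str) -> dict:
    """Return a copy of old_meta with deprecated_by set to the successor."""
    updated = dict(old_meta)
    updated["deprecated_by"] = successor_id
    return updated


# The follow-up PR would apply something like this to the 0.7.3 runner's meta.
old_meta = {"runner_id": "nvidia_vllm_47f5d58e"}
print(json.dumps(mark_deprecated(old_meta, new_meta["runner_id"]), indent=2))
```

Returning a copy rather than mutating in place keeps the old metadata intact
until the smoke result on 0.20 has actually landed.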

Capability flags:

* SUPPORTED_QUANTIZATION_BACKENDS adds 'turboquant' on top of fp8 /
  compressed-tensors / gptq_marlin.
* Framework version string now reports vllm + transformers together so
  v4 vs v5 transformers transitions are visible in result.json.
* Reuses EngineArgs-field filtering to absorb new 0.20 engine kwargs
  without breaking old configs.
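
A minimal sketch of the last two flags, assuming the filter works by
comparing config keys against the dataclass fields of the engine-args class.
StandInEngineArgs is a hypothetical stand-in for vLLM's EngineArgs so the
example runs without vLLM installed; filter_engine_kwargs, framework_version
and the exact version-string format are likewise illustrative, not the
runner's actual code.

```python
from dataclasses import dataclass, fields
from importlib import metadata
from typing import Optional


@dataclass
class StandInEngineArgs:
    """Hypothetical stand-in for the engine-args dataclass."""
    model: str = ""
    quantization: Optional[str] = None
    max_model_len: Optional[int] = None


def filter_engine_kwargs(config: dict, args_cls=StandInEngineArgs):
    """Split config into kwargs args_cls accepts and the keys it does not know."""
    known = {f.name for f in fields(args_cls)}
    accepted = {k: v for k, v in config.items() if k in known}
    ignored = sorted(k for k in config if k not in known)
    return accepted, ignored


def package_version(name: str) -> str:
    """Installed version of a package, or 'unknown' if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "unknown"


def framework_version() -> str:
    """Report vllm and transformers together (the format is an assumption)."""
    return (
        f"vllm-{package_version('vllm')}"
        f"+transformers-{package_version('transformers')}"
    )


accepted, ignored = filter_engine_kwargs(
    {"model": "some/model", "quantization": "fp8", "future_kwarg": True}
)
engine_args = StandInEngineArgs(**accepted)  # unknown keys never reach the constructor
```

Because the accepted set is derived from whichever class is installed, the
same config can carry newer kwargs and still load against an older engine;
logging the ignored keys would make such silent drops visible.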

Initial commit, not yet validated on hardware; all suites are marked
"pending" in suite_support.

Co-authored-by: Cursor <cursoragent@cursor.com>
@JuhaoLiang1997 JuhaoLiang1997 merged commit ad1a270 into main May 19, 2026
3 checks passed
@JuhaoLiang1997 JuhaoLiang1997 deleted the vllm-0.20 branch May 19, 2026 01:05