Replace `uv run python` with `sys.executable` in eval scripts by simonrosenberg · Pull Request #472 · OpenHands/benchmarks

simonrosenberg · 2026-03-02T20:24:40Z

Summary

- Replaces `uv run python` subprocess invocations with `sys.executable` in `swebench/eval_infer.py`, `swtbench/eval_infer.py`, and `multiswebench/eval_infer.py`
- Removes the hard dependency on `uv` being available at eval time
- Uses the Python interpreter that is already running the script, regardless of environment (uv, NeMo, etc.)

Extracted from #455.

Changes

| File | Change |
| --- | --- |
| `benchmarks/swebench/eval_infer.py` | `["uv", "run", "python", "-m", ...]` → `[sys.executable, "-m", ...]` |
| `benchmarks/swtbench/eval_infer.py` | Removed 16-line subprocess block that ran `uv run python -c "import sys; print(sys.executable)"` just to discover the Python path; replaced with `python_executable = sys.executable` |
| `benchmarks/multiswebench/eval_infer.py` | `["uv", "run", "python", "-m", ...]` → `[sys.executable, "-m", ...]`; added `import sys` |
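
The pattern in the table can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `module_command` and `run_module` are hypothetical, and the stdlib module `json.tool` stands in for the real evaluation module, which the table elides as `...`.

```python
import subprocess
import sys


def module_command(module: str, *args: str) -> list[str]:
    """Build an argv list that runs `module` under the current interpreter."""
    # sys.executable is the path of the Python binary running this script,
    # so the child process uses the same environment and installed packages.
    return [sys.executable, "-m", module, *args]


def run_module(module: str, *args: str) -> subprocess.CompletedProcess:
    """Run the module and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(
        module_command(module, *args),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # json.tool is only a stand-in (assumption) for the real eval module.
    result = run_module("json.tool", "--help")
    print(result.stdout.splitlines()[0])
```

Because the argv list starts with an interpreter path rather than `uv`, the child process does not need `uv` to be installed or on `PATH`.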

Validation

swebench and swtbench pass CI with eval_limit=1; multiswebench fails on an unrelated dataset schema issue (results from #455 validation):

| Benchmark | Status | Run |
| --- | --- | --- |
| swebench | ✅ | https://github.com/OpenHands/evaluation/actions/runs/22590769394 |
| swtbench | ✅ | https://github.com/OpenHands/evaluation/actions/runs/22590775404 |
| multiswebench | ❌ | Dataset schema issue (unrelated; see #304) |

Test plan

- Verify swebench-eval runs correctly without uv on PATH
- Verify swtbench-eval runs correctly without uv on PATH
- Verify multi-swebench-eval runs correctly without uv on PATH
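
One way to approximate the "without uv on PATH" condition locally is to launch a child process with a filtered `PATH`. The sketch below is an assumption about how such a check could look, not part of the PR; `env_without_uv` and `uv_visible_to_child` are hypothetical helpers, and the child command is a placeholder for the real eval entry points.

```python
import os
import shutil
import subprocess
import sys


def env_without_uv(env=None) -> dict:
    """Return a copy of the environment whose PATH has no directory containing uv."""
    env = dict(os.environ if env is None else env)
    dirs = env.get("PATH", "").split(os.pathsep)
    # Keep only directories in which `uv` cannot be found.
    kept = [d for d in dirs if d and shutil.which("uv", path=d) is None]
    env["PATH"] = os.pathsep.join(kept)
    return env


def uv_visible_to_child(env: dict) -> bool:
    """Ask a child interpreter, started via sys.executable, whether it can find uv."""
    result = subprocess.run(
        [sys.executable, "-c", "import shutil; print(shutil.which('uv'))"],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.strip() != "None"


if __name__ == "__main__":
    # The child still starts, because it is launched by interpreter path, not via uv.
    print("uv visible to child:", uv_visible_to_child(env_without_uv()))
```

The same filtered environment could be passed to the actual eval commands to exercise each item in the test plan.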

🤖 Generated with Claude Code

Simpler, removes the hard dependency on uv being available at eval time, and uses the correct Python interpreter in both uv and NeMo environments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

all-hands-bot

Taste Rating: 🟢 Good taste

This is a beautiful simplification. Replacing 16 lines of subprocess gymnastics with `sys.executable` is exactly right. The old swtbench code ran a subprocess just to discover which Python interpreter to use, which is precisely what `sys.executable` gives you directly.

The assumption that the Python running the script has the required dependencies is standard practice. CI validates this works.
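
To make the comparison concrete, here is a rough sketch of the two approaches. The `uv` command is the one quoted in the Changes table; the function names are hypothetical and the removed 16-line block is not reproduced here, so treat this as an approximation of its intent only.

```python
import subprocess
import sys


def python_via_uv() -> str:
    """Old approach (approximation): spawn uv to ask which interpreter it would use."""
    # Raises FileNotFoundError if uv is not installed, which is the hard
    # dependency the PR removes.
    result = subprocess.run(
        ["uv", "run", "python", "-c", "import sys; print(sys.executable)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def python_direct() -> str:
    """New approach: the running interpreter already knows its own path."""
    return sys.executable


if __name__ == "__main__":
    print(python_direct())
```

The direct version needs no child process and cannot fail because `uv` is missing; it does assume, as the review notes, that the interpreter running the script has the required dependencies installed.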

Verdict: ✅ Ship it

Resolve conflicts after sub-PRs #471, #472, #473 were merged to main:
- Take main's SDK_SHORT_SHA deprecation handling in version.py
- Take main's backward-compat SDK_SHORT_SHA in modal_patches.py
- Take main's log message wording in swebench/run_infer.py
- Remove redundant prompt_dir setup (superseded by add_prompt_path_argument)
- Keep lazy imports for git-dependent modules (build_utils.py, swtbench/run_infer.py)
- Take main's comment cleanup in swtbench/eval_infer.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace uv run python with sys.executable in eval scripts

185fdfa

all-hands-bot approved these changes Mar 2, 2026

simonrosenberg merged commit d4a464b into main Mar 2, 2026
3 checks passed

simonrosenberg mentioned this pull request Mar 3, 2026

NeMo Evaluator Integration #455

Open

simonrosenberg merged 1 commit into main from refactor/sys-executable
