
Replace uv run python with sys.executable in eval scripts #472

Merged

simonrosenberg merged 1 commit into main from refactor/sys-executable on Mar 2, 2026


@simonrosenberg
Collaborator

Summary

  • Replaced uv run python subprocess invocations with sys.executable in swebench/eval_infer.py, swtbench/eval_infer.py, and multiswebench/eval_infer.py
  • Removes the hard dependency on uv being available at eval time
  • Uses the interpreter that is already running the eval script, so subprocesses get the right Python in any environment (uv, NeMo, etc.)
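A minimal sketch of the swebench/multiswebench pattern follows, assuming the harness is invoked as a module with `-m`. The helper name `run_module` is hypothetical, and the stdlib module `this` only stands in for the real harness modules, which this description elides as `...`.

```python
import subprocess
import sys


def run_module(module: str, *args: str) -> subprocess.CompletedProcess:
    """Run `<current interpreter> -m module args...` and capture its output.

    Hypothetical helper. The old form was ["uv", "run", "python", "-m", ...],
    which fails with FileNotFoundError when uv is not on PATH. sys.executable
    is the path of the interpreter running this script, so the child runs
    under the same interpreter and sees the same installed packages.
    """
    if not sys.executable:
        # sys.executable can be empty or None in embedded interpreters.
        raise RuntimeError("cannot determine the current Python interpreter")
    cmd = [sys.executable, "-m", module, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # `this` is only a harmless stdlib stand-in for a real harness module.
    result = run_module("this")
    print(result.stdout.splitlines()[0])
```

Passing the argument list directly (no shell) keeps the call identical to the old one apart from the first three elements.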

Extracted from #455.

Changes

File | Change
benchmarks/swebench/eval_infer.py | ["uv", "run", "python", "-m", ...] → [sys.executable, "-m", ...]
benchmarks/swtbench/eval_infer.py | Removed the 16-line subprocess block that ran uv run python -c "import sys; print(sys.executable)" only to discover the Python path; replaced it with python_executable = sys.executable
benchmarks/multiswebench/eval_infer.py | ["uv", "run", "python", "-m", ...] → [sys.executable, "-m", ...]; added import sys
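The swtbench simplification can be illustrated with a sketch that contrasts the two shapes. This is not the removed code itself; the function names are hypothetical, and the demo passes the current interpreter as the launcher so it runs without uv installed (the removed block used ["uv", "run", "python"]).

```python
import os
import subprocess
import sys


def discover_python_via_subprocess(launcher: list[str]) -> str:
    """Old shape (sketch): start an interpreter only to ask it for its own path.

    This costs a process spawn and requires the launcher to be installed.
    """
    out = subprocess.run(
        [*launcher, "-c", "import sys; print(sys.executable)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()


def discover_python() -> str:
    """New shape: the running interpreter already knows its own path."""
    return sys.executable


if __name__ == "__main__":
    old = discover_python_via_subprocess([sys.executable])
    new = discover_python()
    # Compare resolved paths in case one of them goes through a symlink.
    print(os.path.realpath(old) == os.path.realpath(new))
```

When the script is already running inside the target environment, both shapes point at the same interpreter, so the subprocess round trip adds nothing.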

Validation

swebench and swtbench pass CI with eval_limit=1 (validation runs from #455); multiswebench could not be validated because of an unrelated dataset schema issue:

Benchmark | Status / Run
swebench | https://github.com/OpenHands/evaluation/actions/runs/22590769394
swtbench | https://github.com/OpenHands/evaluation/actions/runs/22590775404
multiswebench | Dataset schema issue (unrelated; see #304)

Test plan

  • Verify swebench-eval runs correctly without uv on PATH
  • Verify swtbench-eval runs correctly without uv on PATH
  • Verify multi-swebench-eval runs correctly without uv on PATH
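One way to approximate the test plan locally is to start the child process with uv stripped from PATH. The helpers `path_without` and `run_without_uv` are hypothetical, and the actual swebench-eval, swtbench-eval and multi-swebench-eval entry points are not reproduced here; a real check would pass their arguments instead of the probe below.

```python
import os
import shutil
import subprocess
import sys


def path_without(tool: str, path: str) -> str:
    """Return `path` minus every directory in which `tool` is found."""
    kept = [
        d
        for d in path.split(os.pathsep)
        if d and shutil.which(tool, path=d) is None
    ]
    return os.pathsep.join(kept)


def run_without_uv(args: list[str]) -> subprocess.CompletedProcess:
    """Run the current interpreter with `args` while uv is unreachable via PATH."""
    env = dict(os.environ)
    env["PATH"] = path_without("uv", env.get("PATH", ""))
    # sys.executable is an absolute path, so the child still starts even if
    # python lived in one of the removed directories (e.g. next to uv).
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # The child should report that uv cannot be found on its PATH.
    probe = run_without_uv(["-c", "import shutil; print(shutil.which('uv'))"])
    print(probe.stdout.strip())
```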

🤖 Generated with Claude Code

Simpler, removes the hard dependency on uv being available at eval time,
and uses the correct Python interpreter in both uv and NeMo environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@all-hands-bot left a comment


Taste Rating: 🟢 Good taste

This is a beautiful simplification. Replacing 16 lines of subprocess gymnastics with sys.executable is exactly right. The old swtbench code ran a subprocess just to discover which Python interpreter to use, which is precisely what sys.executable gives you directly.

The assumption that the Python running the script has the required dependencies is standard practice, and CI confirms that it holds.
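The PR does not add any such check, but if that assumption ever needed to fail fast, a preflight along these lines could run before the eval subprocesses start. This is a sketch: the function names are hypothetical and any module list passed in is illustrative, not the eval scripts' real dependency set.

```python
import importlib.util
import sys


def missing_modules(required: list[str]) -> list[str]:
    """Return the top-level modules the current interpreter cannot import.

    Only top-level names are supported: find_spec() on a dotted name
    imports the parent package and raises if that parent is missing.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]


def assert_interpreter_ready(required: list[str]) -> None:
    """Raise if sys.executable lacks any of the required modules."""
    missing = missing_modules(required)
    if missing:
        raise RuntimeError(
            f"{sys.executable} cannot import {', '.join(missing)}; "
            "launch the eval script from the environment that has them installed."
        )
```

Because sys.executable is the interpreter the subprocesses will use, checking importability in the parent also covers the children.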

Verdict: ✅ Ship it

simonrosenberg merged commit d4a464b into main on Mar 2, 2026
3 checks passed
simonrosenberg added a commit that referenced this pull request on Mar 3, 2026
Resolve conflicts after sub-PRs #471, #472, #473 were merged to main:
- Take main's SDK_SHORT_SHA deprecation handling in version.py
- Take main's backward-compat SDK_SHORT_SHA in modal_patches.py
- Take main's log message wording in swebench/run_infer.py
- Remove redundant prompt_dir setup (superseded by add_prompt_path_argument)
- Keep lazy imports for git-dependent modules (build_utils.py, swtbench/run_infer.py)
- Take main's comment cleanup in swtbench/eval_infer.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
