Port RLM SWE taskset to v1 by rasdani · Pull Request #1440 · PrimeIntellect-ai/verifiers

rasdani · 2026-05-22T17:57:51Z

Summary

add a reusable v1 SWETaskset adapter over the existing composable SWE taskset backends
rewire rlm_swe_v1 to compose vf.SWETaskset with the existing v1 vf.RLM harness
expose SWETaskset / SWETasksetConfig from the v1 and top-level APIs
add the RLM SWE edit skill package and update focused tests/docs

Notes

This keeps the existing SWE backend implementations in place and adapts them into the v1 taskset/runtime shape. No live prime eval run was performed for this PR; that should be the final smoke test before publishing broader env changes.

Validation

uv run pre-commit install
uv run ruff check --fix verifiers/v1/packages/tasksets/swe.py verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py
uv run pytest tests/test_v1_rlm_swe.py
uv run pre-commit run semgrep-v1-policy --files verifiers/v1/packages/tasksets/swe.py verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py
uv run pre-commit run --hook-stage pre-push ty --files verifiers/v1/packages/tasksets/swe.py
pre-commit hooks on commit
pre-push hooks on push

Note

Port RLM SWE taskset to use shared `vf.SWETaskset` adapter

Replaces the environment-local R2ESWETaskset and all r2e-specific helpers in rlm_swe_v1.py with the new vf.SWETaskset adapter.
SWETaskset bridges legacy SWE backends into the v1 runtime: it constructs the backend via make_swe_taskset, produces v1 task rows, maps sandbox specs, adapts reward signals, and registers setup_swe_sandbox/cleanup_swe_state handlers.
SWETasksetConfig supports multiple backends (r2e, swebench, multiswe, openswe, and others) via a unified task_type field, replacing the old r2e-only config.
Adds a new edit skill package under verifiers/v1/packages/tasksets/skills/edit/ for safe single-occurrence string replacement in files.
SWETaskset and SWETasksetConfig are exported from the top-level verifiers package.
Behavioral Change: setup/cleanup handler names changed from setup_r2e_sandbox/cleanup_r2e_state to setup_swe_sandbox/cleanup_swe_state.

📊 Macroscope summarized 28edca4. 10 files reviewed, 3 issues evaluated, 0 issues filtered, 2 comments posted

🗂️ Filtered Issues

macroscopeapp · 2026-05-22T18:00:38Z

+        sandbox_config = task.get("sandbox")
+        if isinstance(sandbox_config, Mapping):
+            sandbox_data = dict(sandbox_config)
+            timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)


🟢 Low tasksets/swe.py:210

timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60) silently replaces an explicit 0 with 60, so a user who sets timeout_minutes: 0 (e.g., to disable timeout) gets 60 minutes instead. Consider using sandbox_data.get("timeout_minutes", 60) to preserve 0 as a valid value.

-            timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)
+            timeout_minutes = int(sandbox_data.get("timeout_minutes", 60))

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file verifiers/v1/packages/tasksets/swe.py around line 210: `timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)` silently replaces an explicit `0` with `60`, so a user who sets `timeout_minutes: 0` (e.g., to disable timeout) gets 60 minutes instead. Consider using `sandbox_data.get("timeout_minutes", 60)` to preserve `0` as a valid value. Evidence trail: verifiers/v1/packages/tasksets/swe.py line 210 at REVIEWED_COMMIT: `timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)` — Python `or` treats `0` as falsy, so `0 or 60` evaluates to `60`.
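The difference between the two spellings can be checked in isolation. The sketch below is illustrative only: the helper names `timeout_with_or`, `timeout_with_default`, and `timeout_none_check` are hypothetical and do not appear in the PR. It also shows a caveat of the suggested form: if the key is present but explicitly set to `None`, `int(None)` raises `TypeError`, so a variant with an explicit `None` check may be safer, assuming `None` is meant to fall back to the default.

```python
from collections.abc import Mapping
from typing import Any

DEFAULT_TIMEOUT_MINUTES = 60  # default value taken from the reviewed line


def timeout_with_or(sandbox_data: Mapping[str, Any]) -> int:
    # Reviewed form: every falsy value (0, None, "") falls back to the default.
    return int(sandbox_data.get("timeout_minutes") or DEFAULT_TIMEOUT_MINUTES)


def timeout_with_default(sandbox_data: Mapping[str, Any]) -> int:
    # Suggested form: only a missing key falls back to the default.
    # An explicit None value reaches int() and raises TypeError.
    return int(sandbox_data.get("timeout_minutes", DEFAULT_TIMEOUT_MINUTES))


def timeout_none_check(sandbox_data: Mapping[str, Any]) -> int:
    # Hypothetical variant: missing or None falls back, an explicit 0 is kept.
    value = sandbox_data.get("timeout_minutes")
    return DEFAULT_TIMEOUT_MINUTES if value is None else int(value)
```

Whether `0` should mean "no timeout" is a decision for the sandbox backend; the sketch only shows which values survive each spelling.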

macroscopeapp · 2026-05-22T18:00:38Z

+    filepath.write_text(content.replace(old_str, new_str, 1))
+    return f"Edited {path}"


🟢 Low edit/edit.py:30

The write_text() call on line 30 is unguarded, so write failures (permission denied, disk full, read-only filesystem) raise unhandled exceptions instead of returning error message strings like the rest of the function. This breaks the established error-handling pattern where all I/O errors return formatted error strings.

-    filepath.write_text(content.replace(old_str, new_str, 1))
-    return f"Edited {path}"
+    try:
+        filepath.write_text(content.replace(old_str, new_str, 1))
+    except Exception as exc:
+        return f"Error writing {path}: {exc}"
+    return f"Edited {path}"

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py around lines 30-31: The `write_text()` call on line 30 is unguarded, so write failures (permission denied, disk full, read-only filesystem) raise unhandled exceptions instead of returning error message strings like the rest of the function. This breaks the established error-handling pattern where all I/O errors return formatted error strings. Evidence trail: File: verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py (lines 1-31 at REVIEWED_COMMIT). Line 19-22: read_text() wrapped in try/except returning error string. Line 30: write_text() has no try/except guard. The function's pattern (lines 17-18, 19-22, 24-28) is to return error strings for all failure cases.
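For context, a single-occurrence edit function that follows the "return an error string for every failure" pattern might look roughly like the sketch below. This is a reconstruction from the review comment, not the contents of edit.py: the function name `edit`, the exact checks, and the error message wording are assumptions. Only the final `write_text` / `return f"Edited {path}"` lines and the suggested `try/except` come from the diff above.

```python
from pathlib import Path


def edit(path: str, old_str: str, new_str: str) -> str:
    """Replace exactly one occurrence of old_str in the file at path.

    Every failure is reported as a returned string, never as an exception.
    """
    filepath = Path(path)
    if not filepath.is_file():
        return f"Error: {path} is not a file"
    if not old_str:
        # Assumed guard: an empty search string cannot identify a unique location.
        return "Error: old_str must not be empty"
    try:
        content = filepath.read_text()
    except Exception as exc:
        return f"Error reading {path}: {exc}"

    # Refuse ambiguous edits: the target must occur exactly once.
    count = content.count(old_str)
    if count == 0:
        return f"Error: old_str not found in {path}"
    if count > 1:
        return f"Error: old_str occurs {count} times in {path}"

    try:
        filepath.write_text(content.replace(old_str, new_str, 1))
    except Exception as exc:
        # Guarded write, as suggested in the review comment.
        return f"Error writing {path}: {exc}"
    return f"Edited {path}"
```

Returning strings instead of raising keeps the tool output uniform for the calling agent, which is the consistency argument the review comment makes.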

rasdani wants to merge 1 commit into main from codex/v1-harnesses-main
