Port RLM SWE taskset to v1 #1440

Draft: rasdani wants to merge 1 commit into main from codex/v1-harnesses-main

rasdani (Contributor) commented May 22, 2026

Summary

  • add a reusable v1 SWETaskset adapter over the existing composable SWE taskset backends
  • rewire rlm_swe_v1 to compose vf.SWETaskset with the existing v1 vf.RLM harness
  • expose SWETaskset / SWETasksetConfig from the v1 and top-level APIs
  • add the RLM SWE edit skill package and update focused tests/docs

Notes

This keeps the existing SWE backend implementations in place and adapts them to the v1 taskset/runtime shape. A live prime eval run has not been performed yet; that should be the final smoke test before publishing broader env changes.

Validation

  • uv run pre-commit install
  • uv run ruff check --fix verifiers/v1/packages/tasksets/swe.py verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py
  • uv run pytest tests/test_v1_rlm_swe.py
  • uv run pre-commit run semgrep-v1-policy --files verifiers/v1/packages/tasksets/swe.py verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py
  • uv run pre-commit run --hook-stage pre-push ty --files verifiers/v1/packages/tasksets/swe.py
  • pre-commit hooks on commit
  • pre-push hooks on push

Note

Port RLM SWE taskset to use shared vf.SWETaskset adapter

  • Replaces the environment-local R2ESWETaskset and all r2e-specific helpers in rlm_swe_v1.py with the new vf.SWETaskset adapter.
  • SWETaskset bridges legacy SWE backends into the v1 runtime: it constructs the backend via make_swe_taskset, produces v1 task rows, maps sandbox specs, adapts reward signals, and registers setup_swe_sandbox/cleanup_swe_state handlers.
  • SWETasksetConfig supports multiple backends (r2e, swebench, multiswe, openswe, and others) via a unified task_type field, replacing the old r2e-only config.
  • Adds a new edit skill package under verifiers/v1/packages/tasksets/skills/edit/ for safe single-occurrence string replacement in files.
  • SWETaskset and SWETasksetConfig are exported from the top-level verifiers package.
  • Behavioral Change: setup/cleanup handler names changed from setup_r2e_sandbox/cleanup_r2e_state to setup_swe_sandbox/cleanup_swe_state.
📊 Macroscope summarized 28edca4. 10 files reviewed, 3 issues evaluated, 0 issues filtered, 2 comments posted

sandbox_config = task.get("sandbox")
if isinstance(sandbox_config, Mapping):
    sandbox_data = dict(sandbox_config)
    timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)

🟢 Low tasksets/swe.py:210

timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60) silently replaces an explicit 0 with 60, so a user who sets timeout_minutes: 0 (e.g., to disable timeout) gets 60 minutes instead. Consider using sandbox_data.get("timeout_minutes", 60) to preserve 0 as a valid value.

-            timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)
+            timeout_minutes = int(sandbox_data.get("timeout_minutes", 60))

Evidence trail:
verifiers/v1/packages/tasksets/swe.py line 210 at REVIEWED_COMMIT: `timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)` — Python `or` treats `0` as falsy, so `0 or 60` evaluates to `60`.
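
One caveat on the suggested change: `dict.get("timeout_minutes", 60)` only falls back when the key is absent, so an explicit `None` (for example a YAML `null`) would reach `int()` and raise `TypeError`, which the original `or 60` tolerated. A minimal sketch that keeps `0` while still defaulting on a missing or `None` value could look like the following. The helper name `resolve_timeout_minutes` and the constant are hypothetical, not part of the PR, and whether `0` actually disables the timeout depends on the sandbox backend, which the source does not confirm.

```python
from collections.abc import Mapping
from typing import Any

# Default taken from the reviewed line; the constant name is hypothetical.
DEFAULT_TIMEOUT_MINUTES = 60


def resolve_timeout_minutes(sandbox_data: Mapping[str, Any]) -> int:
    """Return the configured timeout, defaulting only when it is unset or None."""
    value = sandbox_data.get("timeout_minutes")
    if value is None:
        # Key missing or explicitly null: use the default.
        return DEFAULT_TIMEOUT_MINUTES
    # Any other value, including 0, is passed through unchanged.
    return int(value)
```

With this shape, `{"timeout_minutes": 0}` resolves to `0`, while `{}` and `{"timeout_minutes": None}` both resolve to `60`.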

Comment on lines +30 to +31
filepath.write_text(content.replace(old_str, new_str, 1))
return f"Edited {path}"

🟢 Low edit/edit.py:30

The write_text() call on line 30 is unguarded, so write failures (permission denied, disk full, read-only filesystem) raise unhandled exceptions instead of returning error message strings like the rest of the function. This breaks the established error-handling pattern where all I/O errors return formatted error strings.

-    filepath.write_text(content.replace(old_str, new_str, 1))
-    return f"Edited {path}"
+    try:
+        filepath.write_text(content.replace(old_str, new_str, 1))
+    except Exception as exc:
+        return f"Error writing {path}: {exc}"
+    return f"Edited {path}"

Evidence trail:
File: verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py (lines 1-31 at REVIEWED_COMMIT). Lines 19-22: read_text() is wrapped in try/except and returns an error string. Line 30: write_text() has no try/except guard. The function's pattern (lines 17-18, 19-22, 24-28) is to return error strings for all failure cases.
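
For context, a single-occurrence edit function that follows the return-an-error-string pattern on every path, including the write, might be sketched as below. This is an illustrative reconstruction, not the actual contents of `edit.py`: the signature, the individual checks, and the error wording are assumptions inferred from the review comment. It catches `OSError` and `UnicodeError` rather than the broader `Exception` used in the suggestion, which is a design choice rather than something the PR specifies.

```python
from pathlib import Path


def edit(path: str, old_str: str, new_str: str) -> str:
    """Replace exactly one occurrence of old_str in the file at path.

    Hypothetical sketch: every failure is reported as a string, never raised.
    """
    filepath = Path(path)
    if not filepath.is_file():
        return f"Error: {path} is not a file"
    if not old_str:
        return "Error: old_str must not be empty"

    try:
        content = filepath.read_text()
    except (OSError, UnicodeError) as exc:
        return f"Error reading {path}: {exc}"

    # Require a unique match so the edit is unambiguous.
    count = content.count(old_str)
    if count == 0:
        return f"Error: old_str not found in {path}"
    if count > 1:
        return f"Error: old_str occurs {count} times in {path}"

    try:
        filepath.write_text(content.replace(old_str, new_str, 1))
    except (OSError, UnicodeError) as exc:
        # Guarded write: permission or disk errors become error strings too.
        return f"Error writing {path}: {exc}"
    return f"Edited {path}"
```

Because every branch returns a string, a caller such as a tool-using agent can surface the message directly without wrapping the call in its own exception handling.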
