Port RLM SWE taskset to v1 #1440
Draft
rasdani wants to merge 1 commit into
sandbox_config = task.get("sandbox")
if isinstance(sandbox_config, Mapping):
    sandbox_data = dict(sandbox_config)
    timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)
🟢 Low tasksets/swe.py:210
`timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)` silently replaces an explicit `0` with `60`, so a user who sets `timeout_minutes: 0` (e.g., to disable the timeout) gets 60 minutes instead. Consider using `sandbox_data.get("timeout_minutes", 60)` to preserve `0` as a valid value.
- timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)
+ timeout_minutes = int(sandbox_data.get("timeout_minutes", 60))
Evidence trail:
verifiers/v1/packages/tasksets/swe.py line 210 at REVIEWED_COMMIT: `timeout_minutes = int(sandbox_data.get("timeout_minutes") or 60)` — Python `or` treats `0` as falsy, so `0 or 60` evaluates to `60`.
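The difference between the two lookups can be checked in isolation. The sketch below is not the code in `swe.py`; the three function names are hypothetical and exist only to compare behaviours. It also shows one caveat of the suggested form: if the key is present with an explicit `None`, `int(None)` raises `TypeError`, so a `None`-aware variant may be preferable if the config can carry nulls. Whether `0` should actually mean "no timeout" is an assumption taken from the comment above.

```python
from collections.abc import Mapping
from typing import Any


def timeout_with_or(sandbox_data: Mapping[str, Any]) -> int:
    # Behaviour under review: any falsy value (0, None, "") falls back to 60.
    return int(sandbox_data.get("timeout_minutes") or 60)


def timeout_with_default(sandbox_data: Mapping[str, Any]) -> int:
    # Suggested behaviour: only a missing key falls back to 60.
    # An explicit None reaches int(None) and raises TypeError.
    return int(sandbox_data.get("timeout_minutes", 60))


def timeout_none_safe(sandbox_data: Mapping[str, Any]) -> int:
    # Hypothetical variant: missing or None means "use the default"; 0 is kept.
    value = sandbox_data.get("timeout_minutes")
    return 60 if value is None else int(value)
```

With `{"timeout_minutes": 0}` the first function returns 60 while the other two return 0; with `{"timeout_minutes": None}` only the second one raises.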
Comment on lines +30 to +31
filepath.write_text(content.replace(old_str, new_str, 1))
return f"Edited {path}"
🟢 Low edit/edit.py:30
The `write_text()` call on line 30 is unguarded, so write failures (permission denied, disk full, read-only filesystem) raise unhandled exceptions instead of returning error message strings like the rest of the function. This breaks the established error-handling pattern where all I/O errors return formatted error strings.
- filepath.write_text(content.replace(old_str, new_str, 1))
- return f"Edited {path}"
+ try:
+     filepath.write_text(content.replace(old_str, new_str, 1))
+ except Exception as exc:
+     return f"Error writing {path}: {exc}"
+ return f"Edited {path}"
Evidence trail:
File: verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py (lines 1-31 at REVIEWED_COMMIT). Line 19-22: read_text() wrapped in try/except returning error string. Line 30: write_text() has no try/except guard. The function's pattern (lines 17-18, 19-22, 24-28) is to return error strings for all failure cases.
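For context, the pattern described in the evidence trail (every failure path returns a string rather than raising) can be sketched as a standalone function. This is a hypothetical stand-in, not the contents of `edit.py`: the name `replace_once`, the exact error messages, and the exactly-one-occurrence check are assumptions for illustration. It also catches `OSError` rather than the broader `Exception` used in the suggestion, which is a narrower choice that still covers permission, disk-space, and read-only failures.

```python
from pathlib import Path


def replace_once(path: str, old_str: str, new_str: str) -> str:
    """Replace a single occurrence of old_str in a file; report errors as strings."""
    filepath = Path(path)
    try:
        content = filepath.read_text()
    except OSError as exc:
        return f"Error reading {path}: {exc}"

    # Assumed policy: refuse to edit unless old_str appears exactly once.
    count = content.count(old_str)
    if count != 1:
        return f"Error: expected 1 occurrence of old_str in {path}, found {count}"

    # Guard the write the same way as the read, so callers never see a raise.
    try:
        filepath.write_text(content.replace(old_str, new_str, 1))
    except OSError as exc:
        return f"Error writing {path}: {exc}"
    return f"Edited {path}"
```

Because both I/O calls are wrapped, a caller can treat the return value uniformly as a status message.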
Summary
- `SWETaskset` adapter over the existing composable SWE taskset backends
- `rlm_swe_v1` to compose `vf.SWETaskset` with the existing v1 `vf.RLM` harness
- `SWETaskset` / `SWETasksetConfig` from the v1 and top-level APIs
- `edit` skill package and update focused tests/docs

Notes
This keeps the existing SWE backend implementations in place and adapts them into the v1 taskset/runtime shape. It does not run a live `prime eval run`; that should be the final smoke test before publishing broader env changes.

Validation
- `uv run pre-commit install`
- `uv run ruff check --fix verifiers/v1/packages/tasksets/swe.py verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py`
- `uv run pytest tests/test_v1_rlm_swe.py`
- `uv run pre-commit run semgrep-v1-policy --files verifiers/v1/packages/tasksets/swe.py verifiers/v1/packages/tasksets/skills/edit/src/edit/edit.py`
- `uv run pre-commit run --hook-stage pre-push ty --files verifiers/v1/packages/tasksets/swe.py`

Note
Port RLM SWE taskset to use shared `vf.SWETaskset` adapter
- `R2ESWETaskset` and all r2e-specific helpers in `rlm_swe_v1.py` with the new `vf.SWETaskset` adapter.
- `SWETaskset` bridges legacy SWE backends into the v1 runtime: it constructs the backend via `make_swe_taskset`, produces v1 task rows, maps sandbox specs, adapts reward signals, and registers `setup_swe_sandbox` / `cleanup_swe_state` handlers.
- `SWETasksetConfig` supports multiple backends (`r2e`, `swebench`, `multiswe`, `openswe`, and others) via a unified `task_type` field, replacing the old r2e-only config.
- `edit` skill package under `verifiers/v1/packages/tasksets/skills/edit/` for safe single-occurrence string replacement in files.
- `SWETaskset` and `SWETasksetConfig` are exported from the top-level `verifiers` package.
- `setup_r2e_sandbox` / `cleanup_r2e_state` to `setup_swe_sandbox` / `cleanup_swe_state`.

📊 Macroscope summarized 28edca4. 10 files reviewed, 3 issues evaluated, 0 issues filtered, 2 comments posted