-
Notifications
You must be signed in to change notification settings - Fork 238
feat: add execute_tool API endpoint for direct tool execution on conversations #2356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
xingyaoww
wants to merge
7
commits into
main
Choose a base branch
from
feat/execute-tool-api
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
24cf245
feat: add execute_tool API endpoint for direct tool execution on conv…
openhands-agent 045262b
fix: resolve CI failures in execute_tool PR
openhands-agent 874d2cd
Merge branch 'main' into feat/execute-tool-api
xingyaoww 61c96d0
Merge branch 'main' into feat/execute-tool-api
openhands-agent d6c5a65
fix: address PR review feedback on execute_tool endpoint
openhands-agent a59b48d
Merge branch 'main' into feat/execute-tool-api
xingyaoww 1df7714
feat: add execute_tool example for agent server
openhands-agent File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| # PR #2356 Review: execute_tool API endpoint | ||
|
|
||
| ## Summary | ||
|
|
||
| This PR adds a `POST /api/conversations/{id}/execute_tool` endpoint that allows | ||
| executing tools (like the terminal) directly on a conversation without going | ||
| through the agent loop. This is primarily designed for `setup.sh` execution where | ||
| environment changes need to persist in the agent's terminal session. | ||
|
|
||
| ## CI Fixes Applied | ||
|
|
||
| ### 1. Pyright type errors in `event_service.py` (pre-commit) | ||
| **Problem**: `self._conversation` is typed `LocalConversation | None`. Pyright | ||
| cannot narrow the type inside a nested closure after the None-check. | ||
|
|
||
| **Fix**: Captured `self._conversation` in a local variable `conversation` before | ||
| defining the `_execute()` closure. | ||
|
|
||
| ### 2. `Observation` ABC instantiation in `remote_conversation.py` (sdk-tests) | ||
| **Problem**: `Observation` is an abstract base class (ABC) and cannot be | ||
| instantiated directly. `BaseObservation.from_text()` raised a validation error. | ||
|
|
||
| **Fix**: Created a private concrete `_RemoteObservation` subclass inside | ||
| `execute_tool()` to hold the response text. | ||
|
|
||
| ### 3. Stale test `test_remote_conversation_execute_tool_not_implemented` | ||
| **Problem**: The test expected `NotImplementedError` but the PR now implements | ||
| the method. | ||
|
|
||
| **Fix**: Replaced with `test_remote_conversation_execute_tool` that mocks the | ||
| API response and verifies the observation is correctly parsed. | ||
|
|
||
| ### 4. Line-too-long issues (E501) | ||
| Fixed in `event_service.py` docstring and `models.py` field description. | ||
|
|
||
| ## Functional Test Results | ||
|
|
||
| Started the agent server and tested the endpoint with: | ||
|
|
||
| ### ✅ Successful tool execution | ||
| ``` | ||
| POST /api/conversations/{id}/execute_tool | ||
| {"tool_name": "terminal", "action": {"command": "echo hello world && pwd"}} | ||
|
|
||
| Response: {"observation": {"content": [{"text": "hello world\n/tmp/test_workspace"}], ...}, "is_error": false} | ||
| ``` | ||
|
|
||
| ### ✅ Environment variable persistence (core use case) | ||
| ``` | ||
| # Set variable | ||
| {"tool_name": "terminal", "action": {"command": "export MY_SETUP_VAR=setup_complete"}} | ||
| # → success | ||
|
|
||
| # Read variable in subsequent call | ||
| {"tool_name": "terminal", "action": {"command": "echo $MY_SETUP_VAR"}} | ||
| # → "setup_complete" ← Variable persists across calls! | ||
| ``` | ||
|
|
||
| ### ✅ Error handling: nonexistent tool | ||
| ``` | ||
| {"tool_name": "nonexistent_tool", "action": {"command": "echo test"}} | ||
| # → 400: "Tool 'nonexistent_tool' not found. Available tools: ['terminal', 'finish', 'think']" | ||
| ``` | ||
|
|
||
| ### ✅ Error handling: nonexistent conversation | ||
| ``` | ||
| POST /api/conversations/00000000-0000-0000-0000-000000000000/execute_tool | ||
| # → 404: "Not Found" | ||
| ``` | ||
|
|
||
| ## Code Review Notes | ||
|
|
||
| ### Architecture | ||
| The layered approach is clean and consistent with existing patterns: | ||
| - **Router** → **ConversationService** → **EventService** → **LocalConversation** | ||
| - `_ensure_agent_ready()` handles lazy initialization of tools | ||
|
|
||
| ### Minor observation | ||
| In `conversation_router.py`, the `except KeyError` followed by `except Exception` | ||
| is technically redundant (Exception already catches KeyError), but it serves as | ||
| documentation of expected error types and is harmless. | ||
|
|
||
| ### Overall Assessment | ||
| The PR is well-structured and solves a real problem. The key insight — | ||
| executing through the agent's persistent terminal session rather than ephemeral | ||
| subprocesses — correctly addresses the `setup.sh` environment persistence issue. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,230 @@ | ||
| """Execute tools directly on a conversation via the agent server. | ||
|
|
||
| Demonstrates ``RemoteConversation.execute_tool()`` which calls the server's | ||
| ``POST /api/conversations/{id}/execute_tool`` endpoint. This lets you run | ||
| tools (like the terminal) on a conversation **without** going through the | ||
| agent loop — useful for pre-run setup where environment changes must persist | ||
| in the agent's session (e.g. running ``.openhands/setup.sh``). | ||
|
|
||
| The example shows two approaches: | ||
| 1. **SDK** — ``RemoteConversation.execute_tool(action)`` | ||
| 2. **Raw HTTP** — ``POST /api/conversations/{id}/execute_tool`` via httpx | ||
|
|
||
| Both produce the same result. Approach 1 is the recommended way when you | ||
| already have a ``RemoteConversation`` object; approach 2 is useful when | ||
| integrating from a language or service that doesn't use the SDK. | ||
| """ | ||
|
|
||
| import os | ||
| import subprocess | ||
| import sys | ||
| import tempfile | ||
| import threading | ||
| import time | ||
|
|
||
| import httpx | ||
| from pydantic import SecretStr | ||
|
|
||
| from openhands.sdk import LLM, Agent, Conversation, RemoteConversation, Tool, Workspace | ||
| from openhands.tools.terminal import TerminalTool | ||
| from openhands.tools.terminal.definition import TerminalAction | ||
|
|
||
|
|
||
| # ----------------------------------------------------------------- | ||
| # Managed server helper (reused from example 01) | ||
| # ----------------------------------------------------------------- | ||
| def _stream_output(stream, prefix, target_stream): | ||
| try: | ||
| for line in iter(stream.readline, ""): | ||
| if line: | ||
| target_stream.write(f"[{prefix}] {line}") | ||
| target_stream.flush() | ||
| except Exception as e: | ||
| print(f"Error streaming {prefix}: {e}", file=sys.stderr) | ||
| finally: | ||
| stream.close() | ||
|
|
||
|
|
||
| class ManagedAPIServer: | ||
| """Context manager that starts and stops a local agent-server.""" | ||
|
|
||
| def __init__(self, port: int = 8000, host: str = "127.0.0.1"): | ||
| self.port = port | ||
| self.host = host | ||
| self.process: subprocess.Popen[str] | None = None | ||
| self.base_url = f"http://{host}:{port}" | ||
|
|
||
| def __enter__(self): | ||
| print(f"Starting agent-server on {self.base_url} ...") | ||
| self.process = subprocess.Popen( | ||
| [ | ||
| "python", | ||
| "-m", | ||
| "openhands.agent_server", | ||
| "--port", | ||
| str(self.port), | ||
| "--host", | ||
| self.host, | ||
| ], | ||
| stdout=subprocess.PIPE, | ||
| stderr=subprocess.PIPE, | ||
| text=True, | ||
| env={"LOG_JSON": "true", **os.environ}, | ||
| ) | ||
| assert self.process.stdout is not None | ||
| assert self.process.stderr is not None | ||
| threading.Thread( | ||
| target=_stream_output, | ||
| args=(self.process.stdout, "SERVER", sys.stdout), | ||
| daemon=True, | ||
| ).start() | ||
| threading.Thread( | ||
| target=_stream_output, | ||
| args=(self.process.stderr, "SERVER", sys.stderr), | ||
| daemon=True, | ||
| ).start() | ||
|
|
||
| for _ in range(30): | ||
| try: | ||
| r = httpx.get(f"{self.base_url}/health", timeout=1.0) | ||
| if r.status_code == 200: | ||
| print(f"Agent-server ready at {self.base_url}") | ||
| return self | ||
| except Exception: | ||
| pass | ||
| assert self.process.poll() is None, "Server exited unexpectedly" | ||
| time.sleep(1) | ||
| raise RuntimeError("Server failed to start in 30 s") | ||
|
|
||
| def __exit__(self, *args): | ||
| if self.process: | ||
| self.process.terminate() | ||
| try: | ||
| self.process.wait(timeout=5) | ||
| except subprocess.TimeoutExpired: | ||
| self.process.kill() | ||
| self.process.wait() | ||
| time.sleep(0.5) | ||
| print("Agent-server stopped.") | ||
|
|
||
|
|
||
| # ----------------------------------------------------------------- | ||
| # Config | ||
| # ----------------------------------------------------------------- | ||
| api_key = os.getenv("LLM_API_KEY") | ||
| assert api_key, "LLM_API_KEY must be set" | ||
|
|
||
| llm = LLM( | ||
| model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), | ||
| api_key=SecretStr(api_key), | ||
| base_url=os.getenv("LLM_BASE_URL"), | ||
| ) | ||
| agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)]) | ||
|
|
||
| # ----------------------------------------------------------------- | ||
| # Run | ||
| # ----------------------------------------------------------------- | ||
| with ManagedAPIServer(port=8003) as server: | ||
| workspace_dir = tempfile.mkdtemp(prefix="execute_tool_demo_") | ||
| workspace = Workspace(host=server.base_url, working_dir=workspace_dir) | ||
|
|
||
| conversation = Conversation(agent=agent, workspace=workspace) | ||
| assert isinstance(conversation, RemoteConversation) | ||
|
|
||
| print("=" * 64) | ||
| print(" execute_tool — Direct Tool Execution on Agent Server") | ||
| print("=" * 64) | ||
| print(f"\nConversation ID: {conversation.id}") | ||
|
|
||
| # ============================================================= | ||
| # Approach 1: SDK — RemoteConversation.execute_tool() | ||
| # ============================================================= | ||
| print("\n--- Approach 1: SDK (RemoteConversation.execute_tool) ---") | ||
|
|
||
| # Run a command that sets an environment variable | ||
| obs1 = conversation.execute_tool( | ||
| "terminal", | ||
| TerminalAction(command="export MY_SETUP_VAR='hello from setup'"), | ||
| ) | ||
| print(f"Set env var → is_error={obs1.is_error}") | ||
|
|
||
| # Verify the variable persists in the same terminal session | ||
| obs2 = conversation.execute_tool( | ||
| "terminal", | ||
| TerminalAction(command="echo $MY_SETUP_VAR"), | ||
| ) | ||
| print(f"Read env var → text={obs2.text!r}") | ||
| assert "hello from setup" in obs2.text, ( | ||
| f"Environment variable did not persist! Got: {obs2.text!r}" | ||
| ) | ||
| print("✅ Environment variable persisted across execute_tool calls!") | ||
|
|
||
| # ============================================================= | ||
| # Approach 2: Raw HTTP — POST /api/conversations/{id}/execute_tool | ||
| # ============================================================= | ||
| print("\n--- Approach 2: Raw HTTP (httpx) ---") | ||
|
|
||
| conv_id = str(conversation.id) | ||
| url = f"{server.base_url}/api/conversations/{conv_id}/execute_tool" | ||
|
|
||
| # Set another variable via raw HTTP | ||
| resp = httpx.post( | ||
| url, | ||
| json={ | ||
| "tool_name": "terminal", | ||
| "action": { | ||
| "command": "export HTTP_VAR='set via raw HTTP'", | ||
| }, | ||
| }, | ||
| timeout=30.0, | ||
| ) | ||
| resp.raise_for_status() | ||
| result = resp.json() | ||
| print(f"Set env var → is_error={result['is_error']}") | ||
|
|
||
| # Read both variables to show everything shares the same session | ||
| resp = httpx.post( | ||
| url, | ||
| json={ | ||
| "tool_name": "terminal", | ||
| "action": { | ||
| "command": "echo SDK=$MY_SETUP_VAR HTTP=$HTTP_VAR", | ||
| }, | ||
| }, | ||
| timeout=30.0, | ||
| ) | ||
| resp.raise_for_status() | ||
| result = resp.json() | ||
|
|
||
| # Extract text from observation content | ||
| obs_data = result["observation"] | ||
| content = obs_data.get("content", []) | ||
| text = content[0]["text"] if content else "" | ||
| print(f"Read both → text={text!r}") | ||
|
|
||
| assert "hello from setup" in text, f"SDK var missing! Got: {text!r}" | ||
| assert "set via raw HTTP" in text, f"HTTP var missing! Got: {text!r}" | ||
| print("✅ Both variables persisted — same terminal session!") | ||
|
|
||
| # ============================================================= | ||
| # Bonus: the agent loop sees these changes too | ||
| # ============================================================= | ||
| print("\n--- Verify agent session shares the environment ---") | ||
| conversation.send_message( | ||
| "Run `echo MY_SETUP_VAR=$MY_SETUP_VAR` in the terminal. " | ||
| "Report the value you see." | ||
| ) | ||
| conversation.run() | ||
| print("✅ Agent ran in the same session with pre-configured env.") | ||
|
|
||
| # ============================================================= | ||
| # Summary | ||
| # ============================================================= | ||
| print(f"\n{'=' * 64}") | ||
| print("All done — execute_tool works via SDK and raw HTTP.") | ||
| print("=" * 64) | ||
|
|
||
| conversation.close() | ||
|
|
||
| cost = llm.metrics.accumulated_cost | ||
| print(f"EXAMPLE_COST: {cost}") |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟢 Acceptable: The
except KeyErrorbeforeexcept Exceptionisn't redundant — it correctly maps known errors to 400 vs generic errors to 500. The separation serves as documentation of expected error types. 👍