From 875dd02d22f59c532228209f730aff5559e91495 Mon Sep 17 00:00:00 2001 From: enyst Date: Sat, 24 Jan 2026 10:21:30 +0000 Subject: [PATCH 01/10] docs(sdk): add normative invariants to architecture docs\n\nCloses #1815\n\nCo-authored-by: openhands --- sdk/arch/agent.mdx | 70 +++++++++++++++++++++++++++++++++++---- sdk/arch/conversation.mdx | 47 +++++++++++++++++++++++--- sdk/arch/design.mdx | 67 +++++++++++++++++++++++++++++++++++++ sdk/arch/events.mdx | 43 +++++++++++++++++++++++- sdk/arch/tool-system.mdx | 46 +++++++++++++++++++++++++ sdk/arch/workspace.mdx | 32 ++++++++++++++++++ 6 files changed, 292 insertions(+), 13 deletions(-) diff --git a/sdk/arch/agent.mdx b/sdk/arch/agent.mdx index 22b0e134a..888a9d02f 100644 --- a/sdk/arch/agent.mdx +++ b/sdk/arch/agent.mdx @@ -199,31 +199,87 @@ Tools follow a **strict action-observation pattern**: flowchart TB LLM["LLM generates tool_call"] Convert["Convert to ActionEvent"] - + Decision{"Confirmation
mode?"} Defer["Store as pending"] - + Execute["Execute tool"] Success{"Success?"} - + Obs["ObservationEvent
with result"] Error["ObservationEvent
with error"] - + LLM --> Convert Convert --> Decision - + Decision -->|Yes| Defer Decision -->|No| Execute - + Execute --> Success Success -->|Yes| Obs Success -->|No| Error - + style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` +## Invariants (Normative) + +### AgentBase: Configuration is Stateless and Immutable + +Natural language invariant: + +- An `AgentBase` instance is a **pure configuration object**. It may cache materialized `ToolDefinition` instances internally, but it must remain valid to re-create those tools from its declarative spec. + +OCL-like: + +- `context AgentBase inv Frozen: self.model_config.frozen = true` + +### Initialization: System Prompt Precedes Any User Message + +`Agent.init_state(state, on_event=...)` is responsible for creating the initial system prompt event. + +Natural language invariant: + +- A `ConversationState` must not contain a user `MessageEvent` before it contains a `SystemPromptEvent`. + +OCL-like (conceptual): + +- `context ConversationState inv SystemBeforeUser: self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )` + +### Tool Materialization: Names Resolve to Registered ToolDefinitions + +An `Agent` is configured with a list of tool *specs* (`openhands.sdk.tool.spec.Tool`) that reference registered `ToolDefinition` factories. + +Natural language invariant: + +- `resolve_tool(Tool(name=X))` must succeed (tool name present in registry) for all tools the agent intends to use. +- Tool factories must return a **sequence** of `ToolDefinition` instances; tool sets (e.g., browser tool sets) are represented as multi-element sequences. 
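The tool materialization invariant can be illustrated with a minimal, stdlib-only sketch. Everything here (`ToolDefinitionStub`, `register_tool_stub`, `resolve_tool_stub`, and the browser tool names) is a hypothetical stand-in, not the SDK's actual registry. It only assumes what the invariant states: a name maps to a factory, the factory returns a sequence, and a tool set is a multi-element sequence.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDefinitionStub:
    """Hypothetical stand-in for the SDK's ToolDefinition."""

    name: str


ToolFactory = Callable[[], Sequence[ToolDefinitionStub]]

# Hypothetical global registry: tool name -> factory.
_REGISTRY: dict[str, ToolFactory] = {}


def register_tool_stub(name: str, factory: ToolFactory) -> None:
    # Reject empty or whitespace-only names up front.
    if not name.strip():
        raise ValueError("tool name must be a non-empty string")
    _REGISTRY[name] = factory


def resolve_tool_stub(name: str) -> Sequence[ToolDefinitionStub]:
    # An unregistered name surfaces as KeyError from the dict lookup.
    tools = _REGISTRY[name]()
    # Factories must return a sequence, even for a single tool.
    if isinstance(tools, (str, bytes)) or not isinstance(tools, Sequence):
        raise TypeError(f"factory for {name!r} must return a sequence of tools")
    return tools


register_tool_stub("terminal", lambda: [ToolDefinitionStub("terminal")])
# A tool set is simply a factory that yields several definitions.
register_tool_stub(
    "browser",
    lambda: [ToolDefinitionStub("browser_navigate"), ToolDefinitionStub("browser_click")],
)
```

An agent configured with `["terminal", "browser"]` would end up with three materialized tools in this sketch, while a misspelled name fails at resolution time rather than at first use.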
+ +### Multi-Tool Calls: Shared Thought Only on First ActionEvent + +When an LLM returns parallel tool calls, the SDK represents this as multiple `ActionEvent`s that share the same `llm_response_id`. + +Natural language invariant: + +- For a batch of `ActionEvent`s with the same `llm_response_id`, only the first action carries `thought` / `reasoning_content` / `thinking_blocks`; subsequent actions must have empty `thought`. + +OCL-like (as modeled in `event.base._combine_action_events`): + +- `context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()` + +### Confirmation Mode: Requires Both Analyzer and Policy + +`conversation.is_confirmation_mode_active` is true iff: + +- A `SecurityAnalyzer` is configured, and +- The confirmation policy is not `NeverConfirm`. + +OCL-like (conceptual): + +- `context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))` + **Execution Modes:** | Mode | Behavior | Use Case | diff --git a/sdk/arch/conversation.mdx b/sdk/arch/conversation.mdx index e03911211..0f60299f3 100644 --- a/sdk/arch/conversation.mdx +++ b/sdk/arch/conversation.mdx @@ -185,13 +185,50 @@ The conversation system provides pluggable services that operate independently o | **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | | **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | | **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | -| 
**[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer.py)** | Execution diagrams | Event stream → visual representation | +| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/default.py)** | Execution diagrams | Event stream → visual representation | | **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | -**Design Principle:** Services read from the event log but never mutate state directly. This enables: -- Services can be enabled/disabled independently -- Easy to add new services without changing core orchestration -- Event stream acts as the integration point +**Design Principle:** Services read from the event log but never mutate state directly. + +## Invariants (Normative) + +### Conversation Factory: Workspace Chooses Implementation + +Natural language invariant: + +- `Conversation(...)` is a factory that returns `LocalConversation` unless the provided `workspace` is a `RemoteWorkspace`. +- When `workspace` is remote, `persistence_dir` must be unset (`None`). + +OCL-like (conceptual): + +- `context Conversation::__new__ pre RemoteNoPersistence: workspace.oclIsKindOf(RemoteWorkspace) implies persistence_dir = null` + +### ConversationState: Validated Snapshot + Event Log + +Natural language invariants: + +- `ConversationState` is the **only** component intended to hold mutable execution status (`IDLE`, `RUNNING`, `WAITING_FOR_CONFIRMATION`, etc.). +- `ConversationState` owns persistence (`FileStore`) and the event store; all other components treat persistence as an implementation detail. 
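The conversation factory invariant can be sketched as follows. This is an illustration under stated assumptions, not the SDK's `Conversation.__new__`: every `*Stub` class and `make_conversation` are hypothetical names, and raising `ValueError` for the violated precondition is an assumption (the invariant only says `persistence_dir` must be unset for remote workspaces).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class LocalWorkspaceStub:
    """Hypothetical stand-in for LocalWorkspace."""

    working_dir: str


@dataclass(frozen=True)
class RemoteWorkspaceStub:
    """Hypothetical stand-in for RemoteWorkspace."""

    working_dir: str
    host: str


@dataclass
class LocalConversationStub:
    workspace: LocalWorkspaceStub
    persistence_dir: Optional[str] = None


@dataclass
class RemoteConversationStub:
    workspace: RemoteWorkspaceStub


def make_conversation(
    workspace: Union[LocalWorkspaceStub, RemoteWorkspaceStub],
    persistence_dir: Optional[str] = None,
) -> Union[LocalConversationStub, RemoteConversationStub]:
    # The workspace type alone selects the implementation.
    if isinstance(workspace, RemoteWorkspaceStub):
        # Precondition RemoteNoPersistence: remote implies persistence_dir is None.
        if persistence_dir is not None:
            raise ValueError("persistence_dir must be None for a remote workspace")
        return RemoteConversationStub(workspace)
    return LocalConversationStub(workspace, persistence_dir)
```

Checking the precondition inside the factory keeps the failure at construction time, before any events are written.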
+ +### Confirmation Mode Predicate + +The SDK exposes a single predicate for confirmation mode: + +- Confirmation mode is active iff `state.security_analyzer != None` **and** the confirmation policy is not `NeverConfirm`. + +### ask_agent() Must Be Stateless + +Natural language invariant (from the public contract): + +- `BaseConversation.ask_agent(question)` **must not** append events, mutate execution status, or persist anything. It is safe to call concurrently with `run()`. + +### Secrets Persistence Requires a Cipher + +Natural language invariant: + +- If `ConversationState` is persisted without a cipher, secret values are redacted and **cannot be recovered on restore**. + +(Implication: use `Cipher` when persistence is enabled and you expect to resume with secrets intact.) ## Component Relationships diff --git a/sdk/arch/design.mdx b/sdk/arch/design.mdx index e946d9f9a..a6aa71dcc 100644 --- a/sdk/arch/design.mdx +++ b/sdk/arch/design.mdx @@ -58,3 +58,70 @@ Because agent logic was hard-coded into the core application, extending behavior **Everything should be composable and safe to extend.** Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing. Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation. + +--- + +## Design Invariants (Normative) + +This page describes the **architectural invariants** the SDK relies on. These are treated as *contracts* between components. + +Where appropriate, we express invariants in a lightweight OCL-like notation: + +- `context X inv Name: ` +- `pre:` / `post:` for pre/post-conditions + +If an invariant cannot be expressed precisely in OCL without significant auxiliary modeling, we state it in precise natural language. 
+ +### Single Source of Truth for Runtime State + +The SDK is designed so that **all runtime state that affects agent execution is representable as an event log plus a small, validated state snapshot**. + +- **Configuration objects are immutable** (Pydantic `frozen=True` where applicable). +- **The only intentionally mutable entity is `ConversationState`**, which owns the event log, execution status, secrets registry, and persistence handles. + +OCL-like: + +- `context AgentBase inv StatelessConfiguration: self.model_config.frozen = true` +- `context Event inv Immutable: self.model_config.frozen = true` + +Natural language invariant: + +- `ConversationState` is the single coordination point for execution. Other objects may maintain private runtime caches, but **must not** be required to restore or replay a conversation. + +### Workspace Boundary is the I/O Boundary + +All side effects against the environment (filesystem, processes, git operations) must occur **through a Workspace** (local or remote), which becomes the **I/O boundary**. + +- Tools may execute in different runtimes (local process vs inside agent-server), but *conceptually* they always operate against a workspace rooted at `workspace.working_dir`. + +OCL-like: + +- `context BaseWorkspace inv WorkingDirIsString: self.working_dir.oclIsTypeOf(String)` + +### Event Log is the Execution Trace + +The event stream is the single authoritative trace of what the agent *saw* and *did*. + +Natural language invariant: + +- Any agent decision that should be reproducible on replay must be representable as an `LLMConvertibleEvent` (for LLM context) plus associated non-LLM events (e.g., state updates, errors). + +### Tool Calls are Explicit, Typed, and Linkable + +The SDK assumes an explicit `Action -> Observation` pairing. 
+ +OCL-like (conceptual): + +- `context ActionEvent inv HasToolCallId: self.tool_call_id <> null` +- `context ObservationEvent inv RefersToAction: self.action_id <> null` + +Natural language invariant: + +- Observations must be attributable to a specific action/tool call so that conversations can be audited, visualized, and resumed. + +### Remote vs Local is an Execution Detail + +The SDK makes *deployment mode* (local vs remote) a **factory decision**, not a type-system fork. + +- `Conversation(...)` returns either `LocalConversation` or `RemoteConversation` based solely on the provided workspace. +- User-facing code should not need to change when switching workspaces. diff --git a/sdk/arch/events.mdx b/sdk/arch/events.mdx index 2d37f9665..b47297dd5 100644 --- a/sdk/arch/events.mdx +++ b/sdk/arch/events.mdx @@ -142,7 +142,48 @@ Events for metadata, control flow, and user actions (not sent to LLM): | **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | | **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | | **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | -| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | + +## Invariants (Normative) + +### Event Immutability + +All events inherit from `Event` / `LLMConvertibleEvent` with Pydantic config `frozen=True` and `extra="forbid"`. + +Natural language invariant: + +- Once appended to the event log, an event must be treated as immutable. Mutations are represented as *new events*, not edits. 
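A small sketch of what "mutations are new events" means in practice. It uses stdlib frozen dataclasses as a stand-in for the Pydantic `frozen=True` configuration described above; `EventStub` and `append_event` are hypothetical names, and the real event classes carry more fields.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class EventStub:
    """Hypothetical immutable event; stands in for a frozen Pydantic model."""

    id: str
    source: str
    text: str


def append_event(log: tuple[EventStub, ...], event: EventStub) -> tuple[EventStub, ...]:
    # Append-only: build a new log instead of editing the old one.
    return log + (event,)


original = EventStub(id="e1", source="user", text="hello")
log = append_event((), original)

try:
    original.text = "edited in place"  # rejected by the frozen dataclass
except FrozenInstanceError:
    pass

# A correction is recorded as a *new* event with its own id.
correction = replace(original, id="e2", text="hello, world")
log = append_event(log, correction)
```

After this runs, the log holds both events and the first one still reads `"hello"`, which is the property that replay and auditing depend on.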
+ +OCL-like: + +- `context Event inv Frozen: self.model_config.frozen = true` + +### LLM-Convertible Stream Can Be Reconstructed Deterministically + +Natural language invariant: + +- `LLMConvertibleEvent.events_to_messages(events)` must produce the exact LLM message stream used for decision making, including batching of parallel tool calls. + +### Parallel Tool Calls are Batched by llm_response_id + +When multiple `ActionEvent`s share the same `llm_response_id`, they represent a single assistant turn with multiple tool calls. + +Natural language invariant: + +- In a batch, only the first `ActionEvent` may contain `thought`/reasoning; subsequent actions must have empty `thought`. This is asserted when combining events. + +### Condensation is a Pure View Transformation + +`Condensation.apply(events)` removes forgotten events and optionally inserts a synthetic `CondensationSummaryEvent` at `summary_offset`. + +Natural language invariants: + +- Condensation never mutates existing events; it returns a new list. +- `forgotten_event_ids` must refer to events that exist in the input list (otherwise the operation is a no-op for those IDs). +- If `summary` is present, `summary_offset` must also be present to insert the summary into the view; otherwise the summary is metadata only. + +OCL-like (conceptual): + +- `context Condensation inv SummaryOffsetPair: (self.summary <> null) implies (self.summary_offset <> null) or true -- insertion requires both; metadata-only summary allowed` **Source Types:** - **user**: Event originated from user input diff --git a/sdk/arch/tool-system.mdx b/sdk/arch/tool-system.mdx index 1762af9b5..181678df3 100644 --- a/sdk/arch/tool-system.mdx +++ b/sdk/arch/tool-system.mdx @@ -259,6 +259,52 @@ flowchart LR **Resolution Workflow:** +1. 
**[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "TerminalTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state + +## Invariants (Normative) + +### ToolDefinition Naming + +By default, tool names are derived from the class name: + +- `TerminalTool` → `terminal` +- `FileEditorTool` → `file_editor` + +Natural language invariant: + +- Unless explicitly overridden, `ToolDefinition.name` is deterministic and stable across runs. + +### Tool Registry + +`register_tool(name, factory)` maintains a global name→resolver mapping. + +Invariants: + +- Tool names must be non-empty strings. +- A `ToolDefinition` instance can only be registered if it has a non-None `executor`. +- A `ToolDefinition` subclass can only be registered if it implements a concrete `create(...)` classmethod that returns `Sequence[ToolDefinition]`. +- Resolving an unregistered tool name must raise `KeyError`. + +OCL-like (conceptual): + +- `context ToolRegistry inv NonEmptyNames: name.trim().size() > 0` + +### Executor Presence and Call Semantics + +Natural language invariant: + +- A `ToolDefinition` without an `executor` is not executable; attempts to call it must fail fast. +- All tool execution is performed in a `LocalConversation` context (even when invoked remotely) because the agent-server hosts the actual conversation that runs tools. + +### Action/Observation Schemas are Validated + +Natural language invariant: + +- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and outputs are coerced to the declared observation model (if present). + + 1. 
**[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) 2. **Resolver Lookup** - Registry finds the registered resolver for the tool name 3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state diff --git a/sdk/arch/workspace.mdx b/sdk/arch/workspace.mdx index dbff538be..6148b326e 100644 --- a/sdk/arch/workspace.mdx +++ b/sdk/arch/workspace.mdx @@ -122,6 +122,38 @@ flowchart LR | **timeout** | bool | Whether command timed out | | **duration** | float | Execution time in seconds | +## Invariants (Normative) + +### Workspace Factory: Host Chooses Remote + +The `Workspace(...)` constructor is a factory: + +- If `host` is provided, it returns a `RemoteWorkspace`. +- Otherwise it returns a `LocalWorkspace`. + +OCL-like (conceptual): + +- `context Workspace::__new__ post RemoteIffHost: (host <> null) implies result.oclIsKindOf(RemoteWorkspace)` + +### BaseWorkspace Contract + +All workspace implementations must satisfy: + +- `execute_command(command, cwd, timeout)` returns a `CommandResult` where `exit_code=-1` indicates timeout. +- `file_upload` / `file_download` return a `FileOperationResult` with `success=false` and a populated `error` field on failure. +- Git helpers (`git_changes`, `git_diff`) must raise if the path is not a git repository. + +### working_dir Normalization + +Natural language invariant: + +- `working_dir` is normalized to a `str` even if passed as a `Path`. + +### Pause/Resume Semantics + +- `LocalWorkspace.pause()` / `.resume()` are no-ops. +- Remote/container workspaces may implement pause/resume; if unsupported they must raise `NotImplementedError`. 
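The `execute_command` timeout convention from the BaseWorkspace contract (`exit_code=-1` on timeout) can be sketched with the standard library. `CommandResultStub` and `execute_command_stub` are hypothetical names. The `stdout` / `stderr` fields and the use of an argument list instead of a command string are assumptions made to keep the example self-contained; only `exit_code`, `timeout`, and `duration` come from the text above.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class CommandResultStub:
    """Hypothetical stand-in for CommandResult."""

    exit_code: int
    stdout: str
    stderr: str
    timeout: bool
    duration: float


def _as_text(data: Union[str, bytes, None]) -> str:
    # TimeoutExpired may carry None, bytes, or str for partial output.
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def execute_command_stub(
    argv: list[str], cwd: Optional[str] = None, timeout: float = 30.0
) -> CommandResultStub:
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        # Contract: a timeout is reported as exit_code == -1, not as an exception.
        return CommandResultStub(
            -1, _as_text(exc.stdout), _as_text(exc.stderr), True, time.monotonic() - start
        )
    return CommandResultStub(
        proc.returncode, proc.stdout, proc.stderr, False, time.monotonic() - start
    )
```

For example, `execute_command_stub([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.2)` returns a result with `exit_code == -1` and `timeout is True` instead of raising.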
+ ### File Operations | Operation | Local Implementation | Remote Implementation | From f5fe3d4b06afce69c6bf89d830f80ee80475cddd Mon Sep 17 00:00:00 2001 From: enyst Date: Sat, 24 Jan 2026 13:49:35 +0000 Subject: [PATCH 02/10] docs(sdk): clarify wording on workspace parity and schema parsing\n\nCo-authored-by: openhands --- sdk/arch/design.mdx | 10 +++++++--- sdk/arch/tool-system.mdx | 2 +- sdk/arch/workspace.mdx | 9 +++++++-- 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/sdk/arch/design.mdx b/sdk/arch/design.mdx index a6aa71dcc..72aefd0a5 100644 --- a/sdk/arch/design.mdx +++ b/sdk/arch/design.mdx @@ -121,7 +121,11 @@ Natural language invariant: ### Remote vs Local is an Execution Detail -The SDK makes *deployment mode* (local vs remote) a **factory decision**, not a type-system fork. +The SDK makes *deployment mode* (local vs remote) a **runtime selection behind a common interface**, not two separate programming models. -- `Conversation(...)` returns either `LocalConversation` or `RemoteConversation` based solely on the provided workspace. -- User-facing code should not need to change when switching workspaces. +- `Conversation(...)` returns either `LocalConversation` or `RemoteConversation` based on the provided workspace. +- User-facing code typically should not need to change when switching workspaces; you mostly swap configuration. + + +This does **not** mean every optional method behaves identically across workspace types (e.g., `pause()` / `resume()` may be a no-op locally and meaningful remotely). The core conversation API (`send_message`, `run`, events) stays consistent. 
+ diff --git a/sdk/arch/tool-system.mdx b/sdk/arch/tool-system.mdx index 181678df3..faa882628 100644 --- a/sdk/arch/tool-system.mdx +++ b/sdk/arch/tool-system.mdx @@ -302,7 +302,7 @@ Natural language invariant: Natural language invariant: -- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and outputs are coerced to the declared observation model (if present). +- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and tool results are **parsed/validated** into the declared observation model (if present). If the executor already returns the correct observation type, this is a no-op. 1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) diff --git a/sdk/arch/workspace.mdx b/sdk/arch/workspace.mdx index 6148b326e..728e3c8e5 100644 --- a/sdk/arch/workspace.mdx +++ b/sdk/arch/workspace.mdx @@ -149,10 +149,15 @@ Natural language invariant: - `working_dir` is normalized to a `str` even if passed as a `Path`. -### Pause/Resume Semantics +### Pause/Resume Semantics (Optional Capability) + +`pause()` / `resume()` are intentionally **optional capabilities**: - `LocalWorkspace.pause()` / `.resume()` are no-ops. -- Remote/container workspaces may implement pause/resume; if unsupported they must raise `NotImplementedError`. +- Remote/container workspaces may implement pause/resume to conserve resources. +- If a workspace type does not support pausing, it must raise `NotImplementedError`. + +This is compatible with the “swap workspaces without rewriting code” principle because most client code should only rely on the *core* workspace and conversation operations. Optional capabilities should be feature-detected or used conditionally. 
### File Operations From 353a8b073f77ae9b2d87fa938aee8a03882ff0f7 Mon Sep 17 00:00:00 2001 From: enyst Date: Sat, 24 Jan 2026 14:02:45 +0000 Subject: [PATCH 03/10] docs(sdk): add discussion on pause/resume semantics\n\nCo-authored-by: openhands --- sdk/arch/workspace.mdx | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/sdk/arch/workspace.mdx b/sdk/arch/workspace.mdx index 728e3c8e5..23149e5a8 100644 --- a/sdk/arch/workspace.mdx +++ b/sdk/arch/workspace.mdx @@ -159,6 +159,27 @@ Natural language invariant: This is compatible with the “swap workspaces without rewriting code” principle because most client code should only rely on the *core* workspace and conversation operations. Optional capabilities should be feature-detected or used conditionally. +#### Discussion: `pause()` / `resume()` semantics (design tradeoff) + +There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**. + +- Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op. +- Some remote/container workspaces may be able to pause a container or VM, but others may not. + +This tension matters because it creates two different reasonable expectations: + +1. *Ergonomic expectation*: orchestration code can call `pause()` unconditionally and it will be safe. +2. *Guarantee expectation*: calling `pause()` actually pauses resource usage. + +**Maybe it would make sense to** model this explicitly as an optional capability: + +- Add `supports_pause` (or a richer `pause_capability`) to `BaseWorkspace`, and +- Make `pause()` / `resume()` no-ops everywhere by default (including remote) while letting pausable implementations override, +- Keep a strict helper (e.g., `pause_or_raise()`) for callers who require a guarantee. 
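The three bullets above could look roughly like this. It is a sketch of the *proposal*, not of current SDK behavior: `supports_pause` and `pause_or_raise()` are the names suggested in this discussion, and the `*Stub` classes are hypothetical.

```python
class BaseWorkspaceStub:
    """Hypothetical base class modeling pause as an optional capability."""

    # Capability flag; pausable implementations override it.
    supports_pause: bool = False

    def pause(self) -> None:
        # Default: safe no-op, so orchestration code may call it unconditionally.
        return None

    def resume(self) -> None:
        return None

    def pause_or_raise(self) -> None:
        # Strict variant for callers that need a real guarantee.
        if not self.supports_pause:
            raise NotImplementedError(f"{type(self).__name__} cannot pause")
        self.pause()


class LocalWorkspaceStub(BaseWorkspaceStub):
    """Runs on the host OS; there is nothing to suspend."""


class PausableRemoteWorkspaceStub(BaseWorkspaceStub):
    supports_pause = True

    def __init__(self) -> None:
        self.paused = False

    def pause(self) -> None:
        # A real implementation would suspend the container or VM here.
        self.paused = True

    def resume(self) -> None:
        self.paused = False
```

With this shape, `workspace.pause()` satisfies the ergonomic expectation, while `workspace.pause_or_raise()` or a check of `workspace.supports_pause` serves callers who need the guarantee.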
+ +This would make the default behavior unsurprising (safe to call), while still letting clients opt into fail-fast behavior when pausing is required. + + ### File Operations | Operation | Local Implementation | Remote Implementation | From 0e831601efeef8a6a53ce708a6fbd8b31a6d3ef4 Mon Sep 17 00:00:00 2001 From: enyst Date: Sat, 24 Jan 2026 14:06:11 +0000 Subject: [PATCH 04/10] docs(sdk): reframe workspace pause/resume compatibility as discussion\n\nCo-authored-by: openhands --- sdk/arch/workspace.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sdk/arch/workspace.mdx b/sdk/arch/workspace.mdx index 23149e5a8..4b3b32279 100644 --- a/sdk/arch/workspace.mdx +++ b/sdk/arch/workspace.mdx @@ -157,10 +157,10 @@ Natural language invariant: - Remote/container workspaces may implement pause/resume to conserve resources. - If a workspace type does not support pausing, it must raise `NotImplementedError`. -This is compatible with the “swap workspaces without rewriting code” principle because most client code should only rely on the *core* workspace and conversation operations. Optional capabilities should be feature-detected or used conditionally. - #### Discussion: `pause()` / `resume()` semantics (design tradeoff) +There is an argument that this is compatible with the “swap workspaces without rewriting code” principle, because most client code should only rely on the *core* workspace and conversation operations, while optional capabilities are feature-detected or used conditionally. + There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**. - Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op. 
From 67b9cc10a05f3adc987149988e4b5324bdf43d8b Mon Sep 17 00:00:00 2001 From: enyst Date: Sat, 24 Jan 2026 14:19:27 +0000 Subject: [PATCH 05/10] docs(sdk): cross-link condenser and events architecture\n\nCo-authored-by: openhands --- sdk/arch/condenser.mdx | 6 +++++- sdk/arch/events.mdx | 2 ++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/sdk/arch/condenser.mdx b/sdk/arch/condenser.mdx index f5702ce09..310c3133b 100644 --- a/sdk/arch/condenser.mdx +++ b/sdk/arch/condenser.mdx @@ -3,7 +3,11 @@ title: Condenser description: High-level architecture of the conversation history compression system --- -The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). +The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. + +For how condensation is represented in the event system (`Condensation`, `CondensationRequest`, and how they transform the LLM view), see **[Events Architecture](/sdk/arch/events)**. + +For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). 
**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) diff --git a/sdk/arch/events.mdx b/sdk/arch/events.mdx index b47297dd5..6d1defe77 100644 --- a/sdk/arch/events.mdx +++ b/sdk/arch/events.mdx @@ -185,6 +185,8 @@ OCL-like (conceptual): - `context Condensation inv SummaryOffsetPair: (self.summary <> null) implies (self.summary_offset <> null) or true -- insertion requires both; metadata-only summary allowed` +For the condenser algorithms, thresholds, and configuration, see **[Condenser Architecture](/sdk/arch/condenser)**. + **Source Types:** - **user**: Event originated from user input - **agent**: Event generated by agent logic From 3360e635004fac6e0634ea5aa2c08919bb922b24 Mon Sep 17 00:00:00 2001 From: openhands Date: Sat, 7 Mar 2026 10:07:31 +0000 Subject: [PATCH 06/10] docs: note AI invariants comment convention Co-authored-by: openhands --- AGENTS.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index 022e2e0e7..e1948945e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -128,6 +128,15 @@ Workflow: `.github/workflows/sync-agent-sdk-openapi.yml` - Use Mintlify components (``, ``, ``, etc.) where appropriate. - When linking internally, prefer **absolute** doc paths (e.g. `/overview/quickstart`). + +## AI-only invariants in SDK architecture docs + +- Keep natural-language invariants visible in `sdk/arch/*.mdx`. +- Wrap OCL-like invariants in HTML comments with clear sentinels: + - `` + - `` + - Place the OCL block inside the comment so it does not render. 
+ ## LLM API Key Options The SDK documentation maintains three ways for users to obtain LLM access: From 9caf631c7ad0c5e35592a2ecf75f4e2e7741ff07 Mon Sep 17 00:00:00 2001 From: openhands Date: Sun, 8 Mar 2026 06:19:37 +0000 Subject: [PATCH 07/10] docs: inject AI invariants into llms-full Co-authored-by: openhands --- AGENTS.md | 7 +- llms-full.txt | 377 +++++++++++++++++++++++++++++++-- llms.txt | 2 +- scripts/generate-llms-files.py | 20 ++ sdk/arch/agent.mdx | 24 +-- sdk/arch/conversation.mdx | 6 +- sdk/arch/design.mdx | 18 +- sdk/arch/events.mdx | 12 +- sdk/arch/tool-system.mdx | 6 +- sdk/arch/workspace.mdx | 6 +- 10 files changed, 412 insertions(+), 66 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 6027606b8..3ee571d24 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -156,10 +156,9 @@ Workflow: `.github/workflows/sync-agent-sdk-openapi.yml` ## AI-only invariants in SDK architecture docs - Keep natural-language invariants visible in `sdk/arch/*.mdx`. -- Wrap OCL-like invariants in HTML comments with clear sentinels: - - `` - - `` - - Place the OCL block inside the comment so it does not render. +- Wrap OCL-like invariants in an MDX comment block so they do not render: + - `{/* AI_INVARIANTS_BEGIN` ... `AI_INVARIANTS_END */}` +- The llms generator extracts these blocks and injects them into `llms-full.txt`. ## LLM API Key Options diff --git a/llms-full.txt b/llms-full.txt index 10aea661b..eaec1b7bd 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -4092,31 +4092,79 @@ Tools follow a **strict action-observation pattern**: flowchart TB LLM["LLM generates tool_call"] Convert["Convert to ActionEvent"] - + Decision{"Confirmation
mode?"} Defer["Store as pending"] - + Execute["Execute tool"] Success{"Success?"} - + Obs["ObservationEvent
with result"] Error["ObservationEvent
with error"] - + LLM --> Convert Convert --> Decision - + Decision -->|Yes| Defer Decision -->|No| Execute - + Execute --> Success Success -->|Yes| Obs Success -->|No| Error - + style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` +## Invariants (Normative) + +### AgentBase: Configuration is Stateless and Immutable + +Natural language invariant: + +- An `AgentBase` instance is a **pure configuration object**. It may cache materialized `ToolDefinition` instances internally, but it must remain valid to re-create those tools from its declarative spec. + + + +### Initialization: System Prompt Precedes Any User Message + +`Agent.init_state(state, on_event=...)` is responsible for creating the initial system prompt event. + +Natural language invariant: + +- A `ConversationState` must not contain a user `MessageEvent` before it contains a `SystemPromptEvent`. + + + +### Tool Materialization: Names Resolve to Registered ToolDefinitions + +An `Agent` is configured with a list of tool *specs* (`openhands.sdk.tool.spec.Tool`) that reference registered `ToolDefinition` factories. + +Natural language invariant: + +- `resolve_tool(Tool(name=X))` must succeed (tool name present in registry) for all tools the agent intends to use. +- Tool factories must return a **sequence** of `ToolDefinition` instances; tool sets (e.g., browser tool sets) are represented as multi-element sequences. + +### Multi-Tool Calls: Shared Thought Only on First ActionEvent + +When an LLM returns parallel tool calls, the SDK represents this as multiple `ActionEvent`s that share the same `llm_response_id`. + +Natural language invariant: + +- For a batch of `ActionEvent`s with the same `llm_response_id`, only the first action carries `thought` / `reasoning_content` / `thinking_blocks`; subsequent actions must have empty `thought`. 
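The batching invariant can be checked mechanically. The sketch below is illustrative only: `ActionEventStub` is a hypothetical stand-in with just the fields needed here, `thought` is modeled as a tuple of strings, and the function is not the SDK's `_combine_action_events`.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEventStub:
    """Hypothetical stand-in for ActionEvent."""

    llm_response_id: str
    tool_name: str
    thought: tuple[str, ...] = ()


def batches_violating_thought_rule(events: Iterable[ActionEventStub]) -> list[str]:
    """Return llm_response_ids where a non-first action carries a thought."""
    seen: set[str] = set()
    violations: list[str] = []
    for event in events:
        if event.llm_response_id not in seen:
            # First action of the batch: it may carry the shared thought.
            seen.add(event.llm_response_id)
            continue
        if event.thought and event.llm_response_id not in violations:
            violations.append(event.llm_response_id)
    return violations


valid = [
    ActionEventStub("resp-1", "terminal", thought=("List files, then read one.",)),
    ActionEventStub("resp-1", "file_editor"),
]
invalid = valid + [
    ActionEventStub("resp-2", "terminal", thought=("Run tests.",)),
    ActionEventStub("resp-2", "terminal", thought=("Duplicate thought.",)),
]
```

Here `batches_violating_thought_rule(valid)` is empty, and the second list reports only `resp-2`.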
+ + + +### Confirmation Mode: Requires Both Analyzer and Policy + +`conversation.is_confirmation_mode_active` is true iff: + +- A `SecurityAnalyzer` is configured, and +- The confirmation policy is not `NeverConfirm`. + + + **Execution Modes:** | Mode | Behavior | Use Case | @@ -4171,6 +4219,24 @@ flowchart LR - **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns - **[LLM](/sdk/arch/llm)** - Language model abstraction +#### AI Invariants (OCL-like) + +OCL-like: + +- `context AgentBase inv Frozen: self.model_config.frozen = true` + +OCL-like (conceptual): + +- `context ConversationState inv SystemBeforeUser: self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )` + +OCL-like (as modeled in `event.base._combine_action_events`): + +- `context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()` + +OCL-like (conceptual): + +- `context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))` + ### Agent Server Package Source: https://docs.openhands.dev/sdk/arch/agent-server.md @@ -4707,7 +4773,11 @@ async def logging_middleware(request, call_next): ### Condenser Source: https://docs.openhands.dev/sdk/arch/condenser.md -The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). +The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. 
It reduces long event histories into condensed summaries while preserving critical information for reasoning. + +For how condensation is represented in the event system (`Condensation`, `CondensationRequest`, and how they transform the LLM view), see **[Events Architecture](/sdk/arch/events)**. + +For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). **Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) @@ -5288,10 +5358,45 @@ The conversation system provides pluggable services that operate independently o | **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | | **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | -**Design Principle:** Services read from the event log but never mutate state directly. This enables: -- Services can be enabled/disabled independently -- Easy to add new services without changing core orchestration -- Event stream acts as the integration point +**Design Principle:** Services read from the event log but never mutate state directly. + +## Invariants (Normative) + +### Conversation Factory: Workspace Chooses Implementation + +Natural language invariant: + +- `Conversation(...)` is a factory that returns `LocalConversation` unless the provided `workspace` is a `RemoteWorkspace`. +- When `workspace` is remote, `persistence_dir` must be unset (`None`). 
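
The factory rule above can be sketched as a plain function. Every class here is a hypothetical stub, and `ValueError` is an assumption: the text only says `persistence_dir` must be unset for a remote workspace, not which exception the SDK raises.

```python
from __future__ import annotations


class LocalWorkspaceStub:
    """Hypothetical local workspace (illustration only)."""


class RemoteWorkspaceStub:
    """Hypothetical remote workspace (illustration only)."""


class LocalConversationStub:
    def __init__(self, workspace, persistence_dir: str | None = None):
        self.workspace = workspace
        self.persistence_dir = persistence_dir


class RemoteConversationStub:
    def __init__(self, workspace):
        self.workspace = workspace


def make_conversation(workspace, persistence_dir: str | None = None):
    """Pick the conversation type from the workspace type."""
    if isinstance(workspace, RemoteWorkspaceStub):
        # Precondition from the invariant: remote implies no persistence_dir.
        if persistence_dir is not None:
            raise ValueError("persistence_dir must be None for a remote workspace")
        return RemoteConversationStub(workspace)
    return LocalConversationStub(workspace, persistence_dir)
```

Checking the precondition inside the factory keeps the local/remote decision in one place, so calling code only swaps the workspace it passes in.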
+ + + +### ConversationState: Validated Snapshot + Event Log + +Natural language invariants: + +- `ConversationState` is the **only** component intended to hold mutable execution status (`IDLE`, `RUNNING`, `WAITING_FOR_CONFIRMATION`, etc.). +- `ConversationState` owns persistence (`FileStore`) and the event store; all other components treat persistence as an implementation detail. + +### Confirmation Mode Predicate + +The SDK exposes a single predicate for confirmation mode: + +- Confirmation mode is active iff `state.security_analyzer != None` **and** the confirmation policy is not `NeverConfirm`. + +### ask_agent() Must Be Stateless + +Natural language invariant (from the public contract): + +- `BaseConversation.ask_agent(question)` **must not** append events, mutate execution status, or persist anything. It is safe to call concurrently with `run()`. + +### Secrets Persistence Requires a Cipher + +Natural language invariant: + +- If `ConversationState` is persisted without a cipher, secret values are redacted and **cannot be recovered on restore**. + +(Implication: use `Cipher` when persistence is enabled and you expect to resume with secrets intact.) ## Component Relationships @@ -5329,6 +5434,12 @@ flowchart LR - **[Event System](/sdk/arch/events)** - Event types and flow - **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples +#### AI Invariants (OCL-like) + +OCL-like (conceptual): + +- `context Conversation::__new__ pre RemoteNoPersistence: workspace.oclIsKindOf(RemoteWorkspace) implies persistence_dir = null` + ### Design Principles Source: https://docs.openhands.dev/sdk/arch/design.md @@ -5387,6 +5498,85 @@ Because agent logic was hard-coded into the core application, extending behavior Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing. 
Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation. +--- + +## Design Invariants (Normative) + +This page describes the **architectural invariants** the SDK relies on. These are treated as *contracts* between components. + +Where appropriate, we express invariants in a lightweight OCL-like notation: + +- `context X inv Name: ` +- `pre:` / `post:` for pre/post-conditions + +If an invariant cannot be expressed precisely in OCL without significant auxiliary modeling, we state it in precise natural language. + +### Single Source of Truth for Runtime State + +The SDK is designed so that **all runtime state that affects agent execution is representable as an event log plus a small, validated state snapshot**. + +- **Configuration objects are immutable** (Pydantic `frozen=True` where applicable). +- **The only intentionally mutable entity is `ConversationState`**, which owns the event log, execution status, secrets registry, and persistence handles. + + + +Natural language invariant: + +- `ConversationState` is the single coordination point for execution. Other objects may maintain private runtime caches, but **must not** be required to restore or replay a conversation. + +### Workspace Boundary is the I/O Boundary + +All side effects against the environment (filesystem, processes, git operations) must occur **through a Workspace** (local or remote), which becomes the **I/O boundary**. + +- Tools may execute in different runtimes (local process vs inside agent-server), but *conceptually* they always operate against a workspace rooted at `workspace.working_dir`. + + + +### Event Log is the Execution Trace + +The event stream is the single authoritative trace of what the agent *saw* and *did*. 
+ +Natural language invariant: + +- Any agent decision that should be reproducible on replay must be representable as an `LLMConvertibleEvent` (for LLM context) plus associated non-LLM events (e.g., state updates, errors). + +### Tool Calls are Explicit, Typed, and Linkable + +The SDK assumes an explicit `Action -> Observation` pairing. + + + +Natural language invariant: + +- Observations must be attributable to a specific action/tool call so that conversations can be audited, visualized, and resumed. + +### Remote vs Local is an Execution Detail + +The SDK makes *deployment mode* (local vs remote) a **runtime selection behind a common interface**, not two separate programming models. + +- `Conversation(...)` returns either `LocalConversation` or `RemoteConversation` based on the provided workspace. +- User-facing code typically should not need to change when switching workspaces; you mostly swap configuration. + + +This does **not** mean every optional method behaves identically across workspace types (e.g., `pause()` / `resume()` may be a no-op locally and meaningful remotely). The core conversation API (`send_message`, `run`, events) stays consistent. 
+ + +#### AI Invariants (OCL-like) + +OCL-like: + +- `context AgentBase inv StatelessConfiguration: self.model_config.frozen = true` +- `context Event inv Immutable: self.model_config.frozen = true` + +OCL-like: + +- `context BaseWorkspace inv WorkingDirIsString: self.working_dir.oclIsTypeOf(String)` + +OCL-like (conceptual): + +- `context ActionEvent inv HasToolCallId: self.tool_call_id <> null` +- `context ObservationEvent inv RefersToAction: self.action_id <> null` + ### Events Source: https://docs.openhands.dev/sdk/arch/events.md @@ -5529,7 +5719,46 @@ Events for metadata, control flow, and user actions (not sent to LLM): | **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | | **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | | **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | -| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | + +## Invariants (Normative) + +### Event Immutability + +All events inherit from `Event` / `LLMConvertibleEvent` with Pydantic config `frozen=True` and `extra="forbid"`. + +Natural language invariant: + +- Once appended to the event log, an event must be treated as immutable. Mutations are represented as *new events*, not edits. + + + +### LLM-Convertible Stream Can Be Reconstructed Deterministically + +Natural language invariant: + +- `LLMConvertibleEvent.events_to_messages(events)` must produce the exact LLM message stream used for decision making, including batching of parallel tool calls. + +### Parallel Tool Calls are Batched by llm_response_id + +When multiple `ActionEvent`s share the same `llm_response_id`, they represent a single assistant turn with multiple tool calls. 
+ +Natural language invariant: + +- In a batch, only the first `ActionEvent` may contain `thought`/reasoning; subsequent actions must have empty `thought`. This is asserted when combining events. + +### Condensation is a Pure View Transformation + +`Condensation.apply(events)` removes forgotten events and optionally inserts a synthetic `CondensationSummaryEvent` at `summary_offset`. + +Natural language invariants: + +- Condensation never mutates existing events; it returns a new list. +- `forgotten_event_ids` must refer to events that exist in the input list (otherwise the operation is a no-op for those IDs). +- If `summary` is present, `summary_offset` must also be present to insert the summary into the view; otherwise the summary is metadata only. + + + +For the condenser algorithms, thresholds, and configuration, see **[Condenser Architecture](/sdk/arch/condenser)**. **Source Types:** - **user**: Event originated from user input @@ -5610,6 +5839,16 @@ Two distinct error events exist in the SDK, with different purpose and visibilit - **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation - **[Condenser](/sdk/arch/condenser)** - Event history compression +#### AI Invariants (OCL-like) + +OCL-like: + +- `context Event inv Frozen: self.model_config.frozen = true` + +OCL-like (conceptual): + +- `context Condensation inv SummaryOffsetPair: (self.summary <> null) implies (self.summary_offset <> null) or true -- insertion requires both; metadata-only summary allowed` + ### LLM Source: https://docs.openhands.dev/sdk/arch/llm.md @@ -8145,6 +8384,50 @@ flowchart LR **Resolution Workflow:** +1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "TerminalTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. 
**Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state + +## Invariants (Normative) + +### ToolDefinition Naming + +By default, tool names are derived from the class name: + +- `TerminalTool` → `terminal` +- `FileEditorTool` → `file_editor` + +Natural language invariant: + +- Unless explicitly overridden, `ToolDefinition.name` is deterministic and stable across runs. + +### Tool Registry + +`register_tool(name, factory)` maintains a global name→resolver mapping. + +Invariants: + +- Tool names must be non-empty strings. +- A `ToolDefinition` instance can only be registered if it has a non-None `executor`. +- A `ToolDefinition` subclass can only be registered if it implements a concrete `create(...)` classmethod that returns `Sequence[ToolDefinition]`. +- Resolving an unregistered tool name must raise `KeyError`. + + + +### Executor Presence and Call Semantics + +Natural language invariant: + +- A `ToolDefinition` without an `executor` is not executable; attempts to call it must fail fast. +- All tool execution is performed in a `LocalConversation` context (even when invoked remotely) because the agent-server hosts the actual conversation that runs tools. + +### Action/Observation Schemas are Validated + +Natural language invariant: + +- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and tool results are **parsed/validated** into the declared observation model (if present). If the executor already returns the correct observation type, this is a no-op. + + 1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) 2. **Resolver Lookup** - Registry finds the registered resolver for the tool name 3. 
**Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state @@ -8402,6 +8685,12 @@ flowchart TB - **[Custom Tools Guide](/sdk/guides/custom-tools)** - Building your own tools - **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library +#### AI Invariants (OCL-like) + +OCL-like (conceptual): + +- `context ToolRegistry inv NonEmptyNames: name.trim().size() > 0` + ### Workspace Source: https://docs.openhands.dev/sdk/arch/workspace.md @@ -8524,6 +8813,62 @@ flowchart LR | **timeout** | bool | Whether command timed out | | **duration** | float | Execution time in seconds | +## Invariants (Normative) + +### Workspace Factory: Host Chooses Remote + +The `Workspace(...)` constructor is a factory: + +- If `host` is provided, it returns a `RemoteWorkspace`. +- Otherwise it returns a `LocalWorkspace`. + + + +### BaseWorkspace Contract + +All workspace implementations must satisfy: + +- `execute_command(command, cwd, timeout)` returns a `CommandResult` where `exit_code=-1` indicates timeout. +- `file_upload` / `file_download` return a `FileOperationResult` with `success=false` and a populated `error` field on failure. +- Git helpers (`git_changes`, `git_diff`) must raise if the path is not a git repository. + +### working_dir Normalization + +Natural language invariant: + +- `working_dir` is normalized to a `str` even if passed as a `Path`. + +### Pause/Resume Semantics (Optional Capability) + +`pause()` / `resume()` are intentionally **optional capabilities**: + +- `LocalWorkspace.pause()` / `.resume()` are no-ops. +- Remote/container workspaces may implement pause/resume to conserve resources. +- If a workspace type does not support pausing, it must raise `NotImplementedError`. 
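
Given the pause semantics listed above, calling code that wants to work with any workspace might guard the call. This is a sketch with hypothetical stub classes and a hypothetical `try_pause` helper, not SDK API.

```python
class LocalWorkspaceStub:
    """Hypothetical stand-in: pause() is a documented no-op locally."""

    def pause(self) -> None:
        return None


class UnpausableWorkspaceStub:
    """Hypothetical stand-in for a workspace type without pause support."""

    def pause(self) -> None:
        raise NotImplementedError("pause is not supported")


def try_pause(workspace) -> bool:
    """Return True if pause() completed, False if the workspace cannot pause."""
    try:
        workspace.pause()
    except NotImplementedError:
        return False
    return True
```

A `True` result only means the call did not raise. For a local workspace it does not imply that any work was actually suspended.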
+ +#### Discussion: `pause()` / `resume()` semantics (design tradeoff) + +There is an argument that this is compatible with the “swap workspaces without rewriting code” principle, because most client code should only rely on the *core* workspace and conversation operations, while optional capabilities are feature-detected or used conditionally. + +There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**. + +- Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op. +- Some remote/container workspaces may be able to pause a container or VM, but others may not. + +This tension matters because it creates two different reasonable expectations: + +1. *Ergonomic expectation*: orchestration code can call `pause()` unconditionally and it will be safe. +2. *Guarantee expectation*: calling `pause()` actually pauses resource usage. + +**Maybe it would make sense to** model this explicitly as an optional capability: + +- Add `supports_pause` (or a richer `pause_capability`) to `BaseWorkspace`, and +- Make `pause()` / `resume()` no-ops everywhere by default (including remote) while letting pausable implementations override, +- Keep a strict helper (e.g., `pause_or_raise()`) for callers who require a guarantee. + +This would make the default behavior unsurprising (safe to call), while still letting clients opt into fail-fast behavior when pausing is required. 
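
The proposal above could look roughly like the following. This is a sketch of the idea only: `supports_pause` and `pause_or_raise()` are the names floated in the discussion, the class names are hypothetical, and nothing here exists in the SDK today.

```python
class BaseWorkspaceSketch:
    """Hypothetical base class illustrating the proposed capability flag."""

    supports_pause: bool = False

    def pause(self) -> None:
        """Best-effort hook: a safe no-op unless a subclass overrides it."""

    def pause_or_raise(self) -> None:
        """Strict variant for callers that require a real pause."""
        if not self.supports_pause:
            raise NotImplementedError(f"{type(self).__name__} cannot pause")
        self.pause()


class PausableWorkspaceSketch(BaseWorkspaceSketch):
    """Hypothetical workspace that can genuinely suspend its resources."""

    supports_pause = True

    def __init__(self) -> None:
        self.paused = False

    def pause(self) -> None:
        self.paused = True
```

With this shape, orchestration code can call `pause()` unconditionally, while code that depends on resources being released calls `pause_or_raise()` and fails fast when they cannot be.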
+ + ### File Operations | Operation | Local Implementation | Remote Implementation | @@ -8608,6 +8953,12 @@ flowchart LR - **[Agent Server](/sdk/arch/agent-server)** - Remote execution API - **[Tool System](/sdk/arch/tool-system)** - Tools that use workspace for execution +#### AI Invariants (OCL-like) + +OCL-like (conceptual): + +- `context Workspace::__new__ post RemoteIffHost: (host <> null) implies result.oclIsKindOf(RemoteWorkspace)` + ### FAQ Source: https://docs.openhands.dev/sdk/faq.md diff --git a/llms.txt b/llms.txt index 849a69c4b..dbd863d2c 100644 --- a/llms.txt +++ b/llms.txt @@ -2,7 +2,7 @@ > LLM-friendly index of OpenHands documentation (V1). Legacy V0 docs pages are intentionally excluded. -The sections below intentionally separate OpenHands applications documentation (Web App Server / Cloud / CLI) +The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI) from the OpenHands Software Agent SDK. ## OpenHands Software Agent SDK diff --git a/scripts/generate-llms-files.py b/scripts/generate-llms-files.py index 543456af7..95266fd4c 100755 --- a/scripts/generate-llms-files.py +++ b/scripts/generate-llms-files.py @@ -51,6 +51,10 @@ BASE_URL = "https://docs.openhands.dev" EXCLUDED_DIRS = {".git", ".github", ".agents", "tests", "openapi", "logo"} +AI_INVARIANTS_RE = re.compile( + r"\{/\*\s*AI_INVARIANTS_BEGIN\s*\n(.*?)\n\s*AI_INVARIANTS_END\s*\*/\}", + re.DOTALL, +) @dataclass(frozen=True) @@ -60,6 +64,7 @@ class DocPage: title: str description: str | None body: str + ai_invariants: list[str] _FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL) @@ -99,6 +104,12 @@ def parse_frontmatter(text: str) -> tuple[dict[str, str], str]: return fm, body +def extract_ai_invariants(body: str) -> tuple[str, list[str]]: + matches = [match.group(1).strip() for match in AI_INVARIANTS_RE.finditer(body)] + cleaned = AI_INVARIANTS_RE.sub("", body) + return cleaned, matches + + def rel_to_route(rel_path: Path) -> str: p 
= rel_path.as_posix() if p.endswith(".mdx"): @@ -132,6 +143,7 @@ def iter_doc_pages() -> list[DocPage]: raw = mdx_path.read_text(encoding="utf-8") fm, body = parse_frontmatter(raw) + body, ai_invariants = extract_ai_invariants(body) title = fm.get("title") if not title: @@ -147,6 +159,7 @@ def iter_doc_pages() -> list[DocPage]: title=title, description=description, body=body.strip(), + ai_invariants=ai_invariants, ) ) @@ -285,6 +298,13 @@ def build_llms_full_txt(pages: list[DocPage]) -> str: lines.append(page.body) lines.append("") + if page.ai_invariants: + lines.append("#### AI Invariants (OCL-like)") + lines.append("") + for block in page.ai_invariants: + lines.append(block) + lines.append("") + return "\n".join(lines).rstrip() + "\n" diff --git a/sdk/arch/agent.mdx b/sdk/arch/agent.mdx index e7c984536..544c8451a 100644 --- a/sdk/arch/agent.mdx +++ b/sdk/arch/agent.mdx @@ -232,13 +232,11 @@ Natural language invariant: - An `AgentBase` instance is a **pure configuration object**. It may cache materialized `ToolDefinition` instances internally, but it must remain valid to re-create those tools from its declarative spec. - - - +AI_INVARIANTS_END */} ### Initialization: System Prompt Precedes Any User Message @@ -248,13 +246,11 @@ Natural language invariant: - A `ConversationState` must not contain a user `MessageEvent` before it contains a `SystemPromptEvent`. - - - +AI_INVARIANTS_END */} ### Tool Materialization: Names Resolve to Registered ToolDefinitions @@ -273,13 +269,11 @@ Natural language invariant: - For a batch of `ActionEvent`s with the same `llm_response_id`, only the first action carries `thought` / `reasoning_content` / `thinking_blocks`; subsequent actions must have empty `thought`. - - - +AI_INVARIANTS_END */} ### Confirmation Mode: Requires Both Analyzer and Policy @@ -288,13 +282,11 @@ OCL-like (as modeled in `event.base._combine_action_events`): - A `SecurityAnalyzer` is configured, and - The confirmation policy is not `NeverConfirm`. 
- - - +AI_INVARIANTS_END */} **Execution Modes:** diff --git a/sdk/arch/conversation.mdx b/sdk/arch/conversation.mdx index 38a293963..e13019a94 100644 --- a/sdk/arch/conversation.mdx +++ b/sdk/arch/conversation.mdx @@ -199,13 +199,11 @@ Natural language invariant: - `Conversation(...)` is a factory that returns `LocalConversation` unless the provided `workspace` is a `RemoteWorkspace`. - When `workspace` is remote, `persistence_dir` must be unset (`None`). - - - +AI_INVARIANTS_END */} ### ConversationState: Validated Snapshot + Event Log diff --git a/sdk/arch/design.mdx b/sdk/arch/design.mdx index 165725988..7a5af8e66 100644 --- a/sdk/arch/design.mdx +++ b/sdk/arch/design.mdx @@ -79,14 +79,12 @@ The SDK is designed so that **all runtime state that affects agent execution is - **Configuration objects are immutable** (Pydantic `frozen=True` where applicable). - **The only intentionally mutable entity is `ConversationState`**, which owns the event log, execution status, secrets registry, and persistence handles. - - - +AI_INVARIANTS_END */} Natural language invariant: @@ -98,13 +96,11 @@ All side effects against the environment (filesystem, processes, git operations) - Tools may execute in different runtimes (local process vs inside agent-server), but *conceptually* they always operate against a workspace rooted at `workspace.working_dir`. - - - +AI_INVARIANTS_END */} ### Event Log is the Execution Trace @@ -118,14 +114,12 @@ Natural language invariant: The SDK assumes an explicit `Action -> Observation` pairing. - - - +AI_INVARIANTS_END */} Natural language invariant: diff --git a/sdk/arch/events.mdx b/sdk/arch/events.mdx index 32c42051c..08ba16060 100644 --- a/sdk/arch/events.mdx +++ b/sdk/arch/events.mdx @@ -153,13 +153,11 @@ Natural language invariant: - Once appended to the event log, an event must be treated as immutable. Mutations are represented as *new events*, not edits. 
- - - +AI_INVARIANTS_END */} ### LLM-Convertible Stream Can Be Reconstructed Deterministically @@ -185,13 +183,11 @@ Natural language invariants: - `forgotten_event_ids` must refer to events that exist in the input list (otherwise the operation is a no-op for those IDs). - If `summary` is present, `summary_offset` must also be present to insert the summary into the view; otherwise the summary is metadata only. - - - +AI_INVARIANTS_END */} For the condenser algorithms, thresholds, and configuration, see **[Condenser Architecture](/sdk/arch/condenser)**. diff --git a/sdk/arch/tool-system.mdx b/sdk/arch/tool-system.mdx index 17d65afaf..50580b988 100644 --- a/sdk/arch/tool-system.mdx +++ b/sdk/arch/tool-system.mdx @@ -287,13 +287,11 @@ Invariants: - A `ToolDefinition` subclass can only be registered if it implements a concrete `create(...)` classmethod that returns `Sequence[ToolDefinition]`. - Resolving an unregistered tool name must raise `KeyError`. - - - +AI_INVARIANTS_END */} ### Executor Presence and Call Semantics diff --git a/sdk/arch/workspace.mdx b/sdk/arch/workspace.mdx index 762d3a529..55daffd93 100644 --- a/sdk/arch/workspace.mdx +++ b/sdk/arch/workspace.mdx @@ -131,13 +131,11 @@ The `Workspace(...)` constructor is a factory: - If `host` is provided, it returns a `RemoteWorkspace`. - Otherwise it returns a `LocalWorkspace`. 
- - - +AI_INVARIANTS_END */} ### BaseWorkspace Contract From 6295adc572c8a3b2d283b77338d4937b0a206a6e Mon Sep 17 00:00:00 2001 From: openhands Date: Sun, 8 Mar 2026 12:17:52 +0000 Subject: [PATCH 08/10] docs: move invariants to AI sidecars Co-authored-by: openhands --- AGENTS.md | 8 +- llms-full.txt | 591 ++++++++++++------------- scripts/generate-llms-files.py | 25 +- sdk/arch/agent.ai-invariants.md | 73 +++ sdk/arch/agent.mdx | 78 ---- sdk/arch/conversation.ai-invariants.md | 40 ++ sdk/arch/conversation.mdx | 42 -- sdk/arch/design.ai-invariants.md | 71 +++ sdk/arch/design.mdx | 75 ---- sdk/arch/events.ai-invariants.md | 50 +++ sdk/arch/events.mdx | 53 --- sdk/arch/tool-system.ai-invariants.md | 83 ++++ sdk/arch/tool-system.mdx | 86 ---- sdk/arch/workspace.ai-invariants.md | 66 +++ sdk/arch/workspace.mdx | 68 --- 15 files changed, 681 insertions(+), 728 deletions(-) create mode 100644 sdk/arch/agent.ai-invariants.md create mode 100644 sdk/arch/conversation.ai-invariants.md create mode 100644 sdk/arch/design.ai-invariants.md create mode 100644 sdk/arch/events.ai-invariants.md create mode 100644 sdk/arch/tool-system.ai-invariants.md create mode 100644 sdk/arch/workspace.ai-invariants.md diff --git a/AGENTS.md b/AGENTS.md index 3ee571d24..f320a8e15 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -155,10 +155,10 @@ Workflow: `.github/workflows/sync-agent-sdk-openapi.yml` ## AI-only invariants in SDK architecture docs -- Keep natural-language invariants visible in `sdk/arch/*.mdx`. -- Wrap OCL-like invariants in an MDX comment block so they do not render: - - `{/* AI_INVARIANTS_BEGIN` ... `AI_INVARIANTS_END */}` -- The llms generator extracts these blocks and injects them into `llms-full.txt`. +- AI-only invariants live in sidecar files alongside the architecture pages: + - `sdk/arch/.ai-invariants.md` +- These files are excluded from the human docs and injected into `llms-full.txt` + by `scripts/generate-llms-files.py`. 
## LLM API Key Options diff --git a/llms-full.txt b/llms-full.txt index eaec1b7bd..2492edc5a 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -4117,6 +4117,46 @@ flowchart TB style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` +## Component Relationships + +### How Agent Interacts + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Conv["Conversation"] + LLM["LLM"] + Tools["Tools"] + Context["AgentContext"] + + Conv -->|.step calls| Agent + Agent -->|Reads events| Conv + Agent -->|Query| LLM + Agent -->|Execute| Tools + Context -.->|Skills and Context| Agent + Agent -.->|New events| Conv + + style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Agent**: Orchestrates step execution, provides event history +- **Agent → LLM**: Queries for next actions, receives tool calls or messages +- **Agent → Tools**: Executes actions, receives observations +- **AgentContext → Agent**: Injects skills and prompts into LLM queries + + +## See Also + +- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle +- **[Tool System](/sdk/arch/tool-system)** - Tool definition and execution patterns +- **[Events](/sdk/arch/events)** - Event types and structures +- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns +- **[LLM](/sdk/arch/llm)** - Language model abstraction + ## Invariants (Normative) ### AgentBase: Configuration is Stateless and Immutable @@ -4125,6 +4165,9 @@ Natural language invariant: - An `AgentBase` instance is a **pure configuration object**. It may cache materialized `ToolDefinition` instances internally, but it must remain valid to re-create those tools from its declarative spec. 
+OCL-like: + +- `context AgentBase inv Frozen: self.model_config.frozen = true` ### Initialization: System Prompt Precedes Any User Message @@ -4135,6 +4178,9 @@ Natural language invariant: - A `ConversationState` must not contain a user `MessageEvent` before it contains a `SystemPromptEvent`. +OCL-like (conceptual): + +- `context ConversationState inv SystemBeforeUser: self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )` ### Tool Materialization: Names Resolve to Registered ToolDefinitions @@ -4154,6 +4200,9 @@ Natural language invariant: - For a batch of `ActionEvent`s with the same `llm_response_id`, only the first action carries `thought` / `reasoning_content` / `thinking_blocks`; subsequent actions must have empty `thought`. +OCL-like (as modeled in `event.base._combine_action_events`): + +- `context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()` ### Confirmation Mode: Requires Both Analyzer and Policy @@ -4163,6 +4212,9 @@ Natural language invariant: - A `SecurityAnalyzer` is configured, and - The confirmation policy is not `NeverConfirm`. 
+OCL-like (conceptual): + +- `context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))` **Execution Modes:** @@ -4179,64 +4231,6 @@ Before execution, the security analyzer evaluates each action: - **Medium Risk:** Log warning, execute with monitoring - **High Risk:** Block execution, request user confirmation -## Component Relationships - -### How Agent Interacts - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Agent["Agent"] - Conv["Conversation"] - LLM["LLM"] - Tools["Tools"] - Context["AgentContext"] - - Conv -->|.step calls| Agent - Agent -->|Reads events| Conv - Agent -->|Query| LLM - Agent -->|Execute| Tools - Context -.->|Skills and Context| Agent - Agent -.->|New events| Conv - - style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Relationship Characteristics:** -- **Conversation → Agent**: Orchestrates step execution, provides event history -- **Agent → LLM**: Queries for next actions, receives tool calls or messages -- **Agent → Tools**: Executes actions, receives observations -- **AgentContext → Agent**: Injects skills and prompts into LLM queries - - -## See Also - -- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle -- **[Tool System](/sdk/arch/tool-system)** - Tool definition and execution patterns -- **[Events](/sdk/arch/events)** - Event types and structures -- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns -- **[LLM](/sdk/arch/llm)** - Language model abstraction - -#### AI Invariants (OCL-like) - -OCL-like: - -- `context AgentBase inv Frozen: self.model_config.frozen = true` - -OCL-like (conceptual): - -- `context ConversationState inv SystemBeforeUser: 
self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )` - -OCL-like (as modeled in `event.base._combine_action_events`): - -- `context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()` - -OCL-like (conceptual): - -- `context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))` - ### Agent Server Package Source: https://docs.openhands.dev/sdk/arch/agent-server.md @@ -5360,44 +5354,6 @@ The conversation system provides pluggable services that operate independently o **Design Principle:** Services read from the event log but never mutate state directly. -## Invariants (Normative) - -### Conversation Factory: Workspace Chooses Implementation - -Natural language invariant: - -- `Conversation(...)` is a factory that returns `LocalConversation` unless the provided `workspace` is a `RemoteWorkspace`. -- When `workspace` is remote, `persistence_dir` must be unset (`None`). - - - -### ConversationState: Validated Snapshot + Event Log - -Natural language invariants: - -- `ConversationState` is the **only** component intended to hold mutable execution status (`IDLE`, `RUNNING`, `WAITING_FOR_CONFIRMATION`, etc.). -- `ConversationState` owns persistence (`FileStore`) and the event store; all other components treat persistence as an implementation detail. - -### Confirmation Mode Predicate - -The SDK exposes a single predicate for confirmation mode: - -- Confirmation mode is active iff `state.security_analyzer != None` **and** the confirmation policy is not `NeverConfirm`. 
- -### ask_agent() Must Be Stateless - -Natural language invariant (from the public contract): - -- `BaseConversation.ask_agent(question)` **must not** append events, mutate execution status, or persist anything. It is safe to call concurrently with `run()`. - -### Secrets Persistence Requires a Cipher - -Natural language invariant: - -- If `ConversationState` is persisted without a cipher, secret values are redacted and **cannot be recovered on restore**. - -(Implication: use `Cipher` when persistence is enabled and you expect to resume with secrets intact.) - ## Component Relationships ### How Conversation Interacts @@ -5434,12 +5390,47 @@ flowchart LR - **[Event System](/sdk/arch/events)** - Event types and flow - **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples -#### AI Invariants (OCL-like) +## Invariants (Normative) + +### Conversation Factory: Workspace Chooses Implementation + +Natural language invariant: + +- `Conversation(...)` is a factory that returns `LocalConversation` unless the provided `workspace` is a `RemoteWorkspace`. +- When `workspace` is remote, `persistence_dir` must be unset (`None`). OCL-like (conceptual): - `context Conversation::__new__ pre RemoteNoPersistence: workspace.oclIsKindOf(RemoteWorkspace) implies persistence_dir = null` + +### ConversationState: Validated Snapshot + Event Log + +Natural language invariants: + +- `ConversationState` is the **only** component intended to hold mutable execution status (`IDLE`, `RUNNING`, `WAITING_FOR_CONFIRMATION`, etc.). +- `ConversationState` owns persistence (`FileStore`) and the event store; all other components treat persistence as an implementation detail. + +### Confirmation Mode Predicate + +The SDK exposes a single predicate for confirmation mode: + +- Confirmation mode is active iff `state.security_analyzer != None` **and** the confirmation policy is not `NeverConfirm`. 
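
The confirmation-mode predicate above can be sketched as a small function. This is a minimal illustration, not the SDK's implementation: `SketchState`, `ConfirmationPolicy`, `AlwaysConfirm`, and the free function `is_confirmation_mode_active` are hypothetical stand-ins, and only the `NeverConfirm` name and the two fields read by the predicate come from the text.

```python
from dataclasses import dataclass
from typing import Optional


class ConfirmationPolicy:
    """Hypothetical base class standing in for the SDK's policy types."""


class NeverConfirm(ConfirmationPolicy):
    """Policy that never asks the user for confirmation."""


class AlwaysConfirm(ConfirmationPolicy):
    """Illustrative policy that always asks (an assumption for this sketch)."""


@dataclass
class SketchState:
    # Stand-in for ConversationState; only the fields the predicate reads.
    security_analyzer: Optional[object]
    confirmation_policy: ConfirmationPolicy


def is_confirmation_mode_active(state: SketchState) -> bool:
    # Active iff an analyzer is configured AND the policy is not NeverConfirm.
    return state.security_analyzer is not None and not isinstance(
        state.confirmation_policy, NeverConfirm
    )
```

Both conditions are required: an analyzer paired with `NeverConfirm`, or a confirming policy with no analyzer, leaves confirmation mode inactive.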
+ +### ask_agent() Must Be Stateless + +Natural language invariant (from the public contract): + +- `BaseConversation.ask_agent(question)` **must not** append events, mutate execution status, or persist anything. It is safe to call concurrently with `run()`. + +### Secrets Persistence Requires a Cipher + +Natural language invariant: + +- If `ConversationState` is persisted without a cipher, secret values are redacted and **cannot be recovered on restore**. + +(Implication: use `Cipher` when persistence is enabled and you expect to resume with secrets intact.) + ### Design Principles Source: https://docs.openhands.dev/sdk/arch/design.md @@ -5518,6 +5509,10 @@ The SDK is designed so that **all runtime state that affects agent execution is - **Configuration objects are immutable** (Pydantic `frozen=True` where applicable). - **The only intentionally mutable entity is `ConversationState`**, which owns the event log, execution status, secrets registry, and persistence handles. +OCL-like: + +- `context AgentBase inv StatelessConfiguration: self.model_config.frozen = true` +- `context Event inv Immutable: self.model_config.frozen = true` Natural language invariant: @@ -5530,6 +5525,9 @@ All side effects against the environment (filesystem, processes, git operations) - Tools may execute in different runtimes (local process vs inside agent-server), but *conceptually* they always operate against a workspace rooted at `workspace.working_dir`. +OCL-like: + +- `context BaseWorkspace inv WorkingDirIsString: self.working_dir.oclIsTypeOf(String)` ### Event Log is the Execution Trace @@ -5544,6 +5542,10 @@ Natural language invariant: The SDK assumes an explicit `Action -> Observation` pairing. 
+OCL-like (conceptual): + +- `context ActionEvent inv HasToolCallId: self.tool_call_id <> null` +- `context ObservationEvent inv RefersToAction: self.action_id <> null` Natural language invariant: @@ -5561,22 +5563,6 @@ The SDK makes *deployment mode* (local vs remote) a **runtime selection behind a This does **not** mean every optional method behaves identically across workspace types (e.g., `pause()` / `resume()` may be a no-op locally and meaningful remotely). The core conversation API (`send_message`, `run`, events) stays consistent.
-#### AI Invariants (OCL-like) - -OCL-like: - -- `context AgentBase inv StatelessConfiguration: self.model_config.frozen = true` -- `context Event inv Immutable: self.model_config.frozen = true` - -OCL-like: - -- `context BaseWorkspace inv WorkingDirIsString: self.working_dir.oclIsTypeOf(String)` - -OCL-like (conceptual): - -- `context ActionEvent inv HasToolCallId: self.tool_call_id <> null` -- `context ObservationEvent inv RefersToAction: self.action_id <> null` - ### Events Source: https://docs.openhands.dev/sdk/arch/events.md @@ -5720,51 +5706,6 @@ Events for metadata, control flow, and user actions (not sent to LLM): | **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | | **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | -## Invariants (Normative) - -### Event Immutability - -All events inherit from `Event` / `LLMConvertibleEvent` with Pydantic config `frozen=True` and `extra="forbid"`. - -Natural language invariant: - -- Once appended to the event log, an event must be treated as immutable. Mutations are represented as *new events*, not edits. - - - -### LLM-Convertible Stream Can Be Reconstructed Deterministically - -Natural language invariant: - -- `LLMConvertibleEvent.events_to_messages(events)` must produce the exact LLM message stream used for decision making, including batching of parallel tool calls. - -### Parallel Tool Calls are Batched by llm_response_id - -When multiple `ActionEvent`s share the same `llm_response_id`, they represent a single assistant turn with multiple tool calls. - -Natural language invariant: - -- In a batch, only the first `ActionEvent` may contain `thought`/reasoning; subsequent actions must have empty `thought`. This is asserted when combining events. 
- -### Condensation is a Pure View Transformation - -`Condensation.apply(events)` removes forgotten events and optionally inserts a synthetic `CondensationSummaryEvent` at `summary_offset`. - -Natural language invariants: - -- Condensation never mutates existing events; it returns a new list. -- `forgotten_event_ids` must refer to events that exist in the input list (otherwise the operation is a no-op for those IDs). -- If `summary` is present, `summary_offset` must also be present to insert the summary into the view; otherwise the summary is metadata only. - - - -For the condenser algorithms, thresholds, and configuration, see **[Condenser Architecture](/sdk/arch/condenser)**. - -**Source Types:** -- **user**: Event originated from user input -- **agent**: Event generated by agent logic -- **environment**: Event from system/framework/tools - ## Component Relationships ### How Events Integrate @@ -5839,16 +5780,57 @@ Two distinct error events exist in the SDK, with different purpose and visibilit - **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation - **[Condenser](/sdk/arch/condenser)** - Event history compression -#### AI Invariants (OCL-like) +## Invariants (Normative) -OCL-like: +### Event Immutability + +All events inherit from `Event` / `LLMConvertibleEvent` with Pydantic config `frozen=True` and `extra="forbid"`. + +Natural language invariant: + +- Once appended to the event log, an event must be treated as immutable. Mutations are represented as *new events*, not edits. + +OCL-like: - `context Event inv Frozen: self.model_config.frozen = true` + +### LLM-Convertible Stream Can Be Reconstructed Deterministically + +Natural language invariant: + +- `LLMConvertibleEvent.events_to_messages(events)` must produce the exact LLM message stream used for decision making, including batching of parallel tool calls. 
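
As a rough illustration of what "batching of parallel tool calls" could look like, the sketch below groups consecutive actions by `llm_response_id` and checks the thought-only-on-first rule stated in this document. `SketchAction` and `batch_actions` are hypothetical names; the SDK's own logic lives in its event module and may differ.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class SketchAction:
    # Minimal stand-in for ActionEvent; frozen to mirror event immutability.
    llm_response_id: str
    tool_name: str
    thought: str = ""


def batch_actions(actions: List[SketchAction]) -> List[List[SketchAction]]:
    """Group consecutive actions that share an llm_response_id."""
    batches: List[List[SketchAction]] = []
    for _, group in groupby(actions, key=lambda a: a.llm_response_id):
        batch = list(group)
        # Only the first action of a batch may carry a thought.
        for later in batch[1:]:
            if later.thought:
                raise ValueError(
                    f"non-first action {later.tool_name!r} carries a thought"
                )
        batches.append(batch)
    return batches
```

Because the grouping depends only on the order and content of the events, replaying the same event list always yields the same batches.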
+ +### Parallel Tool Calls are Batched by llm_response_id + +When multiple `ActionEvent`s share the same `llm_response_id`, they represent a single assistant turn with multiple tool calls. + +Natural language invariant: + +- In a batch, only the first `ActionEvent` may contain `thought`/reasoning; subsequent actions must have empty `thought`. This is asserted when combining events. + +### Condensation is a Pure View Transformation + +`Condensation.apply(events)` removes forgotten events and optionally inserts a synthetic `CondensationSummaryEvent` at `summary_offset`. + +Natural language invariants: + +- Condensation never mutates existing events; it returns a new list. +- `forgotten_event_ids` must refer to events that exist in the input list (otherwise the operation is a no-op for those IDs). +- If `summary` is present, `summary_offset` must also be present to insert the summary into the view; otherwise the summary is metadata only. + OCL-like (conceptual): - `context Condensation inv SummaryOffsetPair: (self.summary <> null) implies (self.summary_offset <> null) or true -- insertion requires both; metadata-only summary allowed` + +For the condenser algorithms, thresholds, and configuration, see **[Condenser Architecture](/sdk/arch/condenser)**. + +**Source Types:** +- **user**: Event originated from user input +- **agent**: Event generated by agent logic +- **environment**: Event from system/framework/tools + ### LLM Source: https://docs.openhands.dev/sdk/arch/llm.md @@ -8388,88 +8370,6 @@ flowchart LR 2. **Resolver Lookup** - Registry finds the registered resolver for the tool name 3. 
**Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state -## Invariants (Normative) - -### ToolDefinition Naming - -By default, tool names are derived from the class name: - -- `TerminalTool` → `terminal` -- `FileEditorTool` → `file_editor` - -Natural language invariant: - -- Unless explicitly overridden, `ToolDefinition.name` is deterministic and stable across runs. - -### Tool Registry - -`register_tool(name, factory)` maintains a global name→resolver mapping. - -Invariants: - -- Tool names must be non-empty strings. -- A `ToolDefinition` instance can only be registered if it has a non-None `executor`. -- A `ToolDefinition` subclass can only be registered if it implements a concrete `create(...)` classmethod that returns `Sequence[ToolDefinition]`. -- Resolving an unregistered tool name must raise `KeyError`. - - - -### Executor Presence and Call Semantics - -Natural language invariant: - -- A `ToolDefinition` without an `executor` is not executable; attempts to call it must fail fast. -- All tool execution is performed in a `LocalConversation` context (even when invoked remotely) because the agent-server hosts the actual conversation that runs tools. - -### Action/Observation Schemas are Validated - -Natural language invariant: - -- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and tool results are **parsed/validated** into the declared observation model (if present). If the executor already returns the correct observation type, this is a no-op. - - -1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) -2. **Resolver Lookup** - Registry finds the registered resolver for the tool name -3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state -4. 
**Instance Creation** - Tool instance(s) are created with configured executors -5. **Agent Usage** - Instances are added to the agent's tools_map for execution - -**Registration Types:** - -| Type | Registration | Resolver Behavior | -|------|-------------|-------------------| -| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | -| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | -| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | - -### File Organization - -Tools follow a consistent file structure for maintainability: - -``` -openhands-tools/openhands/tools/my_tool/ -├── __init__.py # Export MyTool -├── definition.py # Action, Observation, MyTool(ToolDefinition) -├── impl.py # MyExecutor(ToolExecutor) -└── [other modules] # Tool-specific utilities -``` - -**File Responsibilities:** - -| File | Contains | Purpose | -|------|----------|---------| -| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | -| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | -| `__init__.py` | Tool exports | Package interface | - -**Benefits:** -- **Separation of Concerns** - Public API separate from implementation -- **Avoid Circular Imports** - Import `impl` only inside `create()` method -- **Consistency** - All tools follow same structure for discoverability - -**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation - - ## MCP Integration The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). 
MCP tools are **configured separately from the tool registry** via the `mcp_config` field in `Agent` class and are automatically discovered from MCP servers during agent initialization. @@ -8685,12 +8585,90 @@ flowchart TB - **[Custom Tools Guide](/sdk/guides/custom-tools)** - Building your own tools - **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library -#### AI Invariants (OCL-like) +## Invariants (Normative) + +### ToolDefinition Naming + +By default, tool names are derived from the class name: + +- `TerminalTool` → `terminal` +- `FileEditorTool` → `file_editor` + +Natural language invariant: + +- Unless explicitly overridden, `ToolDefinition.name` is deterministic and stable across runs. + +### Tool Registry + +`register_tool(name, factory)` maintains a global name→resolver mapping. + +Invariants: + +- Tool names must be non-empty strings. +- A `ToolDefinition` instance can only be registered if it has a non-None `executor`. +- A `ToolDefinition` subclass can only be registered if it implements a concrete `create(...)` classmethod that returns `Sequence[ToolDefinition]`. +- Resolving an unregistered tool name must raise `KeyError`. OCL-like (conceptual): - `context ToolRegistry inv NonEmptyNames: name.trim().size() > 0` + +### Executor Presence and Call Semantics + +Natural language invariant: + +- A `ToolDefinition` without an `executor` is not executable; attempts to call it must fail fast. +- All tool execution is performed in a `LocalConversation` context (even when invoked remotely) because the agent-server hosts the actual conversation that runs tools. + +### Action/Observation Schemas are Validated + +Natural language invariant: + +- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and tool results are **parsed/validated** into the declared observation model (if present). If the executor already returns the correct observation type, this is a no-op. + + +1. 
**[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. **Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution + +**Registration Types:** + +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | + +### File Organization + +Tools follow a consistent file structure for maintainability: + +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities +``` + +**File Responsibilities:** + +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | +| `__init__.py` | Tool exports | Package interface | + +**Benefits:** +- **Separation of Concerns** - Public API separate from implementation +- **Avoid Circular Imports** - Import `impl` only inside `create()` method +- **Consistency** - All tools follow same structure for discoverability + +**Example Reference:** See 
[`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation + ### Workspace Source: https://docs.openhands.dev/sdk/arch/workspace.md @@ -8813,70 +8791,6 @@ flowchart LR | **timeout** | bool | Whether command timed out | | **duration** | float | Execution time in seconds | -## Invariants (Normative) - -### Workspace Factory: Host Chooses Remote - -The `Workspace(...)` constructor is a factory: - -- If `host` is provided, it returns a `RemoteWorkspace`. -- Otherwise it returns a `LocalWorkspace`. - - - -### BaseWorkspace Contract - -All workspace implementations must satisfy: - -- `execute_command(command, cwd, timeout)` returns a `CommandResult` where `exit_code=-1` indicates timeout. -- `file_upload` / `file_download` return a `FileOperationResult` with `success=false` and a populated `error` field on failure. -- Git helpers (`git_changes`, `git_diff`) must raise if the path is not a git repository. - -### working_dir Normalization - -Natural language invariant: - -- `working_dir` is normalized to a `str` even if passed as a `Path`. - -### Pause/Resume Semantics (Optional Capability) - -`pause()` / `resume()` are intentionally **optional capabilities**: - -- `LocalWorkspace.pause()` / `.resume()` are no-ops. -- Remote/container workspaces may implement pause/resume to conserve resources. -- If a workspace type does not support pausing, it must raise `NotImplementedError`. - -#### Discussion: `pause()` / `resume()` semantics (design tradeoff) - -There is an argument that this is compatible with the “swap workspaces without rewriting code” principle, because most client code should only rely on the *core* workspace and conversation operations, while optional capabilities are feature-detected or used conditionally. 
- -There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**. - -- Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op. -- Some remote/container workspaces may be able to pause a container or VM, but others may not. - -This tension matters because it creates two different reasonable expectations: - -1. *Ergonomic expectation*: orchestration code can call `pause()` unconditionally and it will be safe. -2. *Guarantee expectation*: calling `pause()` actually pauses resource usage. - -**Maybe it would make sense to** model this explicitly as an optional capability: - -- Add `supports_pause` (or a richer `pause_capability`) to `BaseWorkspace`, and -- Make `pause()` / `resume()` no-ops everywhere by default (including remote) while letting pausable implementations override, -- Keep a strict helper (e.g., `pause_or_raise()`) for callers who require a guarantee. - -This would make the default behavior unsurprising (safe to call), while still letting clients opt into fail-fast behavior when pausing is required. 
- - -### File Operations - -| Operation | Local Implementation | Remote Implementation | -|-----------|---------------------|----------------------| -| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | -| **Download** | `shutil.copy()` | `GET /file/download` stream | -| **Result** | `FileOperationResult` | `FileOperationResult` | - ## Resource Management Workspaces use **context manager** for safe resource handling: @@ -8953,12 +8867,73 @@ flowchart LR - **[Agent Server](/sdk/arch/agent-server)** - Remote execution API - **[Tool System](/sdk/arch/tool-system)** - Tools that use workspace for execution -#### AI Invariants (OCL-like) +## Invariants (Normative) + +### Workspace Factory: Host Chooses Remote + +The `Workspace(...)` constructor is a factory: + +- If `host` is provided, it returns a `RemoteWorkspace`. +- Otherwise it returns a `LocalWorkspace`. OCL-like (conceptual): - `context Workspace::__new__ post RemoteIffHost: (host <> null) implies result.oclIsKindOf(RemoteWorkspace)` + +### BaseWorkspace Contract + +All workspace implementations must satisfy: + +- `execute_command(command, cwd, timeout)` returns a `CommandResult` where `exit_code=-1` indicates timeout. +- `file_upload` / `file_download` return a `FileOperationResult` with `success=false` and a populated `error` field on failure. +- Git helpers (`git_changes`, `git_diff`) must raise if the path is not a git repository. + +### working_dir Normalization + +Natural language invariant: + +- `working_dir` is normalized to a `str` even if passed as a `Path`. + +### Pause/Resume Semantics (Optional Capability) + +`pause()` / `resume()` are intentionally **optional capabilities**: + +- `LocalWorkspace.pause()` / `.resume()` are no-ops. +- Remote/container workspaces may implement pause/resume to conserve resources. +- If a workspace type does not support pausing, it must raise `NotImplementedError`. 
+ +#### Discussion: `pause()` / `resume()` semantics (design tradeoff) + +There is an argument that this is compatible with the “swap workspaces without rewriting code” principle, because most client code should only rely on the *core* workspace and conversation operations, while optional capabilities are feature-detected or used conditionally. + +There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**. + +- Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op. +- Some remote/container workspaces may be able to pause a container or VM, but others may not. + +This tension matters because it creates two different reasonable expectations: + +1. *Ergonomic expectation*: orchestration code can call `pause()` unconditionally and it will be safe. +2. *Guarantee expectation*: calling `pause()` actually pauses resource usage. + +**Maybe it would make sense to** model this explicitly as an optional capability: + +- Add `supports_pause` (or a richer `pause_capability`) to `BaseWorkspace`, and +- Make `pause()` / `resume()` no-ops everywhere by default (including remote) while letting pausable implementations override, +- Keep a strict helper (e.g., `pause_or_raise()`) for callers who require a guarantee. + +This would make the default behavior unsurprising (safe to call), while still letting clients opt into fail-fast behavior when pausing is required. 
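
The proposal above could look roughly like the following. This sketches the suggested design only, not current SDK behavior: `SketchWorkspace` and `SketchPausableWorkspace` are hypothetical classes, while `supports_pause` and `pause_or_raise()` are the names floated in the discussion.

```python
class SketchWorkspace:
    """Hypothetical base class; not the SDK's BaseWorkspace."""

    # Capability flag that callers can feature-detect.
    supports_pause = False

    def pause(self) -> None:
        # Default is a safe no-op so orchestration code can call it freely.
        return None

    def pause_or_raise(self) -> None:
        # Strict variant for callers that need a real guarantee.
        if not self.supports_pause:
            raise NotImplementedError(
                f"{type(self).__name__} cannot pause"
            )
        self.pause()


class SketchPausableWorkspace(SketchWorkspace):
    """Illustrative workspace that can actually suspend itself."""

    supports_pause = True

    def __init__(self) -> None:
        self.paused = False

    def pause(self) -> None:
        # A real implementation might pause a container or VM here.
        self.paused = True
```

With this shape, `pause()` serves the ergonomic expectation and `pause_or_raise()` serves the guarantee expectation, so neither caller has to guess which behavior it gets.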
+ + +### File Operations + +| Operation | Local Implementation | Remote Implementation | +|-----------|---------------------|----------------------| +| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | +| **Download** | `shutil.copy()` | `GET /file/download` stream | +| **Result** | `FileOperationResult` | `FileOperationResult` | + ### FAQ Source: https://docs.openhands.dev/sdk/faq.md diff --git a/scripts/generate-llms-files.py b/scripts/generate-llms-files.py index 95266fd4c..59436e123 100755 --- a/scripts/generate-llms-files.py +++ b/scripts/generate-llms-files.py @@ -51,10 +51,7 @@ BASE_URL = "https://docs.openhands.dev" EXCLUDED_DIRS = {".git", ".github", ".agents", "tests", "openapi", "logo"} -AI_INVARIANTS_RE = re.compile( - r"\{/\*\s*AI_INVARIANTS_BEGIN\s*\n(.*?)\n\s*AI_INVARIANTS_END\s*\*/\}", - re.DOTALL, -) +AI_INVARIANTS_SUFFIX = ".ai-invariants.md" @dataclass(frozen=True) @@ -64,7 +61,7 @@ class DocPage: title: str description: str | None body: str - ai_invariants: list[str] + ai_invariants: str | None _FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL) @@ -104,10 +101,13 @@ def parse_frontmatter(text: str) -> tuple[dict[str, str], str]: return fm, body -def extract_ai_invariants(body: str) -> tuple[str, list[str]]: - matches = [match.group(1).strip() for match in AI_INVARIANTS_RE.finditer(body)] - cleaned = AI_INVARIANTS_RE.sub("", body) - return cleaned, matches +def load_ai_invariants(rel_path: Path) -> str | None: + sidecar_path = rel_path.with_suffix(AI_INVARIANTS_SUFFIX) + full_path = ROOT / sidecar_path + if not full_path.exists(): + return None + content = full_path.read_text(encoding="utf-8").strip() + return content or None def rel_to_route(rel_path: Path) -> str: @@ -143,7 +143,7 @@ def iter_doc_pages() -> list[DocPage]: raw = mdx_path.read_text(encoding="utf-8") fm, body = parse_frontmatter(raw) - body, ai_invariants = extract_ai_invariants(body) + ai_invariants = load_ai_invariants(rel_path) title = 
fm.get("title") if not title: @@ -299,11 +299,8 @@ def build_llms_full_txt(pages: list[DocPage]) -> str: lines.append("") if page.ai_invariants: - lines.append("#### AI Invariants (OCL-like)") + lines.append(page.ai_invariants) lines.append("") - for block in page.ai_invariants: - lines.append(block) - lines.append("") return "\n".join(lines).rstrip() + "\n" diff --git a/sdk/arch/agent.ai-invariants.md b/sdk/arch/agent.ai-invariants.md new file mode 100644 index 000000000..12645b517 --- /dev/null +++ b/sdk/arch/agent.ai-invariants.md @@ -0,0 +1,73 @@ +## Invariants (Normative) + +### AgentBase: Configuration is Stateless and Immutable + +Natural language invariant: + +- An `AgentBase` instance is a **pure configuration object**. It may cache materialized `ToolDefinition` instances internally, but it must remain valid to re-create those tools from its declarative spec. + +OCL-like: + +- `context AgentBase inv Frozen: self.model_config.frozen = true` + + +### Initialization: System Prompt Precedes Any User Message + +`Agent.init_state(state, on_event=...)` is responsible for creating the initial system prompt event. + +Natural language invariant: + +- A `ConversationState` must not contain a user `MessageEvent` before it contains a `SystemPromptEvent`. + +OCL-like (conceptual): + +- `context ConversationState inv SystemBeforeUser: self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )` + + +### Tool Materialization: Names Resolve to Registered ToolDefinitions + +An `Agent` is configured with a list of tool *specs* (`openhands.sdk.tool.spec.Tool`) that reference registered `ToolDefinition` factories. + +Natural language invariant: + +- `resolve_tool(Tool(name=X))` must succeed (tool name present in registry) for all tools the agent intends to use. 
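
A minimal sketch of the two tool-materialization rules above, assuming a plain dictionary registry. `register_tool_sketch` and `resolve_tool_sketch` are hypothetical helpers, and strings stand in for `ToolDefinition` instances; the SDK's real `register_tool` / `resolve_tool` signatures are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical registry: tool name -> factory returning a sequence of tools.
_REGISTRY: Dict[str, Callable[..., Sequence[str]]] = {}


def register_tool_sketch(name: str, factory: Callable[..., Sequence[str]]) -> None:
    # Names must be non-empty strings.
    if not isinstance(name, str) or not name.strip():
        raise ValueError("tool name must be a non-empty string")
    _REGISTRY[name] = factory


def resolve_tool_sketch(name: str, **params) -> List[str]:
    # An unregistered name raises KeyError from the dictionary lookup.
    factory = _REGISTRY[name]
    tools = factory(**params)
    # Factories must return a sequence, even for a single tool.
    if isinstance(tools, str) or not isinstance(tools, Sequence):
        raise TypeError("tool factory must return a sequence of tools")
    return list(tools)


# A single tool is a one-element sequence; a tool set has several elements.
register_tool_sketch("terminal", lambda: ["terminal"])
register_tool_sketch("browser", lambda: ["browser_navigate", "browser_click"])
```

Returning a sequence in every case means the agent can flatten the results of all factories the same way, whether a spec expands to one tool or to a whole tool set.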
+- Tool factories must return a **sequence** of `ToolDefinition` instances; tool sets (e.g., browser tool sets) are represented as multi-element sequences. + +### Multi-Tool Calls: Shared Thought Only on First ActionEvent + +When an LLM returns parallel tool calls, the SDK represents this as multiple `ActionEvent`s that share the same `llm_response_id`. + +Natural language invariant: + +- For a batch of `ActionEvent`s with the same `llm_response_id`, only the first action carries `thought` / `reasoning_content` / `thinking_blocks`; subsequent actions must have empty `thought`. + +OCL-like (as modeled in `event.base._combine_action_events`): + +- `context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()` + + +### Confirmation Mode: Requires Both Analyzer and Policy + +`conversation.is_confirmation_mode_active` is true iff: + +- A `SecurityAnalyzer` is configured, and +- The confirmation policy is not `NeverConfirm`. 
+ +OCL-like (conceptual): + +- `context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))` + + +**Execution Modes:** + +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | + +**Security Integration:** + +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation diff --git a/sdk/arch/agent.mdx b/sdk/arch/agent.mdx index 544c8451a..138ff48c8 100644 --- a/sdk/arch/agent.mdx +++ b/sdk/arch/agent.mdx @@ -224,84 +224,6 @@ flowchart TB style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -## Invariants (Normative) - -### AgentBase: Configuration is Stateless and Immutable - -Natural language invariant: - -- An `AgentBase` instance is a **pure configuration object**. It may cache materialized `ToolDefinition` instances internally, but it must remain valid to re-create those tools from its declarative spec. - -{/* AI_INVARIANTS_BEGIN -OCL-like: - -- `context AgentBase inv Frozen: self.model_config.frozen = true` -AI_INVARIANTS_END */} - -### Initialization: System Prompt Precedes Any User Message - -`Agent.init_state(state, on_event=...)` is responsible for creating the initial system prompt event. - -Natural language invariant: - -- A `ConversationState` must not contain a user `MessageEvent` before it contains a `SystemPromptEvent`. 
- -{/* AI_INVARIANTS_BEGIN -OCL-like (conceptual): - -- `context ConversationState inv SystemBeforeUser: self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )` -AI_INVARIANTS_END */} - -### Tool Materialization: Names Resolve to Registered ToolDefinitions - -An `Agent` is configured with a list of tool *specs* (`openhands.sdk.tool.spec.Tool`) that reference registered `ToolDefinition` factories. - -Natural language invariant: - -- `resolve_tool(Tool(name=X))` must succeed (tool name present in registry) for all tools the agent intends to use. -- Tool factories must return a **sequence** of `ToolDefinition` instances; tool sets (e.g., browser tool sets) are represented as multi-element sequences. - -### Multi-Tool Calls: Shared Thought Only on First ActionEvent - -When an LLM returns parallel tool calls, the SDK represents this as multiple `ActionEvent`s that share the same `llm_response_id`. - -Natural language invariant: - -- For a batch of `ActionEvent`s with the same `llm_response_id`, only the first action carries `thought` / `reasoning_content` / `thinking_blocks`; subsequent actions must have empty `thought`. - -{/* AI_INVARIANTS_BEGIN -OCL-like (as modeled in `event.base._combine_action_events`): - -- `context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()` -AI_INVARIANTS_END */} - -### Confirmation Mode: Requires Both Analyzer and Policy - -`conversation.is_confirmation_mode_active` is true iff: - -- A `SecurityAnalyzer` is configured, and -- The confirmation policy is not `NeverConfirm`. 
- -{/* AI_INVARIANTS_BEGIN -OCL-like (conceptual): - -- `context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))` -AI_INVARIANTS_END */} - -**Execution Modes:** - -| Mode | Behavior | Use Case | -|------|----------|----------| -| **Direct** | Execute immediately | Development, trusted environments | -| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | - -**Security Integration:** - -Before execution, the security analyzer evaluates each action: -- **Low Risk:** Execute immediately -- **Medium Risk:** Log warning, execute with monitoring -- **High Risk:** Block execution, request user confirmation - ## Component Relationships ### How Agent Interacts diff --git a/sdk/arch/conversation.ai-invariants.md b/sdk/arch/conversation.ai-invariants.md new file mode 100644 index 000000000..9e5139e54 --- /dev/null +++ b/sdk/arch/conversation.ai-invariants.md @@ -0,0 +1,40 @@ +## Invariants (Normative) + +### Conversation Factory: Workspace Chooses Implementation + +Natural language invariant: + +- `Conversation(...)` is a factory that returns `LocalConversation` unless the provided `workspace` is a `RemoteWorkspace`. +- When `workspace` is remote, `persistence_dir` must be unset (`None`). + +OCL-like (conceptual): + +- `context Conversation::__new__ pre RemoteNoPersistence: workspace.oclIsKindOf(RemoteWorkspace) implies persistence_dir = null` + + +### ConversationState: Validated Snapshot + Event Log + +Natural language invariants: + +- `ConversationState` is the **only** component intended to hold mutable execution status (`IDLE`, `RUNNING`, `WAITING_FOR_CONFIRMATION`, etc.). +- `ConversationState` owns persistence (`FileStore`) and the event store; all other components treat persistence as an implementation detail. 
+ +### Confirmation Mode Predicate + +The SDK exposes a single predicate for confirmation mode: + +- Confirmation mode is active iff `state.security_analyzer != None` **and** the confirmation policy is not `NeverConfirm`. + +### ask_agent() Must Be Stateless + +Natural language invariant (from the public contract): + +- `BaseConversation.ask_agent(question)` **must not** append events, mutate execution status, or persist anything. It is safe to call concurrently with `run()`. + +### Secrets Persistence Requires a Cipher + +Natural language invariant: + +- If `ConversationState` is persisted without a cipher, secret values are redacted and **cannot be recovered on restore**. + +(Implication: use `Cipher` when persistence is enabled and you expect to resume with secrets intact.) diff --git a/sdk/arch/conversation.mdx b/sdk/arch/conversation.mdx index e13019a94..416af8a9a 100644 --- a/sdk/arch/conversation.mdx +++ b/sdk/arch/conversation.mdx @@ -190,48 +190,6 @@ The conversation system provides pluggable services that operate independently o **Design Principle:** Services read from the event log but never mutate state directly. -## Invariants (Normative) - -### Conversation Factory: Workspace Chooses Implementation - -Natural language invariant: - -- `Conversation(...)` is a factory that returns `LocalConversation` unless the provided `workspace` is a `RemoteWorkspace`. -- When `workspace` is remote, `persistence_dir` must be unset (`None`). - -{/* AI_INVARIANTS_BEGIN -OCL-like (conceptual): - -- `context Conversation::__new__ pre RemoteNoPersistence: workspace.oclIsKindOf(RemoteWorkspace) implies persistence_dir = null` -AI_INVARIANTS_END */} - -### ConversationState: Validated Snapshot + Event Log - -Natural language invariants: - -- `ConversationState` is the **only** component intended to hold mutable execution status (`IDLE`, `RUNNING`, `WAITING_FOR_CONFIRMATION`, etc.). 
-- `ConversationState` owns persistence (`FileStore`) and the event store; all other components treat persistence as an implementation detail. - -### Confirmation Mode Predicate - -The SDK exposes a single predicate for confirmation mode: - -- Confirmation mode is active iff `state.security_analyzer != None` **and** the confirmation policy is not `NeverConfirm`. - -### ask_agent() Must Be Stateless - -Natural language invariant (from the public contract): - -- `BaseConversation.ask_agent(question)` **must not** append events, mutate execution status, or persist anything. It is safe to call concurrently with `run()`. - -### Secrets Persistence Requires a Cipher - -Natural language invariant: - -- If `ConversationState` is persisted without a cipher, secret values are redacted and **cannot be recovered on restore**. - -(Implication: use `Cipher` when persistence is enabled and you expect to resume with secrets intact.) - ## Component Relationships ### How Conversation Interacts diff --git a/sdk/arch/design.ai-invariants.md b/sdk/arch/design.ai-invariants.md new file mode 100644 index 000000000..164a75d01 --- /dev/null +++ b/sdk/arch/design.ai-invariants.md @@ -0,0 +1,71 @@ +## Design Invariants (Normative) + +This page describes the **architectural invariants** the SDK relies on. These are treated as *contracts* between components. + +Where appropriate, we express invariants in a lightweight OCL-like notation: + +- `context X inv Name: ` +- `pre:` / `post:` for pre/post-conditions + +If an invariant cannot be expressed precisely in OCL without significant auxiliary modeling, we state it in precise natural language. + +### Single Source of Truth for Runtime State + +The SDK is designed so that **all runtime state that affects agent execution is representable as an event log plus a small, validated state snapshot**. + +- **Configuration objects are immutable** (Pydantic `frozen=True` where applicable). 
+- **The only intentionally mutable entity is `ConversationState`**, which owns the event log, execution status, secrets registry, and persistence handles. + +OCL-like: + +- `context AgentBase inv StatelessConfiguration: self.model_config.frozen = true` +- `context Event inv Immutable: self.model_config.frozen = true` + + +Natural language invariant: + +- `ConversationState` is the single coordination point for execution. Other objects may maintain private runtime caches, but **must not** be required to restore or replay a conversation. + +### Workspace Boundary is the I/O Boundary + +All side effects against the environment (filesystem, processes, git operations) must occur **through a Workspace** (local or remote), which becomes the **I/O boundary**. + +- Tools may execute in different runtimes (local process vs inside agent-server), but *conceptually* they always operate against a workspace rooted at `workspace.working_dir`. + +OCL-like: + +- `context BaseWorkspace inv WorkingDirIsString: self.working_dir.oclIsTypeOf(String)` + + +### Event Log is the Execution Trace + +The event stream is the single authoritative trace of what the agent *saw* and *did*. + +Natural language invariant: + +- Any agent decision that should be reproducible on replay must be representable as an `LLMConvertibleEvent` (for LLM context) plus associated non-LLM events (e.g., state updates, errors). + +### Tool Calls are Explicit, Typed, and Linkable + +The SDK assumes an explicit `Action -> Observation` pairing. + +OCL-like (conceptual): + +- `context ActionEvent inv HasToolCallId: self.tool_call_id <> null` +- `context ObservationEvent inv RefersToAction: self.action_id <> null` + + +Natural language invariant: + +- Observations must be attributable to a specific action/tool call so that conversations can be audited, visualized, and resumed. 
+ +### Remote vs Local is an Execution Detail + +The SDK makes *deployment mode* (local vs remote) a **runtime selection behind a common interface**, not two separate programming models. + +- `Conversation(...)` returns either `LocalConversation` or `RemoteConversation` based on the provided workspace. +- User-facing code typically should not need to change when switching workspaces; you mostly swap configuration. + + +This does **not** mean every optional method behaves identically across workspace types (e.g., `pause()` / `resume()` may be a no-op locally and meaningful remotely). The core conversation API (`send_message`, `run`, events) stays consistent. + diff --git a/sdk/arch/design.mdx b/sdk/arch/design.mdx index 7a5af8e66..3fb3df930 100644 --- a/sdk/arch/design.mdx +++ b/sdk/arch/design.mdx @@ -60,78 +60,3 @@ Agents are defined as graphs of interchangeable components—tools, prompts, LLM Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation. --- - -## Design Invariants (Normative) - -This page describes the **architectural invariants** the SDK relies on. These are treated as *contracts* between components. - -Where appropriate, we express invariants in a lightweight OCL-like notation: - -- `context X inv Name: ` -- `pre:` / `post:` for pre/post-conditions - -If an invariant cannot be expressed precisely in OCL without significant auxiliary modeling, we state it in precise natural language. - -### Single Source of Truth for Runtime State - -The SDK is designed so that **all runtime state that affects agent execution is representable as an event log plus a small, validated state snapshot**. - -- **Configuration objects are immutable** (Pydantic `frozen=True` where applicable). -- **The only intentionally mutable entity is `ConversationState`**, which owns the event log, execution status, secrets registry, and persistence handles. 
- -{/* AI_INVARIANTS_BEGIN -OCL-like: - -- `context AgentBase inv StatelessConfiguration: self.model_config.frozen = true` -- `context Event inv Immutable: self.model_config.frozen = true` -AI_INVARIANTS_END */} - -Natural language invariant: - -- `ConversationState` is the single coordination point for execution. Other objects may maintain private runtime caches, but **must not** be required to restore or replay a conversation. - -### Workspace Boundary is the I/O Boundary - -All side effects against the environment (filesystem, processes, git operations) must occur **through a Workspace** (local or remote), which becomes the **I/O boundary**. - -- Tools may execute in different runtimes (local process vs inside agent-server), but *conceptually* they always operate against a workspace rooted at `workspace.working_dir`. - -{/* AI_INVARIANTS_BEGIN -OCL-like: - -- `context BaseWorkspace inv WorkingDirIsString: self.working_dir.oclIsTypeOf(String)` -AI_INVARIANTS_END */} - -### Event Log is the Execution Trace - -The event stream is the single authoritative trace of what the agent *saw* and *did*. - -Natural language invariant: - -- Any agent decision that should be reproducible on replay must be representable as an `LLMConvertibleEvent` (for LLM context) plus associated non-LLM events (e.g., state updates, errors). - -### Tool Calls are Explicit, Typed, and Linkable - -The SDK assumes an explicit `Action -> Observation` pairing. - -{/* AI_INVARIANTS_BEGIN -OCL-like (conceptual): - -- `context ActionEvent inv HasToolCallId: self.tool_call_id <> null` -- `context ObservationEvent inv RefersToAction: self.action_id <> null` -AI_INVARIANTS_END */} - -Natural language invariant: - -- Observations must be attributable to a specific action/tool call so that conversations can be audited, visualized, and resumed. 
- -### Remote vs Local is an Execution Detail - -The SDK makes *deployment mode* (local vs remote) a **runtime selection behind a common interface**, not two separate programming models. - -- `Conversation(...)` returns either `LocalConversation` or `RemoteConversation` based on the provided workspace. -- User-facing code typically should not need to change when switching workspaces; you mostly swap configuration. - - -This does **not** mean every optional method behaves identically across workspace types (e.g., `pause()` / `resume()` may be a no-op locally and meaningful remotely). The core conversation API (`send_message`, `run`, events) stays consistent. - diff --git a/sdk/arch/events.ai-invariants.md b/sdk/arch/events.ai-invariants.md new file mode 100644 index 000000000..0b18acbd8 --- /dev/null +++ b/sdk/arch/events.ai-invariants.md @@ -0,0 +1,50 @@ +## Invariants (Normative) + +### Event Immutability + +All events inherit from `Event` / `LLMConvertibleEvent` with Pydantic config `frozen=True` and `extra="forbid"`. + +Natural language invariant: + +- Once appended to the event log, an event must be treated as immutable. Mutations are represented as *new events*, not edits. + +OCL-like: + +- `context Event inv Frozen: self.model_config.frozen = true` + + +### LLM-Convertible Stream Can Be Reconstructed Deterministically + +Natural language invariant: + +- `LLMConvertibleEvent.events_to_messages(events)` must produce the exact LLM message stream used for decision making, including batching of parallel tool calls. + +### Parallel Tool Calls are Batched by llm_response_id + +When multiple `ActionEvent`s share the same `llm_response_id`, they represent a single assistant turn with multiple tool calls. + +Natural language invariant: + +- In a batch, only the first `ActionEvent` may contain `thought`/reasoning; subsequent actions must have empty `thought`. This is asserted when combining events. 
+ +### Condensation is a Pure View Transformation + +`Condensation.apply(events)` removes forgotten events and optionally inserts a synthetic `CondensationSummaryEvent` at `summary_offset`. + +Natural language invariants: + +- Condensation never mutates existing events; it returns a new list. +- `forgotten_event_ids` must refer to events that exist in the input list (otherwise the operation is a no-op for those IDs). +- If `summary` is present, `summary_offset` must also be present to insert the summary into the view; otherwise the summary is metadata only. + +OCL-like (conceptual): + +- `context Condensation inv SummaryOffsetPair: (self.summary <> null) implies (self.summary_offset <> null) or true -- insertion requires both; metadata-only summary allowed` + + +For the condenser algorithms, thresholds, and configuration, see **[Condenser Architecture](/sdk/arch/condenser)**. + +**Source Types:** +- **user**: Event originated from user input +- **agent**: Event generated by agent logic +- **environment**: Event from system/framework/tools diff --git a/sdk/arch/events.mdx b/sdk/arch/events.mdx index 08ba16060..76ed5a590 100644 --- a/sdk/arch/events.mdx +++ b/sdk/arch/events.mdx @@ -143,59 +143,6 @@ Events for metadata, control flow, and user actions (not sent to LLM): | **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | | **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | -## Invariants (Normative) - -### Event Immutability - -All events inherit from `Event` / `LLMConvertibleEvent` with Pydantic config `frozen=True` and `extra="forbid"`. - -Natural language invariant: - -- Once appended to the event log, an event must be treated as immutable. Mutations are represented as *new events*, not edits. 
- -{/* AI_INVARIANTS_BEGIN -OCL-like: - -- `context Event inv Frozen: self.model_config.frozen = true` -AI_INVARIANTS_END */} - -### LLM-Convertible Stream Can Be Reconstructed Deterministically - -Natural language invariant: - -- `LLMConvertibleEvent.events_to_messages(events)` must produce the exact LLM message stream used for decision making, including batching of parallel tool calls. - -### Parallel Tool Calls are Batched by llm_response_id - -When multiple `ActionEvent`s share the same `llm_response_id`, they represent a single assistant turn with multiple tool calls. - -Natural language invariant: - -- In a batch, only the first `ActionEvent` may contain `thought`/reasoning; subsequent actions must have empty `thought`. This is asserted when combining events. - -### Condensation is a Pure View Transformation - -`Condensation.apply(events)` removes forgotten events and optionally inserts a synthetic `CondensationSummaryEvent` at `summary_offset`. - -Natural language invariants: - -- Condensation never mutates existing events; it returns a new list. -- `forgotten_event_ids` must refer to events that exist in the input list (otherwise the operation is a no-op for those IDs). -- If `summary` is present, `summary_offset` must also be present to insert the summary into the view; otherwise the summary is metadata only. - -{/* AI_INVARIANTS_BEGIN -OCL-like (conceptual): - -- `context Condensation inv SummaryOffsetPair: (self.summary <> null) implies (self.summary_offset <> null) or true -- insertion requires both; metadata-only summary allowed` -AI_INVARIANTS_END */} - -For the condenser algorithms, thresholds, and configuration, see **[Condenser Architecture](/sdk/arch/condenser)**. 
- -**Source Types:** -- **user**: Event originated from user input -- **agent**: Event generated by agent logic -- **environment**: Event from system/framework/tools - ## Component Relationships ### How Events Integrate diff --git a/sdk/arch/tool-system.ai-invariants.md b/sdk/arch/tool-system.ai-invariants.md new file mode 100644 index 000000000..141a0cf76 --- /dev/null +++ b/sdk/arch/tool-system.ai-invariants.md @@ -0,0 +1,83 @@ +## Invariants (Normative) + +### ToolDefinition Naming + +By default, tool names are derived from the class name: + +- `TerminalTool` → `terminal` +- `FileEditorTool` → `file_editor` + +Natural language invariant: + +- Unless explicitly overridden, `ToolDefinition.name` is deterministic and stable across runs. + +### Tool Registry + +`register_tool(name, factory)` maintains a global name→resolver mapping. + +Invariants: + +- Tool names must be non-empty strings. +- A `ToolDefinition` instance can only be registered if it has a non-None `executor`. +- A `ToolDefinition` subclass can only be registered if it implements a concrete `create(...)` classmethod that returns `Sequence[ToolDefinition]`. +- Resolving an unregistered tool name must raise `KeyError`. + +OCL-like (conceptual): + +- `context ToolRegistry inv NonEmptyNames: name.trim().size() > 0` + + +### Executor Presence and Call Semantics + +Natural language invariant: + +- A `ToolDefinition` without an `executor` is not executable; attempts to call it must fail fast. +- All tool execution is performed in a `LocalConversation` context (even when invoked remotely) because the agent-server hosts the actual conversation that runs tools. + +### Action/Observation Schemas are Validated + +Natural language invariant: + +- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and tool results are **parsed/validated** into the declared observation model (if present). If the executor already returns the correct observation type, this is a no-op. + + +1. 
**[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. **Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution + +**Registration Types:** + +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | + +### File Organization + +Tools follow a consistent file structure for maintainability: + +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities +``` + +**File Responsibilities:** + +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | +| `__init__.py` | Tool exports | Package interface | + +**Benefits:** +- **Separation of Concerns** - Public API separate from implementation +- **Avoid Circular Imports** - Import `impl` only inside `create()` method +- **Consistency** - All tools follow same structure for discoverability + +**Example Reference:** See 
[`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation diff --git a/sdk/arch/tool-system.mdx b/sdk/arch/tool-system.mdx index 50580b988..a930ea95b 100644 --- a/sdk/arch/tool-system.mdx +++ b/sdk/arch/tool-system.mdx @@ -263,92 +263,6 @@ flowchart LR 2. **Resolver Lookup** - Registry finds the registered resolver for the tool name 3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state -## Invariants (Normative) - -### ToolDefinition Naming - -By default, tool names are derived from the class name: - -- `TerminalTool` → `terminal` -- `FileEditorTool` → `file_editor` - -Natural language invariant: - -- Unless explicitly overridden, `ToolDefinition.name` is deterministic and stable across runs. - -### Tool Registry - -`register_tool(name, factory)` maintains a global name→resolver mapping. - -Invariants: - -- Tool names must be non-empty strings. -- A `ToolDefinition` instance can only be registered if it has a non-None `executor`. -- A `ToolDefinition` subclass can only be registered if it implements a concrete `create(...)` classmethod that returns `Sequence[ToolDefinition]`. -- Resolving an unregistered tool name must raise `KeyError`. - -{/* AI_INVARIANTS_BEGIN -OCL-like (conceptual): - -- `context ToolRegistry inv NonEmptyNames: name.trim().size() > 0` -AI_INVARIANTS_END */} - -### Executor Presence and Call Semantics - -Natural language invariant: - -- A `ToolDefinition` without an `executor` is not executable; attempts to call it must fail fast. -- All tool execution is performed in a `LocalConversation` context (even when invoked remotely) because the agent-server hosts the actual conversation that runs tools. 
- -### Action/Observation Schemas are Validated - -Natural language invariant: - -- `Action` and `Observation` are Pydantic models; tool inputs are validated before execution, and tool results are **parsed/validated** into the declared observation model (if present). If the executor already returns the correct observation type, this is a no-op. - - -1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) -2. **Resolver Lookup** - Registry finds the registered resolver for the tool name -3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state -4. **Instance Creation** - Tool instance(s) are created with configured executors -5. **Agent Usage** - Instances are added to the agent's tools_map for execution - -**Registration Types:** - -| Type | Registration | Resolver Behavior | -|------|-------------|-------------------| -| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | -| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | -| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | - -### File Organization - -Tools follow a consistent file structure for maintainability: - -``` -openhands-tools/openhands/tools/my_tool/ -├── __init__.py # Export MyTool -├── definition.py # Action, Observation, MyTool(ToolDefinition) -├── impl.py # MyExecutor(ToolExecutor) -└── [other modules] # Tool-specific utilities -``` - -**File Responsibilities:** - -| File | Contains | Purpose | -|------|----------|---------| -| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | -| `impl.py` | ToolExecutor implementation | Business logic, state management, 
execution | -| `__init__.py` | Tool exports | Package interface | - -**Benefits:** -- **Separation of Concerns** - Public API separate from implementation -- **Avoid Circular Imports** - Import `impl` only inside `create()` method -- **Consistency** - All tools follow same structure for discoverability - -**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation - - ## MCP Integration The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). MCP tools are **configured separately from the tool registry** via the `mcp_config` field in `Agent` class and are automatically discovered from MCP servers during agent initialization. diff --git a/sdk/arch/workspace.ai-invariants.md b/sdk/arch/workspace.ai-invariants.md new file mode 100644 index 000000000..385e345a9 --- /dev/null +++ b/sdk/arch/workspace.ai-invariants.md @@ -0,0 +1,66 @@ +## Invariants (Normative) + +### Workspace Factory: Host Chooses Remote + +The `Workspace(...)` constructor is a factory: + +- If `host` is provided, it returns a `RemoteWorkspace`. +- Otherwise it returns a `LocalWorkspace`. + +OCL-like (conceptual): + +- `context Workspace::__new__ post RemoteIffHost: (host <> null) implies result.oclIsKindOf(RemoteWorkspace)` + + +### BaseWorkspace Contract + +All workspace implementations must satisfy: + +- `execute_command(command, cwd, timeout)` returns a `CommandResult` where `exit_code=-1` indicates timeout. +- `file_upload` / `file_download` return a `FileOperationResult` with `success=false` and a populated `error` field on failure. +- Git helpers (`git_changes`, `git_diff`) must raise if the path is not a git repository. + +### working_dir Normalization + +Natural language invariant: + +- `working_dir` is normalized to a `str` even if passed as a `Path`. 
+ +### Pause/Resume Semantics (Optional Capability) + +`pause()` / `resume()` are intentionally **optional capabilities**: + +- `LocalWorkspace.pause()` / `.resume()` are no-ops. +- Remote/container workspaces may implement pause/resume to conserve resources. +- If a workspace type does not support pausing, it must raise `NotImplementedError`. + +#### Discussion: `pause()` / `resume()` semantics (design tradeoff) + +There is an argument that this is compatible with the “swap workspaces without rewriting code” principle, because most client code should only rely on the *core* workspace and conversation operations, while optional capabilities are feature-detected or used conditionally. + +There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**. + +- Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op. +- Some remote/container workspaces may be able to pause a container or VM, but others may not. + +This tension matters because it creates two different reasonable expectations: + +1. *Ergonomic expectation*: orchestration code can call `pause()` unconditionally and it will be safe. +2. *Guarantee expectation*: calling `pause()` actually pauses resource usage. + +**Maybe it would make sense to** model this explicitly as an optional capability: + +- Add `supports_pause` (or a richer `pause_capability`) to `BaseWorkspace`, and +- Make `pause()` / `resume()` no-ops everywhere by default (including remote) while letting pausable implementations override, +- Keep a strict helper (e.g., `pause_or_raise()`) for callers who require a guarantee. + +This would make the default behavior unsurprising (safe to call), while still letting clients opt into fail-fast behavior when pausing is required. 
+ + +### File Operations + +| Operation | Local Implementation | Remote Implementation | +|-----------|---------------------|----------------------| +| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | +| **Download** | `shutil.copy()` | `GET /file/download` stream | +| **Result** | `FileOperationResult` | `FileOperationResult` | diff --git a/sdk/arch/workspace.mdx b/sdk/arch/workspace.mdx index 55daffd93..0ca8a0cab 100644 --- a/sdk/arch/workspace.mdx +++ b/sdk/arch/workspace.mdx @@ -122,74 +122,6 @@ flowchart LR | **timeout** | bool | Whether command timed out | | **duration** | float | Execution time in seconds | -## Invariants (Normative) - -### Workspace Factory: Host Chooses Remote - -The `Workspace(...)` constructor is a factory: - -- If `host` is provided, it returns a `RemoteWorkspace`. -- Otherwise it returns a `LocalWorkspace`. - -{/* AI_INVARIANTS_BEGIN -OCL-like (conceptual): - -- `context Workspace::__new__ post RemoteIffHost: (host <> null) implies result.oclIsKindOf(RemoteWorkspace)` -AI_INVARIANTS_END */} - -### BaseWorkspace Contract - -All workspace implementations must satisfy: - -- `execute_command(command, cwd, timeout)` returns a `CommandResult` where `exit_code=-1` indicates timeout. -- `file_upload` / `file_download` return a `FileOperationResult` with `success=false` and a populated `error` field on failure. -- Git helpers (`git_changes`, `git_diff`) must raise if the path is not a git repository. - -### working_dir Normalization - -Natural language invariant: - -- `working_dir` is normalized to a `str` even if passed as a `Path`. - -### Pause/Resume Semantics (Optional Capability) - -`pause()` / `resume()` are intentionally **optional capabilities**: - -- `LocalWorkspace.pause()` / `.resume()` are no-ops. -- Remote/container workspaces may implement pause/resume to conserve resources. -- If a workspace type does not support pausing, it must raise `NotImplementedError`. 
- -#### Discussion: `pause()` / `resume()` semantics (design tradeoff) - -There is an argument that this is compatible with the “swap workspaces without rewriting code” principle, because most client code should only rely on the *core* workspace and conversation operations, while optional capabilities are feature-detected or used conditionally. - -There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**. - -- Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op. -- Some remote/container workspaces may be able to pause a container or VM, but others may not. - -This tension matters because it creates two different reasonable expectations: - -1. *Ergonomic expectation*: orchestration code can call `pause()` unconditionally and it will be safe. -2. *Guarantee expectation*: calling `pause()` actually pauses resource usage. - -**Maybe it would make sense to** model this explicitly as an optional capability: - -- Add `supports_pause` (or a richer `pause_capability`) to `BaseWorkspace`, and -- Make `pause()` / `resume()` no-ops everywhere by default (including remote) while letting pausable implementations override, -- Keep a strict helper (e.g., `pause_or_raise()`) for callers who require a guarantee. - -This would make the default behavior unsurprising (safe to call), while still letting clients opt into fail-fast behavior when pausing is required. 
- - -### File Operations - -| Operation | Local Implementation | Remote Implementation | -|-----------|---------------------|----------------------| -| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | -| **Download** | `shutil.copy()` | `GET /file/download` stream | -| **Result** | `FileOperationResult` | `FileOperationResult` | - ## Resource Management Workspaces use **context manager** for safe resource handling: From 8d15b05cd78107354a3fe9d0b69d13e12cd2cf17 Mon Sep 17 00:00:00 2001 From: openhands Date: Sun, 8 Mar 2026 13:03:48 +0000 Subject: [PATCH 09/10] docs: restore human sections and llms wording Co-authored-by: openhands --- llms-full.txt | 78 +++++++++++++++++++++++++++++++++- llms.txt | 2 +- scripts/generate-llms-files.py | 2 +- sdk/arch/agent.mdx | 15 +++++++ sdk/arch/conversation.mdx | 6 ++- sdk/arch/events.mdx | 8 ++++ sdk/arch/tool-system.mdx | 40 ++++++++++++++++- sdk/arch/workspace.mdx | 9 ++++ 8 files changed, 154 insertions(+), 6 deletions(-) diff --git a/llms-full.txt b/llms-full.txt index 2492edc5a..4c1c4a52b 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -4117,6 +4117,21 @@ flowchart TB style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` +**Execution Modes:** + +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | + +**Security Integration:** + +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation + + ## Component Relationships ### How Agent Interacts @@ -5352,7 +5367,11 @@ The conversation system provides pluggable services that operate independently o | 
**[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | | **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | -**Design Principle:** Services read from the event log but never mutate state directly. +**Design Principle:** Services read from the event log but never mutate state directly. This enables: +- Services can be enabled/disabled independently +- Easy to add new services without changing core orchestration +- Event stream acts as the integration point + ## Component Relationships @@ -5705,6 +5724,14 @@ Events for metadata, control flow, and user actions (not sent to LLM): | **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | | **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | | **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | +| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | + +**Source Types:** +- **user**: Event originated from user input +- **agent**: Event generated by agent logic +- **environment**: Event from system/framework/tools + + ## Component Relationships @@ -8366,9 +8393,47 @@ flowchart LR **Resolution Workflow:** -1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "TerminalTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +1. 
**[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) 2. **Resolver Lookup** - Registry finds the registered resolver for the tool name 3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. **Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution + +**Registration Types:** + +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | + +### File Organization + +Tools follow a consistent file structure for maintainability: + +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities +``` + +**File Responsibilities:** + +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | +| `__init__.py` | Tool exports | Package interface | + +**Benefits:** +- **Separation of Concerns** - Public API separate from implementation +- **Avoid Circular Imports** - Import `impl` only inside `create()` method +- **Consistency** - All tools follow same structure for discoverability + +**Example Reference:** See 
[`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation + ## MCP Integration @@ -8791,6 +8856,15 @@ flowchart LR | **timeout** | bool | Whether command timed out | | **duration** | float | Execution time in seconds | +### File Operations + +| Operation | Local Implementation | Remote Implementation | +|-----------|---------------------|----------------------| +| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | +| **Download** | `shutil.copy()` | `GET /file/download` stream | +| **Result** | `FileOperationResult` | `FileOperationResult` | + + ## Resource Management Workspaces use **context manager** for safe resource handling: diff --git a/llms.txt b/llms.txt index dbd863d2c..849a69c4b 100644 --- a/llms.txt +++ b/llms.txt @@ -2,7 +2,7 @@ > LLM-friendly index of OpenHands documentation (V1). Legacy V0 docs pages are intentionally excluded. -The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI) +The sections below intentionally separate OpenHands applications documentation (Web App Server / Cloud / CLI) from the OpenHands Software Agent SDK. ## OpenHands Software Agent SDK diff --git a/scripts/generate-llms-files.py b/scripts/generate-llms-files.py index 59436e123..4a02797f0 100755 --- a/scripts/generate-llms-files.py +++ b/scripts/generate-llms-files.py @@ -241,7 +241,7 @@ def build_llms_txt(pages: list[DocPage]) -> str: "", "> LLM-friendly index of OpenHands documentation (V1). 
Legacy V0 docs pages are intentionally excluded.", "", - "The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI)", + "The sections below intentionally separate OpenHands applications documentation (Web App Server / Cloud / CLI)", "from the OpenHands Software Agent SDK.", "", ] diff --git a/sdk/arch/agent.mdx b/sdk/arch/agent.mdx index 138ff48c8..94ce23a9c 100644 --- a/sdk/arch/agent.mdx +++ b/sdk/arch/agent.mdx @@ -224,6 +224,21 @@ flowchart TB style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` +**Execution Modes:** + +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | + +**Security Integration:** + +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation + + ## Component Relationships ### How Agent Interacts diff --git a/sdk/arch/conversation.mdx b/sdk/arch/conversation.mdx index 416af8a9a..77bbfbe30 100644 --- a/sdk/arch/conversation.mdx +++ b/sdk/arch/conversation.mdx @@ -188,7 +188,11 @@ The conversation system provides pluggable services that operate independently o | **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | | **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | -**Design Principle:** Services read from the event log but never mutate state directly. +**Design Principle:** Services read from the event log but never mutate state directly. 
This enables: +- Services can be enabled/disabled independently +- Easy to add new services without changing core orchestration +- Event stream acts as the integration point + ## Component Relationships diff --git a/sdk/arch/events.mdx b/sdk/arch/events.mdx index 76ed5a590..135d43392 100644 --- a/sdk/arch/events.mdx +++ b/sdk/arch/events.mdx @@ -142,6 +142,14 @@ Events for metadata, control flow, and user actions (not sent to LLM): | **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | | **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | | **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | +| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | + +**Source Types:** +- **user**: Event originated from user input +- **agent**: Event generated by agent logic +- **environment**: Event from system/framework/tools + + ## Component Relationships diff --git a/sdk/arch/tool-system.mdx b/sdk/arch/tool-system.mdx index a930ea95b..114a89318 100644 --- a/sdk/arch/tool-system.mdx +++ b/sdk/arch/tool-system.mdx @@ -259,9 +259,47 @@ flowchart LR **Resolution Workflow:** -1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "TerminalTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) 2. **Resolver Lookup** - Registry finds the registered resolver for the tool name 3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. 
**Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution + +**Registration Types:** + +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | + +### File Organization + +Tools follow a consistent file structure for maintainability: + +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities +``` + +**File Responsibilities:** + +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | +| `__init__.py` | Tool exports | Package interface | + +**Benefits:** +- **Separation of Concerns** - Public API separate from implementation +- **Avoid Circular Imports** - Import `impl` only inside `create()` method +- **Consistency** - All tools follow same structure for discoverability + +**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation + ## MCP Integration diff --git a/sdk/arch/workspace.mdx b/sdk/arch/workspace.mdx index 0ca8a0cab..74c93fd9c 100644 --- a/sdk/arch/workspace.mdx +++ b/sdk/arch/workspace.mdx @@ -122,6 +122,15 @@ flowchart LR | **timeout** | bool | Whether command timed out | | **duration** | float | Execution time in 
seconds | +### File Operations + +| Operation | Local Implementation | Remote Implementation | +|-----------|---------------------|----------------------| +| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | +| **Download** | `shutil.copy()` | `GET /file/download` stream | +| **Result** | `FileOperationResult` | `FileOperationResult` | + + ## Resource Management Workspaces use **context manager** for safe resource handling: From 874ed7bbf6f1d76f223f5aee9d5db8b0770ac507 Mon Sep 17 00:00:00 2001 From: Engel Nyst Date: Sun, 8 Mar 2026 17:21:01 +0100 Subject: [PATCH 10/10] Apply suggestion from @enyst --- sdk/arch/events.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/sdk/arch/events.mdx b/sdk/arch/events.mdx index 135d43392..108275d5f 100644 --- a/sdk/arch/events.mdx +++ b/sdk/arch/events.mdx @@ -149,8 +149,6 @@ Events for metadata, control flow, and user actions (not sent to LLM): - **agent**: Event generated by agent logic - **environment**: Event from system/framework/tools - - ## Component Relationships ### How Events Integrate