Skip to content

docs(sdk): add normative invariants to architecture docs#274

Open
enyst wants to merge 15 commits intomainfrom
openhands/sdk-arch-1815
Open

docs(sdk): add normative invariants to architecture docs#274
enyst wants to merge 15 commits intomainfrom
openhands/sdk-arch-1815

Conversation

@enyst
Copy link
Collaborator

@enyst enyst commented Jan 24, 2026

HUMAN: I’ve been thinking for a while that maybe it will help LLMs to

  • avoid weird behavior drift when they make PRs: if they have or can have a little better understanding of the relationships between components / classes / packages or intent behind them.
  • see stricter language: reinforce intent by constraints in a more formal language. Here, OCL, an invariant-expressions language complementing UML - probably forgotten by humans, but LLMs know it. (and I think maybe it's simple enough to read as English)

I asked OH to model the first few layers of the SDK (I think it only did the first in-package layer, will come back to that if we want this) and add constraints in both natural language and, if simple, in good old OCL.

  • I have read and reviewed the documentation changes to the best of my ability.
  • If the change is significant, I have run the documentation site locally and confirmed it renders as expected.

——

Summary of changes

  • Added "Invariants (Normative)" sections across SDK architecture docs (design, agent, conversation, events, tool system, workspace)

  • Included OCL-like invariants where simple and precise natural-language invariants where complex

  • Fixed/maintained MDX structure so tables/mermaid blocks render correctly

  • closes Analyze the SDK and model it software-agent-sdk#1815

@enyst can click here to continue refining the PR

…815\n\nCo-authored-by: openhands <openhands@all-hands.dev>
…Co-authored-by: openhands <openhands@all-hands.dev>
…\n\nCo-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst marked this pull request as ready for review January 24, 2026 14:20
@enyst enyst requested a review from xingyaoww as a code owner January 24, 2026 14:20
@openhands-ai
Copy link

openhands-ai bot commented Jan 30, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • .github/workflows/sync-agent-sdk-openapi.yml
    • .github/workflows/sync-docs-code-blocks.yml

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #274 at branch `openhands/sdk-arch-1815`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

@enyst
Copy link
Collaborator Author

enyst commented Feb 18, 2026

@OpenHands The trouble with this PR is that it displays our OCL constraints in the user-facing docs, but they are meant for AI agents, not for humans.

I think we have two options:

  • either insert them as html comments or something that is invisible to the user, but accessible to LLMs when AI agents read the pages
  • or, maybe we can create / modify / fix llms.txt and the full variant of it, so that it includes those sections, but the user-facing docs doesn't. (look up on the net the llms.txt emerging standard to see what it should work like)

WDYT? Answer me directly on github, in a comment you post to this PR. Note that it will be rendered as markdown.

@openhands-ai
Copy link

openhands-ai bot commented Feb 18, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai
Copy link

openhands-ai bot commented Feb 18, 2026

I did not make any repository changes in this interaction. I answered the PR comment question by recommending how to handle the OCL constraints so they’re available to AI agents but not shown in user-facing docs:

  • Prefer option 1 as the immediate fix: keep the natural-language invariants visible, but move the OCL blocks into HTML comments (non-rendered) with clear sentinels like <!-- AI_INVARIANTS_BEGIN --> ... <!-- AI_INVARIANTS_END --> so agents can reliably extract them.
  • Treat llms.txt / llms-full.txt as a potential follow-up packaging step (useful long-term, but adds maintenance/drift considerations), not the primary solution.
  • I provided a ready-to-paste Markdown PR comment expressing that recommendation.

No git diff to review, no commits, and nothing to push.

@enyst
Copy link
Collaborator Author

enyst commented Mar 7, 2026

@OpenHands Merge main and fix conflicts. Then read all comments to this PR, and implement Option 1: keep the natural-language invariants visible, but move the OCL blocks into HTML comments (non-rendered) with clear sentinels like ... so agents can reliably extract them.

@openhands-ai
Copy link

openhands-ai bot commented Mar 7, 2026

I'm on it! enyst can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Mar 7, 2026

Summary of new changes since my last update:

  • Added an AGENTS.md note documenting the AI-only invariant comment convention (OCL blocks wrapped in AI_INVARIANTS_BEGIN/END HTML comments).

Checklist:

  • Request fully addressed (documented the convention for future edits).
  • Changes are concise (single AGENTS.md addition, no extraneous edits).
  • Changes pushed to the remote branch.

Note: I did not re-check git diff here because you asked for no tool calls.

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@enyst
Copy link
Collaborator Author

enyst commented Mar 8, 2026

I moved these invariants in llms.txt, instead of human-facing .mdx files.

The docs pages are for humans, but this information is just a restatement in more formal language intended for agents, not for humans.

@xingyaoww WDYT about this solution?

Copy link
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Clean pattern, but needs verification and refinement

Taste Rating: The sidecar file pattern is elegant and the separation of concerns is good. However, the value proposition is speculative and some content needs refinement.


Verification Status

I cloned software-agent-sdk and verified key invariants against source code:

Verified Correct:

  • AgentBase has frozen=True in model_config
  • Event has frozen=True and extra="forbid"
  • Conversation factory returns RemoteConversation when workspace is RemoteWorkspace
  • Tool naming: TerminalToolterminal (via _camel_to_snake().removesuffix("_tool"))
  • Workspace factory returns RemoteWorkspace when host is provided
  • Multi-action batching: _combine_action_events asserts empty thought for all but first action

The invariants I spot-checked are accurate. Good work on verification.


Key Issues

🟠 Important: Value Proposition Unclear

The PR description says this "maybe" will help LLMs avoid drift, but provides no concrete evidence:

  • No examples of drift this would prevent
  • No measurement or testing against actual LLM behavior
  • No comparison before/after

Recommendation: Either demonstrate measurable benefit (e.g., "this prevents X type of invalid PR") or frame this as an experiment to gather data.

🟡 Suggestion: Design Commentary vs Normative Constraints

Several invariant files include design discussions (especially workspace.ai-invariants.md pause/resume section). This is valuable context but blurs the line between "normative constraint" and "architectural commentary."

Consider:

  • Creating a separate section for "Design Discussions" or "Open Questions"
  • Keeping "Invariants (Normative)" strictly to verifiable contracts
  • Moving design rationale to the main .mdx files where humans can also benefit

🟢 Acceptable: Unrelated Formatting Changes

Some .mdx files have whitespace changes (trailing newlines, blank line cleanup) that are unrelated to the main PR goal. Not a blocker, but ideally these would be in a separate cleanup commit.

Comment on lines +37 to +56
#### Discussion: `pause()` / `resume()` semantics (design tradeoff)

There is an argument that this is compatible with the “swap workspaces without rewriting code” principle, because most client code should only rely on the *core* workspace and conversation operations, while optional capabilities are feature-detected or used conditionally.

There is a mild design smell here: the method names `pause()` / `resume()` suggest a strong guarantee (that work is actually suspended), but the SDK currently treats them as a **best-effort resource management hook**.

- Locally, there is often nothing meaningful the workspace can suspend at the boundary (it is operating on the host OS), so `LocalWorkspace.pause()` is a no-op.
- Some remote/container workspaces may be able to pause a container or VM, but others may not.

This tension matters because it creates two different reasonable expectations:

1. *Ergonomic expectation*: orchestration code can call `pause()` unconditionally and it will be safe.
2. *Guarantee expectation*: calling `pause()` actually pauses resource usage.

**Maybe it would make sense to** model this explicitly as an optional capability:

- Add `supports_pause` (or a richer `pause_capability`) to `BaseWorkspace`, and
- Make `pause()` / `resume()` no-ops everywhere by default (including remote) while letting pausable implementations override,
- Keep a strict helper (e.g., `pause_or_raise()`) for callers who require a guarantee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: This discussion is valuable but doesn't belong in "Invariants (Normative)".

Recommendation: Move this design smell discussion to either:

  1. A "Design Rationale" section in workspace.mdx (where humans can see it)
  2. A separate workspace.design-discussion.md file
  3. Or at minimum, retitle this section "Design Discussion: pause/resume semantics"

Normative invariants should be contracts that can be verified/enforced, not open-ended design debates.

Comment on lines +44 to +83
1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`)
2. **Resolver Lookup** - Registry finds the registered resolver for the tool name
3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state
4. **Instance Creation** - Tool instance(s) are created with configured executors
5. **Agent Usage** - Instances are added to the agent's tools_map for execution

**Registration Types:**

| Type | Registration | Resolver Behavior |
|------|-------------|-------------------|
| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) |
| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` |
| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` |

### File Organization

Tools follow a consistent file structure for maintainability:

```
openhands-tools/openhands/tools/my_tool/
├── __init__.py # Export MyTool
├── definition.py # Action, Observation, MyTool(ToolDefinition)
├── impl.py # MyExecutor(ToolExecutor)
└── [other modules] # Tool-specific utilities
```

**File Responsibilities:**

| File | Contains | Purpose |
|------|----------|---------|
| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method |
| `impl.py` | ToolExecutor implementation | Business logic, state management, execution |
| `__init__.py` | Tool exports | Package interface |

**Benefits:**
- **Separation of Concerns** - Public API separate from implementation
- **Avoid Circular Imports** - Import `impl` only inside `create()` method
- **Consistency** - All tools follow same structure for discoverability

**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: This entire "File Organization" section describes project conventions, not runtime invariants.

It's useful content, but consider whether it belongs in:

  • The main tool-system.mdx page (for human readers)
  • A separate "SDK Conventions" doc
  • Or a different section title like "## Code Organization Patterns (Informative)"

Not a blocker, just a categorization question.

Copy link
Contributor

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern here is how do we keep this up-to-date :(

We have a lot of docs (for human) and sometimes it is already hard to make sure they are accurate and up-to-date..

Maybe we will need some CI that runs periodically that updates these?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Analyze the SDK and model it

4 participants