Add Hermes Agent session transcript support to AgentRolloutSeedSource #496

@eric-tramel

Description

Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

AgentRolloutSeedSource currently supports only two vendor-specific formats (Claude Code and Codex), so Hermes Agent sessions cannot be ingested without a custom seed reader. Hermes Agent stores session transcripts under ~/.hermes/sessions, and those session artifacts preserve the full agent loop with structured message roles, tool_calls, tool_call_id, and reasoning fields. That makes Hermes session transcripts a natural ingestion target for trace distillation, analysis, and training-data preparation.

Describe the solution you'd like

Add a built-in Hermes Agent rollout format focused on session transcript ingestion rather than Hermes' ShareGPT trajectory export format.

The handler should support the session artifacts Hermes writes under ~/.hermes/sessions, including:

  • Gateway session transcripts stored as per-session .jsonl files
  • CLI session logs stored as session_*.json files containing top-level session metadata plus a messages array

The format should normalize both shapes into Data Designer's standard agent rollout schema, with Hermes-specific metadata stored in source_meta.
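As a rough illustration of reading the two shapes, here is a minimal sketch using only the standard library. The function names are hypothetical, and it assumes that each non-empty line of a gateway .jsonl file is one JSON object and that a CLI log is a single JSON object whose messages key holds the message array. The actual Hermes file layout should be checked before relying on this.

```python
import json
from pathlib import Path
from typing import Any


def load_gateway_transcript(path: Path) -> list[dict[str, Any]]:
    """Read a gateway .jsonl transcript (assumed: one JSON object per line)."""
    records: list[dict[str, Any]] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


def load_cli_session(path: Path) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Read a session_*.json log and split it into (metadata, messages)."""
    with path.open(encoding="utf-8") as fh:
        doc = json.load(fh)
    messages = doc.get("messages", [])
    # Everything at the top level other than "messages" is treated as metadata.
    meta = {key: value for key, value in doc.items() if key != "messages"}
    return meta, messages
```

Returning metadata separately from messages keeps the later step of filling source_meta independent of which file shape was read.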

High level changes

  1. Add HERMES_AGENT to the AgentRolloutFormat enum
  2. Implement HermesAgentRolloutFormatHandler
  3. Register the handler with BUILTIN_AGENT_ROLLOUT_FORMAT_HANDLERS
  4. Default the format path to ~/.hermes/sessions
  5. Parse Hermes gateway session_meta / tool-definition metadata into source_meta
  6. Normalize Hermes session messages into the standard messages payload used by AgentRolloutSeedSource
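Steps 5 and 6 could look roughly like the sketch below. It is not the Data Designer handler interface: the function name is hypothetical, and the convention that gateway metadata records are marked with role == "session_meta" is an assumption made for illustration. The message fields kept are the ones named in this issue (role, content, tool_calls, tool_call_id, reasoning).

```python
from typing import Any

# Fields named in this issue that a normalized message may carry.
MESSAGE_FIELDS = ("role", "content", "tool_calls", "tool_call_id", "reasoning")


def split_gateway_records(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Separate metadata records from conversational messages.

    Assumption: metadata records are marked with role == "session_meta".
    """
    source_meta: dict[str, Any] = {}
    messages: list[dict[str, Any]] = []
    for record in records:
        if record.get("role") == "session_meta":
            # Fold metadata (e.g. tool definitions) into source_meta.
            source_meta.update({k: v for k, v in record.items() if k != "role"})
        else:
            # Keep only the known message fields, preserving record order.
            messages.append({k: record[k] for k in MESSAGE_FIELDS if k in record})
    return {"messages": messages, "source_meta": source_meta}
```

Dropping unknown per-message keys is a design choice in this sketch; a real handler might instead carry them along in source_meta.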

Example

from data_designer import DataDesigner, AgentRolloutSeedSource, AgentRolloutFormat

dd = DataDesigner()
config = dd.config_builder()
config.with_seed_dataset(
    AgentRolloutSeedSource(
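        # HERMES_AGENT is the enum member proposed in this issue; it does not exist yet.
        format=AgentRolloutFormat.HERMES_AGENT,
        # Same location as the proposed default path.
        path="~/.hermes/sessions",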
    )
)

Describe alternatives you've considered

We could target Hermes' ShareGPT trajectory export format instead, but the session transcript format is a better fit for AgentRolloutSeedSource because it is closer to the existing Claude Code / Codex rollout handlers: one session per file, structured tool-call fields, and richer trace metadata.

Additional context

Hermes session storage currently has two related file conventions under ~/.hermes/sessions:

  • Gateway transcripts: *.jsonl
  • CLI session logs: session_*.json

A single Hermes handler can likely support both via file-shape auto-detection.
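One possible form of that auto-detection is to classify files by name using the two conventions listed above. The function name and the returned labels are hypothetical, and a real handler might also inspect file contents as a fallback.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Optional


def detect_hermes_file_shape(path: str) -> Optional[str]:
    """Classify a file under the sessions directory by its name alone."""
    name = Path(path).name
    if fnmatchcase(name, "session_*.json"):
        return "cli_session_log"
    if fnmatchcase(name, "*.jsonl"):
        return "gateway_transcript"
    return None  # not a recognized Hermes session artifact
```

For example, detect_hermes_file_shape("session_42.json") returns "cli_session_log", while a file such as "notes.txt" returns None and can be skipped.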

Labels: enhancement (New feature or request)
