feat(0.12.0): production trace-sink — close the data-leak by drewstone · Pull Request #18 · tangle-network/agent-runtime

drewstone · 2026-05-20T12:08:20Z

Summary

Every production chat session has been emitting zero replayable trace data. Eval runs capture everything; production captures nothing. RL training, research analyses, and the self-improvement loop all run on synthetic personas only. This primitive turns every real user conversation into data the downstream channels (Prime Intellect, GEPA, research, canaries, analyst loop) can consume.

API

`createProductionTraceSink(opts)` returns:

`traceStore` — the in-memory store the agent's `TraceEmitter` writes to during a chat session (built on agent-eval's existing `InMemoryTraceStore`; no reinvention)
`onRunComplete` — `RunCompleteHook` the agent registers; on `endRun` composes a canonical `ProductionRunRecord`, persists to a durable store, and POSTs the run as OTLP to a configured collector (Langfuse, etc.)
`recordFeedback(input)` — appends a `FeedbackLabel` to the run's `FeedbackTrajectory`; creates the trajectory anchored to `runId` on first feedback
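The create-then-append behaviour of `recordFeedback` can be sketched as below. This is a minimal in-memory illustration, not the package's implementation: the field names on `FeedbackLabel`, `FeedbackTrajectory`, and `FeedbackInput`, the `traj-` id scheme, and the synchronous signature are all assumptions (the PR names the types but does not show their shapes, and a durable store would most likely be async).

```typescript
// Hypothetical shapes: the PR names these types but does not show their fields.
interface FeedbackLabel {
  kind: 'thumbs-up' | 'thumbs-down' | 'correction'
  note?: string
}

interface FeedbackTrajectory {
  trajectoryId: string
  runId: string
  labels: FeedbackLabel[]
}

interface FeedbackInput {
  runId: string
  label: FeedbackLabel
  trajectoryId?: string // an explicit id is used when provided
}

// In-memory stand-in for a durable feedbackStore, keyed by runId.
class InMemoryFeedbackStore {
  private byRun = new Map<string, FeedbackTrajectory>()

  recordFeedback(input: FeedbackInput): FeedbackTrajectory {
    let trajectory = this.byRun.get(input.runId)
    if (!trajectory) {
      // First feedback for this run: create the trajectory anchored to runId.
      trajectory = {
        trajectoryId: input.trajectoryId ?? `traj-${input.runId}`,
        runId: input.runId,
        labels: [],
      }
      this.byRun.set(input.runId, trajectory)
    }
    // Every call, including the first, appends its label.
    trajectory.labels.push(input.label)
    return trajectory
  }
}

const feedbackDemo = new InMemoryFeedbackStore()
const firstFeedback = feedbackDemo.recordFeedback({
  runId: 'run-1',
  label: { kind: 'thumbs-down' },
})
const secondFeedback = feedbackDemo.recordFeedback({
  runId: 'run-1',
  label: { kind: 'correction', note: 'wrong filing year' },
})
```

Both calls return the same trajectory object, so later feedback for a run accumulates on the trajectory created by the first call.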

Per-agent wiring (~10 lines)

```ts
// One sink per agent: in-memory trace store, durable run records, OTLP export.
const sink = createProductionTraceSink({
  projectId: 'tax-agent',
  otlp: { endpoint: env.LANGFUSE_OTEL_ENDPOINT, authHeader: env.LANGFUSE_OTEL_AUTH },
  runRecordStore: drizzleRunRecordStore(db),
  feedbackStore: drizzleFeedbackStore(db),
})

// The emitter writes to sink.traceStore and fires sink.onRunComplete on endRun.
const emitter = new TraceEmitter(sink.traceStore, {
  onRunComplete: [sink.onRunComplete],
})
await emitter.startRun({ scenarioId: sessionId, projectId: 'tax-agent', layer: 'app-runtime' })
// ... existing chat flow ...
await emitter.endRun({ pass, score })
```

Cloudflare Worker semantics: wrap the hook in `ctx.waitUntil` from the chat handler so the worker stays alive long enough for the OTLP POST and DB write to flush.
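A rough sketch of the `ctx.waitUntil` pattern follows. `WaitUntilContext` is a structural stand-in for the Workers `ExecutionContext`, and the `RunCompleteHook` signature and `finishChat` helper are hypothetical; the PR does not show how the hook is invoked inside the handler.

```typescript
// Minimal structural stand-in for the Workers ExecutionContext; only waitUntil is used.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void
}

// Hypothetical hook signature, for illustration only.
type RunCompleteHook = (run: { runId: string; pass: boolean }) => Promise<void>

function finishChat(ctx: WaitUntilContext, onRunComplete: RunCompleteHook, runId: string): string {
  // Hand the flush to the runtime instead of awaiting it, so the reply is not delayed.
  ctx.waitUntil(onRunComplete({ runId, pass: true }))
  return `reply for ${runId}`
}

// Fake context that collects the promises a real Worker would keep alive.
const pendingFlushes: Promise<unknown>[] = []
const fakeCtx: WaitUntilContext = {
  waitUntil: (p) => {
    pendingFlushes.push(p)
  },
}

const flushedRuns: string[] = []
const demoHook: RunCompleteHook = async (run) => {
  await Promise.resolve() // stands in for the OTLP POST + DB write
  flushedRuns.push(run.runId)
}

const chatReply = finishChat(fakeCtx, demoHook, 'run-1')
```

The reply is produced before the flush completes; the promise handed to `waitUntil` is what keeps the worker running until the write and POST finish.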

Fail-loud where it matters

runRecordStore failures → logged, not thrown (chat handler stays up)
OTLP POST failures (network / non-2xx) → logged, not thrown
feedbackStore failures → `null` returned, logged
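The logged-not-thrown behaviour for the run-record write and the OTLP POST could look roughly like this. It is a sketch under assumptions: `RunRecord`, `RunRecordStore`, `PostFn`, and `makeOnRunComplete` are illustrative names, and the real hook's ordering and log format are not shown in the PR.

```typescript
// Hypothetical shapes for illustration.
interface RunRecord {
  runId: string
  status: 'completed' | 'failed' | 'aborted'
}

interface RunRecordStore {
  save(record: RunRecord): Promise<void>
}

type PostFn = (record: RunRecord) => Promise<{ ok: boolean; status: number }>
type Logger = (message: string) => void

function makeOnRunComplete(store: RunRecordStore, post: PostFn, log: Logger) {
  return async (record: RunRecord): Promise<void> => {
    // Durable write: a failure is logged and swallowed so the chat handler stays up.
    try {
      await store.save(record)
    } catch (err) {
      log(`runRecordStore failed for ${record.runId}: ${String(err)}`)
    }
    // OTLP export: both a network throw and a non-2xx response are logged, not thrown.
    try {
      const res = await post(record)
      if (!res.ok) log(`OTLP POST returned ${res.status} for ${record.runId}`)
    } catch (err) {
      log(`OTLP POST failed for ${record.runId}: ${String(err)}`)
    }
  }
}

// Demo wiring: a store that always throws and a collector that answers 503.
const failureLogs: string[] = []
const failingStore: RunRecordStore = {
  save: async () => {
    throw new Error('db down')
  },
}
const rejectingPost: PostFn = async () => ({ ok: false, status: 503 })
const safeHook = makeOnRunComplete(failingStore, rejectingPost, (m) => failureLogs.push(m))
```

Because each IO step has its own `try`/`catch`, a failed database write does not prevent the OTLP export from being attempted.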

Test plan

`pnpm test` — 144/144 pass (13 new under `tests/production-trace-sink.test.ts`)
`pnpm typecheck`
Bumps to 0.12.0

Next steps (separate PRs per agent)

Each of the tax, legal, gtm, and creative agents wires this into `packages/api-worker/src/services/agent-runtime/chat.ts`, adds the matching `runRecordStore`/`feedbackStore` Drizzle adapters, and adds the Langfuse env vars.

Fail-loud everywhere it matters; fail-quiet only at the IO boundary: a runRecordStore failure is logged, not thrown, so the chat handler stays up.

The 13 new tests in `tests/production-trace-sink.test.ts` cover:

RunRecord composition for completed / failed / aborted
failureClass + notes propagation
runRecordStore throwing (hook stays alive)
OTLP POST shape (service.name in resource attrs, authorization header)
OTLP failure modes (network throw, non-2xx)
omitted otlp / omitted authHeader paths
recordFeedback create-then-append semantics
explicit trajectoryId honored

144/144 pass. Cloudflare Worker semantics intended: `ctx.waitUntil` the hook from the chat handler so the worker stays alive long enough for the OTLP POST + DB write to flush. Bumps agent-runtime to 0.12.0.
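The OTLP POST shape mentioned in the test list (service.name in the resource attributes, an authorization header when one is configured) can be illustrated with the OTLP/HTTP JSON encoding. This is a sketch only: `RunSummary`, `buildOtlpRequest`, the one-span-per-run mapping, the scope name, and the use of `projectId` as `service.name` are assumptions, not the PR's code.

```typescript
// Hypothetical input: the PR does not show how a run is mapped to spans.
interface RunSummary {
  traceId: string // 32 hex characters
  spanId: string // 16 hex characters
  name: string
  startMs: number
  endMs: number
}

interface OtlpOptions {
  endpoint: string
  authHeader?: string
}

interface OtlpRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// OTLP/JSON carries nanosecond timestamps as decimal strings; whole milliseconds assumed.
const msToUnixNano = (ms: number): string => `${Math.trunc(ms)}000000`

function buildOtlpRequest(projectId: string, run: RunSummary, otlp: OtlpOptions): OtlpRequest {
  const headers: Record<string, string> = { 'content-type': 'application/json' }
  // The authorization header is only sent when an authHeader is configured.
  if (otlp.authHeader) headers.authorization = otlp.authHeader

  const payload = {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: { stringValue: projectId } }],
        },
        scopeSpans: [
          {
            scope: { name: 'production-trace-sink' },
            spans: [
              {
                traceId: run.traceId,
                spanId: run.spanId,
                name: run.name,
                kind: 1, // SPAN_KIND_INTERNAL
                startTimeUnixNano: msToUnixNano(run.startMs),
                endTimeUnixNano: msToUnixNano(run.endMs),
              },
            ],
          },
        ],
      },
    ],
  }
  return { url: otlp.endpoint, method: 'POST', headers, body: JSON.stringify(payload) }
}

const demoRun: RunSummary = {
  traceId: '0af7651916cd43dd8448eb211c80319c',
  spanId: 'b7ad6b7169203331',
  name: 'chat-session',
  startMs: 1000,
  endMs: 2500,
}
const otlpRequest = buildOtlpRequest('tax-agent', demoRun, {
  endpoint: 'https://collector.example/v1/traces', // placeholder URL
  authHeader: 'Basic placeholder',
})
```

Keeping request construction separate from the network call makes the payload shape checkable without a collector, which is the kind of assertion the test list describes.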

drewstone added 2 commits May 20, 2026 15:07

style: biome formatter

e912096

tangletools merged commit 0ed1406 into main May 20, 2026
1 check passed

tangletools deleted the feat/production-trace-sink branch May 20, 2026 12:28

tangletools merged 2 commits into main from feat/production-trace-sink
