feat(0.12.0): production trace-sink — close the data-leak#18
Merged
Every production chat session has been emitting zero replayable trace data.
Eval runs capture everything; production captures nothing. RL training,
research analyses, and the self-improvement loop all run on synthetic
personas. This primitive turns every real user conversation into data the
downstream channels (Prime Intellect, GEPA, research, canaries, analyst
loop) can consume.
`createProductionTraceSink(opts)` returns:
- `traceStore` — the in-memory store the agent's TraceEmitter writes to
during a chat session (built on agent-eval's existing InMemoryTraceStore;
no reinvention)
- `onRunComplete` — RunCompleteHook the agent registers; on endRun
composes a canonical ProductionRunRecord, persists to a durable store,
and POSTs the run as OTLP to a configured collector (Langfuse, etc.)
- `recordFeedback(input)` — appends a FeedbackLabel to the run's
FeedbackTrajectory; creates the trajectory anchored to runId on
first feedback
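A minimal sketch of the create-then-append behaviour described for `recordFeedback`, using an in-memory map. The type shapes, the `FeedbackInput` name, the `InMemoryFeedbackStore` class, and the default `traj-<runId>` id are illustrative assumptions, not the library's actual definitions.

```ts
// Hypothetical shapes; the real FeedbackLabel / FeedbackTrajectory types may differ.
interface FeedbackLabel { kind: string; value: number }
interface FeedbackTrajectory { trajectoryId: string; runId: string; labels: FeedbackLabel[] }
interface FeedbackInput { runId: string; label: FeedbackLabel; trajectoryId?: string }

class InMemoryFeedbackStore {
  private byRun = new Map<string, FeedbackTrajectory>()

  recordFeedback(input: FeedbackInput): FeedbackTrajectory {
    let trajectory = this.byRun.get(input.runId)
    if (!trajectory) {
      // First feedback for this run: create the trajectory anchored to runId,
      // honouring an explicit trajectoryId when the caller supplies one.
      trajectory = {
        trajectoryId: input.trajectoryId ?? `traj-${input.runId}`,
        runId: input.runId,
        labels: [],
      }
      this.byRun.set(input.runId, trajectory)
    }
    // Every call, first or later, appends the label.
    trajectory.labels.push(input.label)
    return trajectory
  }
}
```

A second call with the same `runId` returns the same trajectory with two labels, rather than creating a new one.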
Wiring is ~10 lines in each agent's production chat handler:
    const sink = createProductionTraceSink({
      projectId: 'tax-agent',
      otlp: { endpoint: env.LANGFUSE_OTEL_ENDPOINT, authHeader: env.LANGFUSE_OTEL_AUTH },
      runRecordStore: drizzleRunRecordStore(db),
      feedbackStore: drizzleFeedbackStore(db),
    })
    const emitter = new TraceEmitter(sink.traceStore, {
      onRunComplete: [sink.onRunComplete],
    })
Fail-loud everywhere it matters; fail-quiet only at the IO boundary:
- runRecordStore failures → logged, not thrown (chat handler stays up)
- OTLP POST failures (network/non-2xx) → logged, not thrown
- feedbackStore failures → null returned, logged
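The fail-quiet rule at the IO boundary can be sketched as a small wrapper that logs and returns null instead of throwing. `failQuiet` and `Logger` are hypothetical names for illustration; the actual sink may structure this differently.

```ts
// Hypothetical helper: run an IO operation, log any failure, never rethrow.
type Logger = (message: string, error: unknown) => void

async function failQuiet<T>(
  label: string,
  op: () => Promise<T>,
  log: Logger,
): Promise<T | null> {
  try {
    return await op()
  } catch (error) {
    // Logged, not thrown: the caller (the chat handler) stays up.
    log(`${label} failed`, error)
    return null
  }
}
```

Wrapping each store write and the OTLP POST this way keeps one failing dependency from taking down the hook or the others.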
13 new tests in `tests/production-trace-sink.test.ts` cover:
- RunRecord composition for completed / failed / aborted
- failureClass + notes propagation
- runRecordStore throwing (hook stays alive)
- OTLP POST shape (service.name in resource attrs, authorization header)
- OTLP failure modes (network throw, non-2xx)
- omitted otlp / omitted authHeader paths
- recordFeedback create-then-append semantics
- explicit trajectoryId honored
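For orientation, the request shape those OTLP tests check might look roughly like the sketch below, which follows the OTLP/HTTP JSON encoding. `buildOtlpRequest`, `SpanInput`, the scope name, and the choice to use `projectId` as `service.name` are assumptions for illustration; the source only states that `service.name` appears in the resource attributes and that an authorization header is sent when configured.

```ts
interface OtlpOptions { endpoint: string; authHeader?: string }

// Span fields follow the OTLP JSON encoding (hex ids, nanosecond timestamps as strings).
interface SpanInput {
  traceId: string
  spanId: string
  name: string
  startTimeUnixNano: string
  endTimeUnixNano: string
}

// Hypothetical builder: returns the pieces of the POST without sending it.
function buildOtlpRequest(projectId: string, otlp: OtlpOptions, spans: SpanInput[]) {
  const headers: Record<string, string> = { 'content-type': 'application/json' }
  // Omitted authHeader path: no authorization header is sent at all.
  if (otlp.authHeader) headers['authorization'] = otlp.authHeader

  const payload = {
    resourceSpans: [{
      // Assumption: service.name is populated from projectId.
      resource: { attributes: [{ key: 'service.name', value: { stringValue: projectId } }] },
      scopeSpans: [{ scope: { name: 'production-trace-sink' }, spans }],
    }],
  }
  return { url: otlp.endpoint, method: 'POST' as const, headers, body: JSON.stringify(payload) }
}
```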
144/144 pass. Cloudflare Worker semantics intended: `ctx.waitUntil` the
hook from the chat handler so the worker stays alive long enough for
the OTLP POST + DB write to flush.
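A sketch of that `ctx.waitUntil` pattern, with a minimal local interface standing in for the Worker's execution context. `handleChat`, `runChat`, and `endRun` are hypothetical names; the sketch assumes `endRun` resolves once the hook's DB write and OTLP POST have settled.

```ts
// Minimal stand-in for the Workers execution context; only waitUntil is needed here.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void
}

async function handleChat(
  ctx: WaitUntilContext,
  runChat: () => Promise<string>,
  endRun: () => Promise<void>,
): Promise<string> {
  const reply = await runChat()
  // Do not await the flush: hand it to the runtime so the response can go out
  // while the worker stays alive until the promise settles.
  ctx.waitUntil(endRun())
  return reply
}
```

The reply is not delayed by the flush, and the runtime is told explicitly that work is still outstanding.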
Bumps agent-runtime to 0.12.0.
Summary
Every production chat session has been emitting zero replayable trace data. Eval runs capture everything; production captures nothing. RL training, research analyses, and the self-improvement loop all run on synthetic personas only. This primitive turns every real user conversation into data the downstream channels (Prime Intellect, GEPA, research, canaries, analyst loop) can consume.
API
`createProductionTraceSink(opts)` returns `traceStore`, `onRunComplete` (a RunCompleteHook), and `recordFeedback(input)`.
Per-agent wiring (~10 lines)
```ts
const sink = createProductionTraceSink({
  projectId: 'tax-agent',
  otlp: { endpoint: env.LANGFUSE_OTEL_ENDPOINT, authHeader: env.LANGFUSE_OTEL_AUTH },
  runRecordStore: drizzleRunRecordStore(db),
  feedbackStore: drizzleFeedbackStore(db),
})
const emitter = new TraceEmitter(sink.traceStore, {
  onRunComplete: [sink.onRunComplete],
})
await emitter.startRun({ scenarioId: sessionId, projectId: 'tax-agent', layer: 'app-runtime' })
// ... existing chat flow ...
await emitter.endRun({ pass, score })
```
CF Worker semantics: `ctx.waitUntil` the hook from the chat handler.
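Fail-loud where it matters
Fail-quiet only at the IO boundary: `runRecordStore` and OTLP POST failures are logged, not thrown; `feedbackStore` failures are logged and return null.
Test plan
13 new tests in `tests/production-trace-sink.test.ts`; 144/144 pass.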
Next steps (separate PRs per agent)
Each of tax/legal/gtm/creative wires this into `packages/api-worker/src/services/agent-runtime/chat.ts` + adds the matching `runRecordStore`/`feedbackStore` Drizzle adapters + Langfuse env vars.