feat(otel): honor capture_content + redact on span attributes (closes #130)#154
Open
initializ-mk wants to merge 2 commits into
…130)

Phase 3 of the OTel Tracing v1 initiative (#108, PR #125) shipped span
instrumentation across the executor loop and tool calls but kept it
metadata-only — span attributes carried provider, model, usage tokens,
finish reasons, but no prompt / completion / tool I/O text.

Phase 2 (#103, PR #124) plumbed two operator-facing knobs
(`capture_content`, `redact`) through the config schema. The runtime
never read them. An operator who set `capture_content: true` got
metadata-only spans and no error — the worst kind of config:
load-bearing-looking, silently inert.

This commit closes that gap.

What lands
1. forge-core/runtime/content_redact.go — new package-internal helpers:
   - RedactSecrets scrubs known vendor secret-token shapes (Anthropic
     sk-ant, OpenAI sk-, GitHub ghp_/gho_/ghs_/github_pat_, AWS AKIA,
     Slack xoxb/xoxp, RSA/EC/OPENSSH/PRIVATE key blocks, Telegram bot
     tokens). Patterns mirror the runtime guardrails CustomRule
     defaults in forge-cli/runtime/guardrails_loader.go's
     DefaultStructuredGuardrails — the two should evolve together.
   - PrepareSpanContent runs the redact-then-truncate pipeline for
     content destined for OTel span attributes. Cap defaults to 4 KiB
     (below the 5 KiB soft attribute-length limit most backends
     apply). Reuses the audit pipeline's TruncateForAudit so the
     `…[truncated:N]` marker is byte-identical to what
     AuditPayloadCapture emits for the same input.
2. forge-core/observability/attrs.go — four new attribute constants:
   - AttrGenAIPrompt     = "gen_ai.prompt"      // OTel GenAI semconv
   - AttrGenAICompletion = "gen_ai.completion"  // OTel GenAI semconv
   - AttrForgeToolArgs   = "forge.tool.args"
   - AttrForgeToolResult = "forge.tool.result"
   Stripped the "Phase 3 metadata-only" callout from the forge.tool.*
   group.
3. forge-core/runtime/loop.go — adds:
   - LLMExecutorConfig.TracingConfig (consumed by Phase 3 sites)
   - LLMExecutor.tracingCfg field
   - Conditional attribute stamping on the llm.completion span
     (`gen_ai.prompt` before Chat(), `gen_ai.completion` after
     success) and the tool.<name> span (`forge.tool.args` before
     Execute(), `forge.tool.result` after).
4. forge-cli/runtime/runner.go — populates
   LLMExecutorConfig.TracingConfig from the already-resolved
   tracingCfg the CLI also passes to NewTracerProvider. Zero plumbing
   additions; just wires the existing field through.

Cross-pipeline parity
The four content attributes pass through the same redact-then-truncate
helper as the (existing) audit payload-capture path. An operator who
sees a `[REDACTED]` marker in an audit row sees the same marker on the
linked span; the same goes for `…[truncated:N]`. Vendor pattern parity
with the guardrails defaults is enforced by convention (and called out
in the doc updates).

Default posture preserved
CaptureContent=false (the zero-value default) means the four content
attributes are absent from spans — not set to empty string. Backends
that gate dashboards on "is this key present?" can distinguish
"metadata-only by default" from "operator opted in but the field
happened to be empty." Empty content (e.g. tool-call-only assistant
turn → no completion text) likewise skips stamping.

Tests
- 11 unit tests in content_redact_test.go cover RedactSecrets per
  vendor pattern, PrepareSpanContent ordering invariant (redact before
  truncate so a secret straddling the cap boundary can't survive), and
  the cross-pipeline truncation-marker parity.
- 8 integration tests in loop_spans_content_test.go cover:
  - capture-true + redact-true: span carries redacted prompt
  - capture-true + redact-false: span carries raw prompt
  - capture-false: no prompt/completion/args/result attributes
  - large prompt: truncated with the same marker as audit
  - completion stamping on success
  - empty completion: attribute skipped
  - tool args + result on tool.<name> span (redacted)
  - tool args + result not present when capture-false

Docs
- docs/core-concepts/observability-tracing.md § Phase 3 is
  metadata-only → § Span content capture. New table mapping config
  knob to attribute keys per span. Notes the byte cap, the
  marker-parity-with-audit invariant, and the redact pattern set.
  Updated config example + field table.
- docs/security/audit-logging.md § Trace cross-link gains a
  § Content-capture parity subsection explaining the redact + cap
  parity invariant and the divergent caps (16 KiB audit, 4 KiB span).
- .claude/skills/forge.md § 12.9 — replaces the "Phase 3 ships
  metadata-only" caveat with a paragraph documenting the new capture
  surface. Updates the example forge.yaml comment.

Verification
- gofmt clean; golangci-lint 0 issues
- full forge-core + forge-cli test suites green
- the 19 new tests in this PR all pass
…essages)
The OTel GenAI semantic conventions moved the prompt + completion
attributes from flat-string (gen_ai.prompt, gen_ai.completion) to
structured (gen_ai.input.messages, gen_ai.output.messages) — arrays
of role+content message objects. For a feature landing in v0.15.0
we should ship the current keys, not the deprecated ones.
Changes
1. attrs.go — AttrGenAIPrompt → AttrGenAIInputMessages
(value: gen_ai.input.messages); AttrGenAICompletion →
AttrGenAIOutputMessages (value: gen_ai.output.messages). Doc
comments call out the supersedence.
2. loop.go — completion attribute now stamps a single-element
[{role,content}] array (via the existing serializeChatMessages
helper) instead of the raw response string, matching the
structured-shape contract the new key implies. The prompt path
already emitted a message array — only the key name changed.
3. Tests — TestExecute_CaptureContentTrue_StampsCompletionOnLLMSpan
now asserts the value is JSON-parseable as
[]llm.ChatMessage{{Role: assistant, Content: …}} instead of the
bare response string. Other tests still pass unchanged because
their assertions look for substring presence (the secret in
redact tests, the truncation marker, etc.) and the JSON wrapper
doesn't affect those.
4. Docs — observability-tracing.md attribute table updated with the
new keys and a note about backends that only recognize the
deprecated flat-string attributes (operators should upgrade the
backend's semconv mapping or use a span processor to translate).
.claude/skills/forge.md § 12.9 updated with the same note.
Verification
- gofmt + golangci-lint clean
- forge-core/runtime + forge-core/observability test suites pass
- the 8 integration tests still cover the same four logical sites
(LLM prompt, LLM completion, tool args, tool result) under the
new key names
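
For reviewers who want to see the shape change concretely, here is a minimal sketch of the completion value before and after this commit. It is not the code in loop.go: `chatMessage` is a hypothetical stand-in for `llm.ChatMessage`, `serializeMessages` stands in for the existing `serializeChatMessages` helper, and the lowercase JSON field names are an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage is a hypothetical stand-in for llm.ChatMessage.
// The JSON tags are assumed, not taken from the real type.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// serializeMessages stands in for the serializeChatMessages helper:
// it renders a message slice as a JSON array string.
func serializeMessages(msgs []chatMessage) (string, error) {
	b, err := json.Marshal(msgs)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// completionAttr wraps a bare completion string in a single-element
// [{role,content}] array, the structured shape the new key implies.
func completionAttr(text string) (string, error) {
	return serializeMessages([]chatMessage{{Role: "assistant", Content: text}})
}

func main() {
	v, err := completionAttr("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

Because the wrapper only adds JSON framing around the content, substring assertions (a secret being absent, a truncation marker being present) behave the same on the wrapped value, which is consistent with the note in item 3 above.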
Closes #130.
Summary
Phase 3 of the OTel Tracing v1 initiative (#108) shipped span instrumentation but metadata-only. The `capture_content` + `redact` knobs in `forge.yaml` were plumbed by Phase 2 but never consumed — an operator who set `capture_content: true` got metadata-only spans and no error.
This PR closes the gap. When `observability.tracing.capture_content: true` is set, the `llm.completion` and `tool.<name>` spans stamp the prompt / completion / tool I/O as attributes, passed through the same redact-then-truncate pipeline that the audit payload-capture path uses.
Attribute keys added
Default posture (no opt-in) preserved: the keys are absent from spans — not empty string. Backends that look at "is the key present?" can distinguish "metadata-only by default" from "operator opted in but the field happened to be empty." Empty completion / args / result skips stamping for the same reason.
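
A minimal sketch of the absent-versus-empty rule, using a plain map in place of a real span so it runs without the OTel SDK. `tracingConfig` and `stampContent` are hypothetical names for illustration, not the identifiers in loop.go.

```go
package main

import "fmt"

// tracingConfig is a hypothetical stand-in for the resolved tracing config.
type tracingConfig struct {
	CaptureContent bool
}

// stampContent sets key on attrs only when capture is enabled AND the
// value is non-empty. In every other case the key stays absent, so a
// backend can tell "not captured" apart from "captured, but empty".
func stampContent(attrs map[string]string, cfg tracingConfig, key, value string) {
	if !cfg.CaptureContent || value == "" {
		return
	}
	attrs[key] = value
}

func main() {
	off := map[string]string{}
	stampContent(off, tracingConfig{CaptureContent: false}, "forge.tool.args", `{"q":"x"}`)
	_, present := off["forge.tool.args"]
	fmt.Println("capture off, key present:", present)

	on := map[string]string{}
	stampContent(on, tracingConfig{CaptureContent: true}, "forge.tool.args", `{"q":"x"}`)
	stampContent(on, tracingConfig{CaptureContent: true}, "forge.tool.result", "")
	_, hasArgs := on["forge.tool.args"]
	_, hasResult := on["forge.tool.result"]
	fmt.Println("capture on, args present:", hasArgs, "empty result present:", hasResult)
}
```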
Redact + truncate pipeline
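`PrepareSpanContent(s, redact, maxBytes)`:
1. If `redact` is set, run `RedactSecrets` over `s` to scrub known vendor secret-token shapes.
2. Truncate the result to `maxBytes` (4 KiB by default) using the audit pipeline's `TruncateForAudit`, which appends the `…[truncated:N]` marker.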
Ordering matters: redact runs before truncate so a secret straddling the cap boundary can never survive in the truncated tail. Pinned by `TestPrepareSpanContent_RedactThenTruncate`.
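
The ordering invariant can be illustrated with a small sketch. This is not the code in content_redact.go: the single `sk-` regex is a deliberately tiny stand-in for the real pattern set, the helper names are hypothetical, and the meaning of `N` in the marker (bytes dropped) is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// secretPattern is a toy stand-in for the vendor pattern set (assumption).
var secretPattern = regexp.MustCompile(`sk-[A-Za-z0-9_-]{8,}`)

// redactSecrets replaces every match of secretPattern with a fixed marker.
func redactSecrets(s string) string {
	return secretPattern.ReplaceAllString(s, "[REDACTED]")
}

// truncate caps s at maxBytes, backing up to a rune boundary, and appends
// a marker. Here N is the number of bytes dropped (assumed semantics).
func truncate(s string, maxBytes int) string {
	if maxBytes <= 0 || len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return fmt.Sprintf("%s…[truncated:%d]", s[:cut], len(s)-cut)
}

// prepareSpanContent redacts first (when asked), then truncates.
func prepareSpanContent(s string, redact bool, maxBytes int) string {
	if redact {
		s = redactSecrets(s)
	}
	return truncate(s, maxBytes)
}

func main() {
	// The secret starts before the 10-byte cap and ends after it.
	const input = "key=sk-abcdefghijklmnop tail"

	// Correct order: the whole secret is replaced before the cut.
	fmt.Println(prepareSpanContent(input, true, 10))

	// Wrong order: the cut leaves "sk-abc", which is too short to match
	// the pattern, so a fragment of the secret survives.
	fmt.Println(redactSecrets(truncate(input, 10)))
}
```

In the wrong order, the truncated fragment no longer matches any pattern, so the redactor has nothing to act on; that is the failure mode the pinned test guards against.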
Files
Test plan
Out of scope