initializ · initializ-mk · Jun 11, 2026 · Jun 11, 2026
diff --git a/.claude/skills/forge.md b/.claude/skills/forge.md
@@ -660,10 +660,22 @@ Forge-specific attributes use the `forge.*` namespace
 (`forge.task.id`, `forge.task.final_state`, `forge.tool.name`,
 `forge.workflow.id`, ...).
 
-Phase 3 ships **metadata-only** spans. `capture_content` is plumbed
-through the config schema but not yet honored by the instrumentation;
-content capture is a follow-up that will reuse the FWS-8 audit
-redactor.
+**Default posture is metadata-only.** Prompts, completions, tool
+args, and tool results are NOT stamped on spans unless
+`observability.tracing.capture_content: true` is set (Phase 3.5 /
+#130). When opted-in: `llm.completion` gains `gen_ai.input.messages`
+(JSON array of role+content sent to the model) +
+`gen_ai.output.messages` (JSON single-element array for the response,
+current OTel GenAI semconv; supersedes the deprecated flat-string
+`gen_ai.prompt` / `gen_ai.completion`);
+`tool.<name>` gains `forge.tool.args` + `forge.tool.result`.
+Captured values pass through a redactor (vendor secret-token shapes:
+Anthropic / OpenAI / GitHub / AWS / Slack / private keys / Telegram)
+when `redact: true` (default with capture). Each value is byte-capped
+at 4 KiB with a `…[truncated:N]` marker byte-identical to the audit
+payload-capture marker, so an operator grepping `[truncated:` across
+spans and audit rows sees aligned output. `redact: false` is the
+enterprise raw-capture path.
 
 **Read**: `docs/core-concepts/observability-tracing.md`,
 `docs/reference/forge-yaml-schema.md` § `observability.tracing`,
@@ -790,8 +802,8 @@ observability:                       # OTel Tracing v1 (#108) — off by default
     service_name: ""                 # default: agent_id
     headers: { x-tenant: demo }
     resource_attrs: { deployment.environment: prod }
-    redact: true
-    capture_content: false           # Phase 3 ships metadata-only
+    redact: true                     # scrub vendor secrets when capture is on
+    capture_content: false           # off by default; opt in to span content
 
 skills:
   path: SKILL.md                     # main agent skill file

diff --git a/docs/core-concepts/observability-tracing.md b/docs/core-concepts/observability-tracing.md
@@ -45,8 +45,8 @@ observability:
       x-tenant: demo
     resource_attrs:                          # extra OTel resource attributes
       deployment.environment: prod
-    redact: true                             # default true — Phase 3 metadata-only ships now
-    capture_content: false                   # enterprise opt-in for prompt/completion content
+    redact: true                             # scrub vendor secret tokens when capture_content is on
+    capture_content: false                   # opt-in: stamp prompt/completion/tool I/O on spans
 ```
 
 | Field | Type | Default | Notes |
@@ -60,8 +60,8 @@ observability:
 | `service_name` | string | `agent_id` | `OTEL_SERVICE_NAME` env wins if set. |
 | `headers` | map | — | OTLP HTTP/gRPC headers. Env is the preferred path for secrets. |
 | `resource_attrs` | map | — | Merged with the auto-stamped `service.*` + `forge.runtime.version`. |
-| `redact` | bool | `true` | PII redaction posture flag (consumed by Phase 3+ instrumentation). |
-| `capture_content` | bool | `false` | Reserved — metadata-only spans ship now; content capture is a follow-up. |
+| `redact` | bool | `true` | When `capture_content: true`, scrub vendor secret tokens (Anthropic / OpenAI / GitHub / AWS / Slack / private keys / Telegram) before stamping content attributes. See [Span content capture](#span-content-capture). |
+| `capture_content` | bool | `false` | Stamp prompt / completion / tool I/O as span attributes. Off by default; metadata-only spans ship. See [Span content capture](#span-content-capture). |
 
 ## Config precedence
 
@@ -152,9 +152,24 @@ Forge mixes OTel GenAI semconv with Forge-specific `forge.*` namespaced attribut
 
 Tool errors do **not** fail the outer `agent.execute` span — they surface to the LLM as text and the loop continues. The tool span carries the failure detail so operators can pivot from a trace to the specific failed invocation.
 
-### Phase 3 is metadata-only
+### Span content capture
 
-Tool args / results, prompts, completions are **not** recorded as span attributes today. The `capture_content` + `redact` knobs are plumbed but not yet honored by the instrumentation — content capture is a follow-up that will reuse the FWS-8 audit redactor.
+Prompts, completions, tool args, and tool results are **off by default** — Phase 3 spans ship metadata only (provider, model, usage, finish reasons, tool name). Operators who need content attributes for in-trace debugging or supervised-learning corpora opt in via `observability.tracing.capture_content: true` (Phase 3.5 / issue #130).
+
+| `forge.yaml` knob | Span | Attribute keys added when `capture_content: true` |
+|---|---|---|
+| (always) | `llm.completion` | `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons` |
+| `capture_content: true` | `llm.completion` | `gen_ai.input.messages` (JSON array of role+content pairs sent to the model), `gen_ai.output.messages` (JSON single-element array of role+content for the model's response) — current OTel GenAI semconv, supersedes the deprecated flat-string `gen_ai.prompt` / `gen_ai.completion` |
+| (always) | `tool.<name>` | `forge.tool.name`, `forge.tool.error` (on failure) |
+| `capture_content: true` | `tool.<name>` | `forge.tool.args` (raw arguments JSON), `forge.tool.result` (raw output) |
+
+When `capture_content: true` and `redact: true` (the default when capture is on), attribute values pass through a redactor that scrubs the same vendor secret-token shapes the runtime guardrails default rules cover (Anthropic `sk-ant-…`, OpenAI `sk-…`, GitHub `ghp_/gho_/ghs_/github_pat_…`, AWS `AKIA…`, Slack `xoxb-/xoxp-…`, RSA/EC/OPENSSH/PRIVATE key blocks, Telegram bot tokens). Matched values become `[REDACTED]`. Setting `redact: false` is the enterprise raw-capture path — content is stamped verbatim with the byte cap still applied.
+
+Every captured value is byte-capped at **4 KiB** (below the 5 KiB attribute soft-cap most backends apply). When the input exceeds the cap, the value ends with a `…[truncated:N]` marker where `N` is the original byte length. The marker is **byte-identical** to what the audit payload-capture path emits for the same input, so an operator grepping `[truncated:` across span attributes and audit rows sees aligned output.
+
+**Default posture** (no opt-in): the `gen_ai.input.messages`, `gen_ai.output.messages`, `forge.tool.args`, `forge.tool.result` keys are **absent** from spans — not set to empty string. Backends that gate dashboards on "is this key present?" can distinguish "metadata-only by default" from "operator opted in but the field happened to be empty."
+
+**OTel semconv versioning note**: the GenAI semantic conventions moved from flat-string (`gen_ai.prompt`, `gen_ai.completion`) to structured (`gen_ai.input.messages`, `gen_ai.output.messages`) attributes. Forge emits only the **current** structured keys. Backends that only recognize the deprecated flat-string attributes will not show prompt / completion text on Forge spans — upgrade the backend's semconv mapping or use a span processor to translate.
 
 ## End-to-end propagation (Phase 5)
 

diff --git a/docs/security/audit-logging.md b/docs/security/audit-logging.md
@@ -330,6 +330,37 @@ shape — no `trace_id` / `span_id` keys appear. The
 `AuditSchemaVersion` is NOT bumped: adding optional fields is a
 schema-compatible change per the policy above.
 
+### Content-capture parity
+
+When `observability.tracing.capture_content: true` is set, prompt /
+completion / tool-args / tool-result content appears on **both** the
+linked OTel span and the FWS-8 audit row for the same logical event.
+The two pipelines run the captured content through the same redact-
+then-truncate helper (`runtime.PrepareSpanContent` /
+`runtime.TruncateForAudit`) so:
+
+- The redaction marker is identical (`[REDACTED]`) — operators
+  grepping either sink for vendor secret-token shapes see the same
+  match.
+- The truncation marker is byte-identical (`…[truncated:N]` where
+  `N` is the original byte length of the input). Grepping
+  `[truncated:` across audit rows and span attributes returns
+  aligned, comparable results.
+- The redact patterns mirror the runtime guardrails CustomRules
+  defaults (Anthropic / OpenAI / GitHub / AWS / Slack / private key
+  blocks / Telegram bot tokens). Adding a new vendor pattern to one
+  pipeline implies adding it to the other.
+
+The audit pipeline's byte cap (16 KiB per field, see
+`AuditPayloadCapture.Cap*Bytes`) is intentionally larger than the
+span cap (4 KiB — below the soft attribute-length limit most
+observability backends apply). The two caps are independent: a single
+event may be truncated on the span side and survive intact on the
+audit side. The trailing marker shape is the same either way.
+
+See [Observability — Span content capture](../core-concepts/observability-tracing.md#span-content-capture) for the
+span-side attribute keys and opt-in switches.
+
 ## Streams (FWS-9)
 
 `forge run` / `forge serve` use the OS streams as a stream-level

diff --git a/forge-cli/runtime/runner.go b/forge-cli/runtime/runner.go
@@ -834,6 +834,15 @@ func (r *Runner) Run(ctx context.Context) error {
 						MaxIterations: 100,
 						CharBudget:    charBudget,
 						FilesDir:      filepath.Join(r.cfg.WorkDir, ".forge", "files"),
+						// Issue #130 — the same resolved TracingConfig
+						// already passed to NewTracerProvider drives Phase
+						// 3.5 span-content capture inside the executor
+						// loop. Disabled state (Enabled=false +
+						// CaptureContent=false) is the zero-value default,
+						// so missing this on an older config schema is
+						// equivalent to "metadata-only spans" — the
+						// posture this initiative preserves.
+						TracingConfig: tracingCfg,
 					}
 					if r.derivedCLIConfig != nil {
 						execCfg.WorkflowPhases = r.derivedCLIConfig.WorkflowPhases

diff --git a/forge-core/observability/attrs.go b/forge-core/observability/attrs.go
@@ -88,14 +88,42 @@ const (
 	AttrForgeLoopIteration = "forge.loop.iteration"
 
 	// AttrForgeToolName / AttrForgeToolError name the tool call
-	// instrumentation. Tool args / results are NOT recorded here —
-	// Phase 3 is metadata-only. A future "capture_content=true with
-	// PII redaction" phase will add args/result attribute keys.
+	// instrumentation.
 	AttrForgeToolName  = "forge.tool.name"
 	AttrForgeToolError = "forge.tool.error"
 
 	// AttrForgeTaskFinalState is the terminal A2A TaskState the loop
 	// resolved to — "completed", "failed", "canceled". Set on the
 	// agent.execute span just before End.
 	AttrForgeTaskFinalState = "forge.task.final_state"
+
+	// ─── Content-capture attributes (Phase 3.5 / issue #130) ─────
+	//
+	// These attributes are set only when TracingConfig.CaptureContent
+	// is true. The default posture remains metadata-only: an absent
+	// attribute is the signal that an operator did not opt in. Set
+	// values pass through PrepareSpanContent (redact-then-truncate)
+	// so the same scrub passes both the OTel pipeline and (in the
+	// future) the audit payload-capture path.
+
+	// AttrGenAIInputMessages is the structured inbound message array
+	// the agent sent to the LLM — a JSON array of role+content pairs.
+	// Per OTel GenAI semantic conventions (current). Supersedes the
+	// deprecated `gen_ai.prompt` flat-string attribute.
+	AttrGenAIInputMessages = "gen_ai.input.messages"
+
+	// AttrGenAIOutputMessages is the structured response array from
+	// the model — a JSON array of role+content pairs (single element
+	// for a non-streaming, single-choice completion). Per OTel GenAI
+	// semantic conventions (current). Supersedes the deprecated
+	// `gen_ai.completion` flat-string attribute.
+	AttrGenAIOutputMessages = "gen_ai.output.messages"
+
+	// AttrForgeToolArgs is the raw arguments JSON the agent passed to
+	// a tool. Set on tool.<name> spans.
+	AttrForgeToolArgs = "forge.tool.args"
+
+	// AttrForgeToolResult is the raw output the tool returned. Set on
+	// tool.<name> spans.
+	AttrForgeToolResult = "forge.tool.result"
 )
diff --git a/forge-core/runtime/content_redact.go b/forge-core/runtime/content_redact.go
@@ -0,0 +1,147 @@
+package runtime
+
+import (
+	"encoding/json"
+	"regexp"
+
+	"github.com/initializ/forge/forge-core/llm"
+)
+
+// Span-attribute content capture (issue #130 / Phase 3.5).
+//
+// Phase 3 of the OTel Tracing v1 initiative (#108, PR #125) shipped
+// span instrumentation across the executor loop and tool calls but
+// kept it metadata-only — span attributes carried provider, model,
+// usage tokens, finish reasons, but no prompt / completion / tool I/O
+// text. Phase 2 (#103, PR #124) plumbed two operator-facing knobs
+// (`capture_content`, `redact`) through the config schema but the
+// runtime never read them. This file is the redact-and-cap pipeline
+// that Phase 3 sites call into when `CaptureContent=true` so the same
+// PII / secret scrub passes both the OTel attribute path and (in the
+// future) the audit payload-capture path.
+//
+// Pattern parity: RedactSecrets's regex list mirrors the runtime
+// guardrails CustomRule defaults in forge-cli/runtime/guardrails_loader.go's
+// DefaultStructuredGuardrails. The two should evolve together — when
+// a new secret token shape is added to the guardrails list, add it
+// here. The parity test in content_redact_parity_test.go inside
+// forge-cli/runtime/ enforces this at CI time.
+//
+// Order matters: redact runs BEFORE truncate so the truncation
+// boundary can never split a `[REDACTED]` marker mid-string.
+//
+// The functions are designed to be called on hot paths
+// (every LLM call, every tool call) so the regex set is pre-compiled
+// at package init and the empty-input fast path skips the pattern
+// loop entirely.
+
+// RedactionMarker is the placeholder substituted for any matched
+// secret. Operators grepping audit logs and traces for "[REDACTED]"
+// can correlate scrub events across both pipelines.
+const RedactionMarker = "[REDACTED]"
+
+// DefaultSpanContentCapBytes is the per-attribute byte cap for span
+// content. 4 KiB stays comfortably under common observability backend
+// limits (Datadog caps attributes around 5 KiB; Tempo's default attr
+// length limit is 4 KiB) so a long prompt doesn't get re-truncated by
+// the backend with a different marker shape, breaking the
+// correlate-by-marker grep flow.
+const DefaultSpanContentCapBytes = 4 << 10
+
+// redactPattern is a single regex applied to span / audit content
+// before storage. Each entry's regex is pre-compiled at init.
+type redactPattern struct {
+	name string
+	re   *regexp.Regexp
+}
+
+// redactPatterns covers token shapes operators have asked us to scrub
+// from prompts / completions / tool I/O. The shapes are drawn from
+// runtime-observed secrets in vendor SDKs — same list as the
+// guardrails CustomRules defaults. See the package-doc note above on
+// parity with forge-cli/runtime/guardrails_loader.go.
+var redactPatterns = []redactPattern{
+	{name: "anthropic_key", re: regexp.MustCompile(`sk-ant-[A-Za-z0-9\-]{20,}`)},
+	{name: "openai_key", re: regexp.MustCompile(`sk-[A-Za-z0-9]{20,}`)},
+	{name: "github_pat", re: regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`)},
+	{name: "github_oauth", re: regexp.MustCompile(`gho_[A-Za-z0-9]{36}`)},
+	{name: "github_server", re: regexp.MustCompile(`ghs_[A-Za-z0-9]{36}`)},
+	{name: "github_fine", re: regexp.MustCompile(`github_pat_[A-Za-z0-9_]{22,}`)},
+	{name: "aws_access", re: regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
+	{name: "slack_bot", re: regexp.MustCompile(`xoxb-[0-9]{10,}-[A-Za-z0-9-]+`)},
+	{name: "slack_user", re: regexp.MustCompile(`xoxp-[0-9]{10,}-[A-Za-z0-9-]+`)},
+	// Private-key block: anchored to both BEGIN and END markers so we
+	// scrub the entire payload at once. (?s) makes . match newlines.
+	{name: "private_key", re: regexp.MustCompile(`(?s)-----BEGIN (RSA|EC|OPENSSH|PRIVATE) [^-]*KEY-----.*?-----END (RSA|EC|OPENSSH|PRIVATE) [^-]*KEY-----`)},
+	{name: "telegram_bot", re: regexp.MustCompile(`[0-9]{8,10}:[A-Za-z0-9_-]{35,}`)},
+}
+
+// RedactSecrets returns s with every known secret token shape replaced
+// by RedactionMarker. Empty input is returned unchanged (fast path).
+//
+// Applied in pattern-list order; overlap is fine because
+// ReplaceAllString rewrites the string left-to-right and subsequent
+// patterns operate on the post-replacement output. A run that matches
+// multiple shapes (e.g. an `sk-` prefix that also starts a longer
+// vendor key) is scrubbed once — RedactionMarker doesn't satisfy any
+// other pattern, so re-applying patterns is idempotent.
+func RedactSecrets(s string) string {
+	if s == "" {
+		return s
+	}
+	for _, p := range redactPatterns {
+		s = p.re.ReplaceAllString(s, RedactionMarker)
+	}
+	return s
+}
+
+// serializeChatMessages JSON-encodes the inbound chat messages list
+// for use as the gen_ai.prompt span attribute (OTel GenAI semantic
+// conventions). Returns the empty string for nil / empty input or on
+// marshal failure — an empty return signals the caller to skip
+// stamping the attribute, preserving the "absent attribute = no
+// opt-in" contract.
+//
+// Lives next to PrepareSpanContent because both are pure
+// content-shaping helpers for the span-capture pipeline; the audit
+// pipeline uses the same input but emits it as native event fields,
+// not a JSON blob.
+func serializeChatMessages(messages []llm.ChatMessage) string {
+	if len(messages) == 0 {
+		return ""
+	}
+	b, err := json.Marshal(messages)
+	if err != nil {
+		return ""
+	}
+	return string(b)
+}
+
+// PrepareSpanContent runs the redact (when redact=true) and
+// byte-cap-with-truncation-marker pipeline for content destined for
+// an OTel span attribute. The pipeline is:
+//
+//  1. Apply RedactSecrets when redact=true.
+//  2. TruncateForAudit (the same byte-cap helper the audit path uses)
+//     so a runaway prompt can't blow past the backend attribute limit
+//     and silently drop the marker.
+//
+// maxBytes <= 0 falls back to DefaultSpanContentCapBytes. The
+// truncation marker is identical to what AuditPayloadCapture writes,
+// so an operator who sees a `…[truncated:N]` suffix on an audit
+// payload-captured field sees the same suffix on the linked span
+// attribute for the same logical event.
+//
+// Returns the empty string when s is empty (skipping the pipeline).
+func PrepareSpanContent(s string, redact bool, maxBytes int) string {
+	if s == "" {
+		return s
+	}
+	if redact {
+		s = RedactSecrets(s)
+	}
+	if maxBytes <= 0 {
+		maxBytes = DefaultSpanContentCapBytes
+	}
+	return TruncateForAudit(s, maxBytes)
+}