Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 18 additions & 6 deletions .claude/skills/forge.md
Original file line number Diff line number Diff line change
Expand Up @@ -660,10 +660,22 @@ Forge-specific attributes use the `forge.*` namespace
(`forge.task.id`, `forge.task.final_state`, `forge.tool.name`,
`forge.workflow.id`, ...).

Phase 3 ships **metadata-only** spans. `capture_content` is plumbed
through the config schema but not yet honored by the instrumentation;
content capture is a follow-up that will reuse the FWS-8 audit
redactor.
**Default posture is metadata-only.** Prompts, completions, tool
args, and tool results are NOT stamped on spans unless
`observability.tracing.capture_content: true` is set (Phase 3.5 /
#130). When opted-in: `llm.completion` gains `gen_ai.input.messages`
(JSON array of role+content sent to the model) +
`gen_ai.output.messages` (JSON single-element array for the response,
current OTel GenAI semconv; supersedes the deprecated flat-string
`gen_ai.prompt` / `gen_ai.completion`);
`tool.<name>` gains `forge.tool.args` + `forge.tool.result`.
Captured values pass through a redactor (vendor secret-token shapes:
Anthropic / OpenAI / GitHub / AWS / Slack / private keys / Telegram)
when `redact: true` (default with capture). Each value is byte-capped
at 4 KiB with a `…[truncated:N]` marker byte-identical to the audit
payload-capture marker, so an operator grepping `[truncated:` across
spans and audit rows sees aligned output. `redact: false` is the
enterprise raw-capture path.

**Read**: `docs/core-concepts/observability-tracing.md`,
`docs/reference/forge-yaml-schema.md` § `observability.tracing`,
Expand Down Expand Up @@ -790,8 +802,8 @@ observability: # OTel Tracing v1 (#108) — off by default
service_name: "" # default: agent_id
headers: { x-tenant: demo }
resource_attrs: { deployment.environment: prod }
redact: true
capture_content: false # Phase 3 ships metadata-only
redact: true # scrub vendor secrets when capture is on
capture_content: false # off by default; opt in to span content

skills:
path: SKILL.md # main agent skill file
Expand Down
27 changes: 21 additions & 6 deletions docs/core-concepts/observability-tracing.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,8 +45,8 @@ observability:
x-tenant: demo
resource_attrs: # extra OTel resource attributes
deployment.environment: prod
redact: true # default true — Phase 3 metadata-only ships now
capture_content: false # enterprise opt-in for prompt/completion content
redact: true # scrub vendor secret tokens when capture_content is on
capture_content: false # opt-in: stamp prompt/completion/tool I/O on spans
```

| Field | Type | Default | Notes |
Expand All @@ -60,8 +60,8 @@ observability:
| `service_name` | string | `agent_id` | `OTEL_SERVICE_NAME` env wins if set. |
| `headers` | map | — | OTLP HTTP/gRPC headers. Env is the preferred path for secrets. |
| `resource_attrs` | map | — | Merged with the auto-stamped `service.*` + `forge.runtime.version`. |
| `redact` | bool | `true` | PII redaction posture flag (consumed by Phase 3+ instrumentation). |
| `capture_content` | bool | `false` | Reserved — metadata-only spans ship now; content capture is a follow-up. |
| `redact` | bool | `true` | When `capture_content: true`, scrub vendor secret tokens (Anthropic / OpenAI / GitHub / AWS / Slack / private keys / Telegram) before stamping content attributes. See [Span content capture](#span-content-capture). |
| `capture_content` | bool | `false` | Stamp prompt / completion / tool I/O as span attributes. Off by default; metadata-only spans ship. See [Span content capture](#span-content-capture). |

## Config precedence

Expand Down Expand Up @@ -152,9 +152,24 @@ Forge mixes OTel GenAI semconv with Forge-specific `forge.*` namespaced attribut

Tool errors do **not** fail the outer `agent.execute` span — they surface to the LLM as text and the loop continues. The tool span carries the failure detail so operators can pivot from a trace to the specific failed invocation.

### Phase 3 is metadata-only
### Span content capture

Tool args / results, prompts, completions are **not** recorded as span attributes today. The `capture_content` + `redact` knobs are plumbed but not yet honored by the instrumentation — content capture is a follow-up that will reuse the FWS-8 audit redactor.
Prompts, completions, tool args, and tool results are **off by default** — Phase 3 spans ship metadata only (provider, model, usage, finish reasons, tool name). Operators who need content attributes for in-trace debugging or supervised-learning corpora opt in via `observability.tracing.capture_content: true` (Phase 3.5 / issue #130).

| `forge.yaml` knob | Span | Attribute keys added when `capture_content: true` |
|---|---|---|
| (always) | `llm.completion` | `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons` |
| `capture_content: true` | `llm.completion` | `gen_ai.input.messages` (JSON array of role+content pairs sent to the model), `gen_ai.output.messages` (JSON single-element array of role+content for the model's response) — current OTel GenAI semconv, supersedes the deprecated flat-string `gen_ai.prompt` / `gen_ai.completion` |
| (always) | `tool.<name>` | `forge.tool.name`, `forge.tool.error` (on failure) |
| `capture_content: true` | `tool.<name>` | `forge.tool.args` (raw arguments JSON), `forge.tool.result` (raw output) |

When `capture_content: true` and `redact: true` (the default when capture is on), attribute values pass through a redactor that scrubs the same vendor secret-token shapes the runtime guardrails default rules cover (Anthropic `sk-ant-…`, OpenAI `sk-…`, GitHub `ghp_/gho_/ghs_/github_pat_…`, AWS `AKIA…`, Slack `xoxb-/xoxp-…`, RSA/EC/OPENSSH/PRIVATE key blocks, Telegram bot tokens). Matched values become `[REDACTED]`. Setting `redact: false` is the enterprise raw-capture path — content is stamped verbatim with the byte cap still applied.

Every captured value is byte-capped at **4 KiB** (below the 5 KiB attribute soft-cap most backends apply). When the input exceeds the cap, the value ends with a `…[truncated:N]` marker where `N` is the original byte length. The marker is **byte-identical** to what the audit payload-capture path emits for the same input, so an operator grepping `[truncated:` across span attributes and audit rows sees aligned output.

**Default posture** (no opt-in): the `gen_ai.input.messages`, `gen_ai.output.messages`, `forge.tool.args`, `forge.tool.result` keys are **absent** from spans — not set to empty string. Backends that gate dashboards on "is this key present?" can distinguish "metadata-only by default" from "operator opted in but the field happened to be empty."

**OTel semconv versioning note**: the GenAI semantic conventions moved from flat-string (`gen_ai.prompt`, `gen_ai.completion`) to structured (`gen_ai.input.messages`, `gen_ai.output.messages`) attributes. Forge emits only the **current** structured keys. Backends that only recognize the deprecated flat-string attributes will not show prompt / completion text on Forge spans — upgrade the backend's semconv mapping or use a span processor to translate.

## End-to-end propagation (Phase 5)

Expand Down
31 changes: 31 additions & 0 deletions docs/security/audit-logging.md
Original file line number Diff line number Diff line change
Expand Up @@ -330,6 +330,37 @@ shape — no `trace_id` / `span_id` keys appear. The
`AuditSchemaVersion` is NOT bumped: adding optional fields is a
schema-compatible change per the policy above.

### Content-capture parity

When `observability.tracing.capture_content: true` is set, prompt /
completion / tool-args / tool-result content appears on **both** the
linked OTel span and the FWS-8 audit row for the same logical event.
The two pipelines run the captured content through the same redact-
then-truncate helper (`runtime.PrepareSpanContent` /
`runtime.TruncateForAudit`) so:

- The redaction marker is identical (`[REDACTED]`) — operators
grepping either sink for vendor secret-token shapes see the same
match.
- The truncation marker is byte-identical (`…[truncated:N]` where
`N` is the original byte length of the input). Grepping
`[truncated:` across audit rows and span attributes returns
aligned, comparable results.
- The redact patterns mirror the runtime guardrails CustomRules
defaults (Anthropic / OpenAI / GitHub / AWS / Slack / private key
blocks / Telegram bot tokens). Adding a new vendor pattern to one
pipeline implies adding it to the other.

The audit pipeline's byte cap (16 KiB per field, see
`AuditPayloadCapture.Cap*Bytes`) is intentionally larger than the
span cap (4 KiB — below the soft attribute-length limit most
observability backends apply). The two caps are independent: a single
event may be truncated on the span side and survive intact on the
audit side. The trailing marker shape is the same either way.

See [Observability — Span content capture](../core-concepts/observability-tracing.md#span-content-capture) for the
span-side attribute keys and opt-in switches.

## Streams (FWS-9)

`forge run` / `forge serve` use the OS streams as a stream-level
Expand Down
9 changes: 9 additions & 0 deletions forge-cli/runtime/runner.go
Original file line number Diff line number Diff line change
Expand Up @@ -834,6 +834,15 @@ func (r *Runner) Run(ctx context.Context) error {
MaxIterations: 100,
CharBudget: charBudget,
FilesDir: filepath.Join(r.cfg.WorkDir, ".forge", "files"),
// Issue #130 — the same resolved TracingConfig
// already passed to NewTracerProvider drives Phase
// 3.5 span-content capture inside the executor
// loop. Disabled state (Enabled=false +
// CaptureContent=false) is the zero-value default,
// so missing this on an older config schema is
// equivalent to "metadata-only spans" — the
// posture this initiative preserves.
TracingConfig: tracingCfg,
}
if r.derivedCLIConfig != nil {
execCfg.WorkflowPhases = r.derivedCLIConfig.WorkflowPhases
Expand Down
34 changes: 31 additions & 3 deletions forge-core/observability/attrs.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,14 +88,42 @@ const (
AttrForgeLoopIteration = "forge.loop.iteration"

// AttrForgeToolName / AttrForgeToolError name the tool call
// instrumentation. Tool args / results are NOT recorded here —
// Phase 3 is metadata-only. A future "capture_content=true with
// PII redaction" phase will add args/result attribute keys.
// instrumentation.
AttrForgeToolName = "forge.tool.name"
AttrForgeToolError = "forge.tool.error"

// AttrForgeTaskFinalState is the terminal A2A TaskState the loop
// resolved to — "completed", "failed", "canceled". Set on the
// agent.execute span just before End.
AttrForgeTaskFinalState = "forge.task.final_state"

// ─── Content-capture attributes (Phase 3.5 / issue #130) ─────
//
// These attributes are set only when TracingConfig.CaptureContent
// is true. The default posture remains metadata-only: an absent
// attribute is the signal that an operator did not opt in. Set
// values pass through PrepareSpanContent (redact-then-truncate)
// so the same scrub passes both the OTel pipeline and (in the
// future) the audit payload-capture path.

// AttrGenAIInputMessages is the structured inbound message array
// the agent sent to the LLM — a JSON array of role+content pairs.
// Per OTel GenAI semantic conventions (current). Supersedes the
// deprecated `gen_ai.prompt` flat-string attribute.
AttrGenAIInputMessages = "gen_ai.input.messages"

// AttrGenAIOutputMessages is the structured response array from
// the model — a JSON array of role+content pairs (single element
// for a non-streaming, single-choice completion). Per OTel GenAI
// semantic conventions (current). Supersedes the deprecated
// `gen_ai.completion` flat-string attribute.
AttrGenAIOutputMessages = "gen_ai.output.messages"

// AttrForgeToolArgs is the raw arguments JSON the agent passed to
// a tool. Set on tool.<name> spans.
AttrForgeToolArgs = "forge.tool.args"

// AttrForgeToolResult is the raw output the tool returned. Set on
// tool.<name> spans.
AttrForgeToolResult = "forge.tool.result"
)
147 changes: 147 additions & 0 deletions forge-core/runtime/content_redact.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
package runtime

import (
"encoding/json"
"regexp"

"github.com/initializ/forge/forge-core/llm"
)

// Span-attribute content capture (issue #130 / Phase 3.5).
//
// Phase 3 of the OTel Tracing v1 initiative (#108, PR #125) shipped
// span instrumentation across the executor loop and tool calls but
// kept it metadata-only — span attributes carried provider, model,
// usage tokens, finish reasons, but no prompt / completion / tool I/O
// text. Phase 2 (#103, PR #124) plumbed two operator-facing knobs
// (`capture_content`, `redact`) through the config schema but the
// runtime never read them. This file is the redact-and-cap pipeline
// that Phase 3 sites call into when `CaptureContent=true` so the same
// PII / secret scrub passes both the OTel attribute path and (in the
// future) the audit payload-capture path.
//
// Pattern parity: RedactSecrets's regex list mirrors the runtime
// guardrails CustomRule defaults in forge-cli/runtime/guardrails_loader.go's
// DefaultStructuredGuardrails. The two should evolve together — when
// a new secret token shape is added to the guardrails list, add it
// here. The parity test in content_redact_parity_test.go inside
// forge-cli/runtime/ enforces this at CI time.
//
// Order matters: redact runs BEFORE truncate so the truncation
// boundary can never split a `[REDACTED]` marker mid-string.
//
// The functions are designed to be called on hot paths
// (every LLM call, every tool call) so the regex set is pre-compiled
// at package init and the empty-input fast path skips the pattern
// loop entirely.

// RedactionMarker is the placeholder substituted for any matched
// secret. Operators grepping audit logs and traces for "[REDACTED]"
// can correlate scrub events across both pipelines.
const RedactionMarker = "[REDACTED]"

// DefaultSpanContentCapBytes is the per-attribute byte cap for span
// content. 4 KiB stays comfortably under common observability backend
// limits (Datadog caps attributes around 5 KiB; Tempo's default attr
// length limit is 4 KiB) so a long prompt doesn't get re-truncated by
// the backend with a different marker shape, breaking the
// correlate-by-marker grep flow.
const DefaultSpanContentCapBytes = 4 << 10

// redactPattern is a single regex applied to span / audit content
// before storage. Each entry's regex is pre-compiled at init.
type redactPattern struct {
name string
re *regexp.Regexp
}

// redactPatterns covers token shapes operators have asked us to scrub
// from prompts / completions / tool I/O. The shapes are drawn from
// runtime-observed secrets in vendor SDKs — same list as the
// guardrails CustomRules defaults. See the package-doc note above on
// parity with forge-cli/runtime/guardrails_loader.go.
var redactPatterns = []redactPattern{
{name: "anthropic_key", re: regexp.MustCompile(`sk-ant-[A-Za-z0-9\-]{20,}`)},
{name: "openai_key", re: regexp.MustCompile(`sk-[A-Za-z0-9]{20,}`)},
{name: "github_pat", re: regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`)},
{name: "github_oauth", re: regexp.MustCompile(`gho_[A-Za-z0-9]{36}`)},
{name: "github_server", re: regexp.MustCompile(`ghs_[A-Za-z0-9]{36}`)},
{name: "github_fine", re: regexp.MustCompile(`github_pat_[A-Za-z0-9_]{22,}`)},
{name: "aws_access", re: regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
{name: "slack_bot", re: regexp.MustCompile(`xoxb-[0-9]{10,}-[A-Za-z0-9-]+`)},
{name: "slack_user", re: regexp.MustCompile(`xoxp-[0-9]{10,}-[A-Za-z0-9-]+`)},
// Private-key block: anchored to both BEGIN and END markers so we
// scrub the entire payload at once. (?s) makes . match newlines.
{name: "private_key", re: regexp.MustCompile(`(?s)-----BEGIN (RSA|EC|OPENSSH|PRIVATE) [^-]*KEY-----.*?-----END (RSA|EC|OPENSSH|PRIVATE) [^-]*KEY-----`)},
{name: "telegram_bot", re: regexp.MustCompile(`[0-9]{8,10}:[A-Za-z0-9_-]{35,}`)},
}

// RedactSecrets returns s with every known secret token shape replaced
// by RedactionMarker. Empty input is returned unchanged (fast path).
//
// Applied in pattern-list order; overlap is fine because
// ReplaceAllString rewrites the string left-to-right and subsequent
// patterns operate on the post-replacement output. A run that matches
// multiple shapes (e.g. an `sk-` prefix that also starts a longer
// vendor key) is scrubbed once — RedactionMarker doesn't satisfy any
// other pattern, so re-applying patterns is idempotent.
func RedactSecrets(s string) string {
if s == "" {
return s
}
for _, p := range redactPatterns {
s = p.re.ReplaceAllString(s, RedactionMarker)
}
return s
}

// serializeChatMessages JSON-encodes the inbound chat messages list
// for use as the gen_ai.prompt span attribute (OTel GenAI semantic
// conventions). Returns the empty string for nil / empty input or on
// marshal failure — an empty return signals the caller to skip
// stamping the attribute, preserving the "absent attribute = no
// opt-in" contract.
//
// Lives next to PrepareSpanContent because both are pure
// content-shaping helpers for the span-capture pipeline; the audit
// pipeline uses the same input but emits it as native event fields,
// not a JSON blob.
func serializeChatMessages(messages []llm.ChatMessage) string {
if len(messages) == 0 {
return ""
}
b, err := json.Marshal(messages)
if err != nil {
return ""
}
return string(b)
}

// PrepareSpanContent runs the redact (when redact=true) and
// byte-cap-with-truncation-marker pipeline for content destined for
// an OTel span attribute. The pipeline is:
//
// 1. Apply RedactSecrets when redact=true.
// 2. TruncateForAudit (the same byte-cap helper the audit path uses)
// so a runaway prompt can't blow past the backend attribute limit
// and silently drop the marker.
//
// maxBytes <= 0 falls back to DefaultSpanContentCapBytes. The
// truncation marker is identical to what AuditPayloadCapture writes,
// so an operator who sees a `…[truncated:N]` suffix on an audit
// payload-captured field sees the same suffix on the linked span
// attribute for the same logical event.
//
// Returns the empty string when s is empty (skipping the pipeline).
func PrepareSpanContent(s string, redact bool, maxBytes int) string {
if s == "" {
return s
}
if redact {
s = RedactSecrets(s)
}
if maxBytes <= 0 {
maxBytes = DefaultSpanContentCapBytes
}
return TruncateForAudit(s, maxBytes)
}
Loading
Loading