Merged
211 changes: 209 additions & 2 deletions docs/security/guardrails.md
@@ -4,7 +4,7 @@ description: "Configurable content filtering, PII detection, and jailbreak prote
order: 7
---

The guardrail engine validates inbound and outbound messages against configurable policy rules using the `github.com/initializ/guardrails` library.
The guardrail engine validates inbound and outbound messages against configurable policy rules using the [`github.com/initializ/guardrails`](https://github.com/initializ/guardrails) library (pinned at `v0.12.0` in `forge-cli/go.mod`). Both the file-mode `guardrails.json` schema and the MongoDB-mode `AgentConfig` documents share the same shape — `models.StructuredGuardrails` from that library's [`models/config.go`](https://github.com/initializ/guardrails/blob/v0.12.0/models/config.go). The library's `models` package is the authoritative type definition; this page mirrors it for the fields most operators reach for, but field-level additions in newer library versions will surface there first.

## Architecture

@@ -15,7 +15,25 @@ Guardrails are implemented as a `GuardrailChecker` interface in forge-core, with
| **File mode** (default) | `guardrails.json` in project root | Local development, standalone deployments |
| **DB mode** | MongoDB (`AgentConfig` collection) | Platform deployments with centralized config + audit |

Priority: `FORGE_GUARDRAILS_DB` env → `guardrails.json` → built-in defaults.
### Source precedence

`BuildGuardrailChecker` (`forge-cli/runtime/guardrails_loader.go`) resolves the active config in this exact order at every `forge run`. The first row whose trigger matches is the one that loads — there is no merging across rows.

| # | Trigger | Outcome | What happens to lower-priority sources |
|---|---|---|---|
| 1 | `FORGE_GUARDRAILS_DB` set **and** MongoDB connect + ping succeed | **DB mode** — config loaded per-request from the `AgentConfig` collection in the `Initializ` database, keyed by `(FORGE_AGENT_ID, FORGE_ORG_ID)`. Audit records written back to MongoDB (`EnableAudit: true`). | `guardrails.json` is ignored at runtime even if present in the image. |
| 2 | `FORGE_GUARDRAILS_DB` set **but** connect or ping fails (10-second timeout) | Warns `failed to connect guardrails DB, falling back to file` and continues to row 3. | Falls through. |
| 3 | `guardrails.json` exists at `<workDir>/<cfg.GuardrailsPath \|\| "guardrails.json">` and parses | **File mode** — config is the parsed `StructuredGuardrails`. | Built-in defaults discarded. |
| 4 | File missing, unreadable, or invalid JSON | **Built-in defaults** — bundled PII (email/phone/SSN/credit-card) + jailbreak/prompt-injection/command-injection detection + 11 secret-pattern regexes (see [Default Secret Patterns](#default-secret-patterns)). | N/A — this is the floor. |
| 5 | Engine construction itself errors (after picking 3 or 4) | Warns `failed to create file guardrail engine, using noop` and installs a `NoopGuardrailChecker`. | No checks run; messages pass through unmodified. |

Notable consequences of the order:

- **MongoDB mode bypasses `guardrails.json` entirely.** The build still bakes the file into `/app/guardrails.json`, but the runner never opens it when `FORGE_GUARDRAILS_DB` is set and the database is reachable. Because the source is resolved at every `forge run`, you can switch an agent between modes by setting or unsetting the env var and starting a new run, without rebuilding the image.
- **DB failure is non-fatal.** A misconfigured URI or transient network issue drops to file mode with a warning, not a hard exit. If you need a hard requirement, monitor the warning in your log pipeline (`failed to connect guardrails DB, falling back to file`).
- **DB mode requires `FORGE_ORG_ID`** to scope the `AgentConfig` lookup. If it is missing, the lookup usually finds no matching config and checks fall through to default decisions, so confirm that the org ID is set alongside the URI.
- **`cfg.GuardrailsPath`** in `forge.yaml` overrides the default `"guardrails.json"` filename for file mode only — it has no effect in DB mode.
- **Audit sinks differ.** DB mode writes via the library's `EnableAudit` path into MongoDB. File mode emits Forge's normal `guardrail_check` audit events through the configured audit sinks (see [Audit Events](#audit-events)).
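
The precedence rows above can be read as a small decision function. The following is a minimal sketch of that reading, not the actual `BuildGuardrailChecker` code: the `Probe` struct, the `Source` constants, and `Resolve` are hypothetical names introduced for illustration, and the boolean fields stand in for the real env lookup, MongoDB ping, and file parse.

```go
package main

import "fmt"

// Source names which guardrail configuration ends up active.
// These constants are illustrative, not identifiers from forge-cli.
type Source string

const (
	SourceDB       Source = "db"
	SourceFile     Source = "file"
	SourceDefaults Source = "defaults"
	SourceNoop     Source = "noop"
)

// Probe holds the facts the precedence table depends on.
// Every field is a hypothetical stand-in for a real check.
type Probe struct {
	DBURISet    bool // FORGE_GUARDRAILS_DB is non-empty
	DBReachable bool // connect + ping succeeded within the timeout
	FileParses  bool // guardrails.json exists and is valid JSON
	EngineOK    bool // engine construction succeeded for file/defaults
}

// Resolve walks the rows top to bottom and returns the first match.
func Resolve(p Probe) Source {
	if p.DBURISet && p.DBReachable {
		return SourceDB // row 1
	}
	// Row 2: a configured but unreachable DB falls through to file mode.
	src := SourceDefaults // row 4
	if p.FileParses {
		src = SourceFile // row 3
	}
	if !p.EngineOK {
		return SourceNoop // row 5
	}
	return src
}

func main() {
	// DB configured but down, valid file present: file mode wins.
	fmt.Println(Resolve(Probe{DBURISet: true, DBReachable: false, FileParses: true, EngineOK: true}))
}
```

Note that there is no merging in this model: exactly one source is returned, which matches the "first row whose trigger matches" rule.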

## Built-in Evaluators

@@ -141,6 +159,195 @@ Gates control which evaluation points are active:
| `contextGate` | `false` | Validates context window content |
| `streamGate` | `false` | Validates streaming chunks |

### Full `guardrails.json` Schema

The `StructuredGuardrails` document (`github.com/initializ/guardrails/models.StructuredGuardrails`) has the following top-level blocks. Every block is optional — omitted blocks disable the corresponding evaluator. Field names use camelCase to match the JSON / BSON tags on the library structs.

| Top-level key | Library type | Purpose |
|---|---|---|
| `pii` | `*PIIConfig` | PII detection (email, phone, SSN, credit-card, …) |
| `moderation` | `*ModerationConfig` | Content-moderation categories (hate, harassment, violence, sexual, …) |
| `security` | `*SecurityConfig` | Jailbreak, prompt injection, SQL injection, command injection, custom security patterns |
| `urlFilter` | `*URLFilterConfig` | Allowlist / denylist URLs in inbound and outbound text |
| `customRules` | `*CustomRulesConfig` | User-defined regex / keyword / phrase rules with gate scoping |
| `approvalGates` | `[]ApprovalCondition` | Per-condition human-approval gates with notification channels |
| `nsfwText` | `*NSFWTextConfig` | NSFW-text confidence-threshold detection |
| `hallucination` | `*HallucinationConfig` | Hallucination detection — `require_sources` or `review` mode |
| `skillConstraints` | `*SkillConstraintsConfig` | Allowed / blocked skill names with per-decision action |
| `gateConfig` | `*GateConfig` | Which gates (input / tool-call / output / context / stream) fire |

#### `pii`

```json
{
  "enabled": true,
  "action": "mask",
  "categories": {
    "email": { "enabled": true, "action": "mask" },
    "phoneNumber": { "enabled": true, "action": "mask" }
  }
}
```

| Field | Type | Notes |
|---|---|---|
| `enabled` | `bool` | Master switch for PII detection |
| `action` | `string` | Global default: `mask` / `block` / `warn` |
| `categories` | `map<string, PIICategoryConfig>` | Per-category overrides; each entry has `enabled`, `action`, optional `id`, `label` |

#### `moderation`

```json
{
  "enabled": true,
  "action": "warn",
  "categories": {
    "hate": { "enabled": true, "action": "block", "threshold": 0.8 }
  }
}
```

Same shape as `pii`, but each `ModerationCategoryConfig` also accepts a `threshold` float (0.0–1.0) and optional `description`.

#### `security`

```json
{
  "jailbreakDetection": { "enabled": true, "confidenceThreshold": 25, "action": "block" },
  "promptInjection": { "enabled": true, "confidenceThreshold": 30, "action": "block" },
  "sqlInjection": { "enabled": true, "confidenceThreshold": 40, "action": "block" },
  "commandInjection": { "enabled": true, "confidenceThreshold": 35, "action": "block" },
  "customPatterns": [
    { "name": "internal-token", "pattern": "INT-[A-Z0-9]{16}", "action": "block", "description": "Internal service tokens" }
  ]
}
```

| Sub-block | Type | Notes |
|---|---|---|
| `jailbreakDetection`, `promptInjection`, `sqlInjection`, `commandInjection` | `*ThresholdConfig` | Each has `enabled` (bool), `confidenceThreshold` (0–100 percent), `action` (string) |
| `customPatterns` | `[]SecurityPattern` | Each has `name`, `pattern` (regex), `action`, optional `description` |
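
Because the detector thresholds are percentages, a value outside 0 to 100 is almost certainly a mistake. Below is a minimal pre-flight check, assuming you want to lint a `security` block before deploying it. The `thresholdConfig` struct and `checkSecurityThresholds` function are hypothetical local helpers, not library types, and the threshold is decoded as `float64` here only so the sketch accepts either integer or fractional JSON numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// thresholdConfig mirrors the three fields listed in the table above.
// It is a local illustration type, not the library's ThresholdConfig.
type thresholdConfig struct {
	Enabled             bool    `json:"enabled"`
	ConfidenceThreshold float64 `json:"confidenceThreshold"`
	Action              string  `json:"action"`
}

// checkSecurityThresholds parses a `security` block and returns the names of
// detectors whose confidenceThreshold falls outside the 0 to 100 range.
func checkSecurityThresholds(raw []byte) ([]string, error) {
	var block map[string]json.RawMessage
	if err := json.Unmarshal(raw, &block); err != nil {
		return nil, err
	}
	detectors := []string{"jailbreakDetection", "promptInjection", "sqlInjection", "commandInjection"}
	var bad []string
	for _, name := range detectors {
		msg, ok := block[name]
		if !ok {
			continue // omitted detector: nothing to validate
		}
		var tc thresholdConfig
		if err := json.Unmarshal(msg, &tc); err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		if tc.ConfidenceThreshold < 0 || tc.ConfidenceThreshold > 100 {
			bad = append(bad, name)
		}
	}
	return bad, nil
}

func main() {
	raw := []byte(`{"sqlInjection": {"enabled": true, "confidenceThreshold": 140, "action": "block"}}`)
	bad, err := checkSecurityThresholds(raw)
	fmt.Println(bad, err)
}
```

The check deliberately ignores `customPatterns`, which carries no threshold.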

#### `urlFilter`

```json
{
  "enabled": true,
  "mode": "denylist",
  "denylist": ["evil.example.com"],
  "allowlist": [],
  "maskAction": "redact",
  "replaceWith": "[URL REDACTED]",
  "action": "mask"
}
```

| Field | Type | Notes |
|---|---|---|
| `mode` | `string` | `allowlist` / `denylist` / `both` |
| `allowlist`, `denylist` | `[]string` | Host patterns |
| `action` | `string` | `mask` / `block` / `warn` |
| `maskAction`, `replaceWith` | `string` | Used when `action: mask` |

#### `customRules`

```json
{
  "hardConstraints": ["no-pii-leakage"],
  "softConstraints": ["polite-tone"],
  "rules": [
    {
      "id": "secret_openai",
      "name": "OpenAI API Key",
      "type": "regex",
      "constraint": "hard",
      "pattern": "sk-[A-Za-z0-9]{20,}",
      "action": "mask",
      "gates": ["output", "tool_call"]
    }
  ]
}
```

| Rule field | Type | Notes |
|---|---|---|
| `id`, `name` | `string` | Identification |
| `type` | `string` | `regex` / `keyword` / `phrase` |
| `constraint` | `string` | `hard` (fail-fast) / `soft` (logged only) |
| `pattern` | `string` | Required for `regex` type |
| `keywords` | `[]string` | Required for `keyword` / `phrase` types |
| `action` | `string` | `mask` / `block` / `warn` |
| `gates` | `[]string` | Which gates to apply to — `input`, `output`, `tool_call`, `context`, `stream` |
| `caseSensitive` | `bool` | Default false for keyword/phrase types |
| `description` | `string` | Optional human-readable note |
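
The per-type requirements in the table (a `pattern` for `regex`, `keywords` for `keyword` and `phrase`) can be linted before a rule ships. This is a minimal sketch under two assumptions: the `rule` struct and `validateRule` function are hypothetical local helpers rather than library types, and patterns are checked with Go's standard `regexp` package, which may or may not be the engine the library itself uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a local illustration type covering a subset of the fields above.
type rule struct {
	ID       string
	Type     string
	Pattern  string
	Keywords []string
}

// validateRule enforces the per-type requirements from the field table.
func validateRule(r rule) error {
	switch r.Type {
	case "regex":
		if r.Pattern == "" {
			return fmt.Errorf("rule %s: regex type requires a pattern", r.ID)
		}
		// Assumption: Go's regexp (RE2) syntax is a reasonable proxy.
		if _, err := regexp.Compile(r.Pattern); err != nil {
			return fmt.Errorf("rule %s: %w", r.ID, err)
		}
	case "keyword", "phrase":
		if len(r.Keywords) == 0 {
			return fmt.Errorf("rule %s: %s type requires keywords", r.ID, r.Type)
		}
	default:
		return fmt.Errorf("rule %s: unknown type %q", r.ID, r.Type)
	}
	return nil
}

func main() {
	r := rule{ID: "secret_openai", Type: "regex", Pattern: "sk-[A-Za-z0-9]{20,}"}
	fmt.Println(validateRule(r))
}
```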

#### `approvalGates`

```json
[
  {
    "id": "high-risk-action",
    "condition": "tool == 'kubectl' && contains(args, 'delete')",
    "action": "require_human_approval",
    "notifyChannels": ["slack:ops"]
  }
]
```

| Field | Type | Notes |
|---|---|---|
| `id`, `condition` | `string` | Required |
| `action` | `string` | `block` / `require_human_approval` / `warn` |
| `notifyChannels` | `[]string` | Channel adapters to notify on trigger |

#### `nsfwText`

```json
{ "enabled": true, "confidenceThreshold": 0.85, "action": "block" }
```

`confidenceThreshold` is a 0.0–1.0 float (different from the 0–100 percentage used by `security.*`).

#### `hallucination`

```json
{
  "enabled": true,
  "mode": "require_sources",
  "minSourceCount": 1,
  "action": "warn"
}
```

| Field | Type | Notes |
|---|---|---|
| `mode` | `string` | `require_sources` / `review` |
| `minSourceCount` | `int` | Minimum source citations required when `mode: require_sources` |

#### `skillConstraints`

```json
{
  "enabled": true,
  "allowedSkills": ["code_review_diff", "review_github_list_prs"],
  "blockedSkills": [],
  "action": "block"
}
```

The allowed and blocked skill names are checked against the agent's registered skill set.

#### `gateConfig`

See [Gate Configuration](#gate-configuration) above. Field types are all `bool` in the library struct.

### Compatibility notes

- The `forge init` template generates a `guardrails.json` containing only the `pii`, `security`, `customRules`, and `gateConfig` blocks. The other blocks (`moderation`, `urlFilter`, `approvalGates`, `nsfwText`, `hallucination`, `skillConstraints`) are not bootstrapped but are accepted at runtime — add them by hand if you need them.
- Every block except `approvalGates` (a slice) is pointer-typed in the library struct. Omitting a key in JSON is equivalent to disabling that evaluator; for blocks that carry an `enabled` field, supplying the block with `"enabled": false` is functionally the same and only costs the parse of that block.
- camelCase JSON keys are the contract — the BSON tags happen to be identical so a `StructuredGuardrails` document round-trips between MongoDB and `guardrails.json` without translation.
- For evaluator semantics, regex flag handling, and the full action vocabulary, see the library's [`models/config.go`](https://github.com/initializ/guardrails/blob/v0.12.0/models/config.go).
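
The pointer-typed blocks are what make "key omitted" and "block present but disabled" distinguishable in Go while behaving the same at runtime. The sketch below illustrates this with `encoding/json`; `piiBlock`, `doc`, and `piiActive` are hypothetical local stand-ins that mirror only two fields, not the library's `StructuredGuardrails`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// piiBlock and doc are local illustration types, not library structs.
type piiBlock struct {
	Enabled bool   `json:"enabled"`
	Action  string `json:"action"`
}

type doc struct {
	PII *piiBlock `json:"pii"` // nil when the key is omitted
}

// piiActive reports whether PII detection would be on for a given document:
// the block must be present and its enabled flag must be true.
func piiActive(raw string) (bool, error) {
	var d doc
	if err := json.Unmarshal([]byte(raw), &d); err != nil {
		return false, err
	}
	return d.PII != nil && d.PII.Enabled, nil
}

func main() {
	for _, raw := range []string{
		`{}`,
		`{"pii": {"enabled": false}}`,
		`{"pii": {"enabled": true, "action": "mask"}}`,
	} {
		active, err := piiActive(raw)
		fmt.Println(raw, active, err)
	}
}
```

Only the third document yields an active evaluator; the first two differ in shape (nil pointer versus a populated struct) but not in effect.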

## DB Mode (Platform Deployments)

When `FORGE_GUARDRAILS_DB` is set to a MongoDB connection URI, the engine loads guardrails config from the `AgentConfig` collection and enables audit logging.