feat(policies): SensitivePath condition type — flag AI activity on protected files

## Idea

Add a new policy condition type that warns or blocks when AI activity touches designated sensitive files (e.g. `auth.rs`, `encryption.rs`, `signing.rs`). The goal is to give teams a governance guardrail: "AI should not be modifying cryptographic or authentication code without explicit review."

## Two possible approaches

### Option A — Tool-call based (`ForbiddenToolCall`)

Evaluate against session tool events: if `Write` or `Edit` was called with a `file_path` matching a sensitive glob pattern, fail the policy.

```json
{
  "type": "ForbiddenToolCall",
  "tool_names": ["Write", "Edit"],
  "when_files_match": ["**/encryption.rs", "**/auth.rs", "**/signing.rs"]
}
```

**Pros:** Fast to build; the data is already captured in `events.jsonl` via `tool_input.file_path`.

**Cons — significant:**
- False positives: the agent wrote to the file during the session, but the developer reverted the change before committing. The push is blocked even though no AI code landed.
- False negatives: the agent modified the file via `Bash` (a freeform shell command), which is not captured as a structured `file_path`.
- Evaluates *intent to modify*, not *what was committed*. Framing matters: this can be presented as "agent attempted to touch sensitive files", but it is not a reliable committed-code gate.
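
To make the Option A evaluation concrete, here is a minimal sketch of the check over session tool events. The `ToolEvent` struct, its field names, and both function names are assumptions made for illustration, not the actual `events.jsonl` schema or existing code. The matcher is deliberately simplified to `**/<file name>` patterns; a real implementation would presumably use a proper glob library.

```rust
/// One structured tool event, roughly as it might be read from `events.jsonl`.
/// Field names are illustrative assumptions, not the real schema.
#[derive(Debug, Clone)]
pub struct ToolEvent {
    pub tool_name: String,
    /// Mirrors `tool_input.file_path`; absent for tools such as `Bash`.
    pub file_path: Option<String>,
}

/// Simplified matcher: supports only `**/<file name>` patterns and exact paths.
pub fn matches_pattern(pattern: &str, path: &str) -> bool {
    match pattern.strip_prefix("**/") {
        // `**/auth.rs` matches `auth.rs` and any `<dir>/auth.rs`.
        Some(name) => path == name || path.ends_with(&format!("/{}", name)),
        None => path == pattern,
    }
}

/// Returns the file paths that would fail a `ForbiddenToolCall` condition.
pub fn forbidden_tool_calls<'a>(
    events: &'a [ToolEvent],
    tool_names: &[&str],
    when_files_match: &[&str],
) -> Vec<&'a str> {
    events
        .iter()
        // Only the listed tools (e.g. `Write`, `Edit`) are considered.
        .filter(|e| tool_names.contains(&e.tool_name.as_str()))
        // Events without a structured path (e.g. `Bash`) are skipped silently,
        // which is exactly the false-negative case described above.
        .filter_map(|e| e.file_path.as_deref())
        .filter(|p| when_files_match.iter().any(|pat| matches_pattern(pat, p)))
        .collect()
}
```

Note that nothing in this check looks at the commit: a reverted `Edit` still produces a hit, and a `Bash` edit never does.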

### Option B — Attribution-based (`SensitivePath` on committed diff)

Evaluate against the commit diff + attribution data: if committed lines in the push are AI-attributed *and* touch a file matching the pattern, fail the policy.

```json
{
  "type": "SensitivePath",
  "patterns": ["**/encryption.rs", "**/auth.rs"],
  "action": "warn"
}
```

**Pros:** Correct answer — only fires when AI-written code actually landed in the commit.

**Cons:**
- Requires the attribution pipeline to have run (server-side clone + tree-sitter line attribution). Not always available.
- More complex evaluation path — needs to join commit diff with attribution data rather than just inspecting session tool calls.
- Attribution confidence scores add ambiguity: what threshold counts as "AI-written"?
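
A rough sketch of the Option B evaluation follows, assuming the commit diff has already been joined with attribution output into one record per added line. `AttributedLine`, `Outcome`, the `threshold` parameter, and the function names are all hypothetical; the real attribution data may look quite different. The `Unknown` outcome is one possible way to represent "the attribution pipeline has not run", not a decided behavior.

```rust
/// A line added by the pushed commits, joined with attribution output.
/// All fields are hypothetical; the real attribution schema may differ.
#[derive(Debug, Clone)]
pub struct AttributedLine {
    pub file_path: String,
    pub line_number: u32,
    /// Confidence in [0.0, 1.0] that the line is AI-written;
    /// `None` when attribution has not run for this commit.
    pub ai_confidence: Option<f32>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Pass,
    /// A sensitive file changed, but attribution data is missing.
    Unknown,
    /// (file path, line number) of each AI-attributed line in a sensitive file.
    Violation(Vec<(String, u32)>),
}

/// Simplified matcher: supports only `**/<file name>` patterns and exact paths.
pub fn is_sensitive(patterns: &[&str], path: &str) -> bool {
    patterns.iter().any(|pattern| match pattern.strip_prefix("**/") {
        Some(name) => path == name || path.ends_with(&format!("/{}", name)),
        None => path == *pattern,
    })
}

pub fn evaluate_sensitive_path(
    lines: &[AttributedLine],
    patterns: &[&str],
    threshold: f32,
) -> Outcome {
    let mut hits = Vec::new();
    let mut missing = false;
    for line in lines.iter().filter(|l| is_sensitive(patterns, &l.file_path)) {
        match line.ai_confidence {
            // AI-attributed at or above the threshold: counts as a violation.
            Some(c) if c >= threshold => hits.push((line.file_path.clone(), line.line_number)),
            // Below the threshold: treated as human-written.
            Some(_) => {}
            None => missing = true,
        }
    }
    if !hits.is_empty() {
        Outcome::Violation(hits)
    } else if missing {
        Outcome::Unknown
    } else {
        Outcome::Pass
    }
}
```

Human-written lines pass through untouched here because only lines at or above `threshold` count, which is why the threshold question matters so much for this option.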

## Recommendation

Option B is the right long-term answer. Option A could be shipped as a stepping stone with clear UI copy that sets expectations ("flags sessions where the agent attempted to modify these files").

Before building either, it is worth deciding:
- Should this block push or warn only? (Blocking with false positives from Option A would be very disruptive.)
- What's the attribution confidence threshold for Option B?
- Should `Bash` calls be scanned for path patterns in their input text? (Partial mitigation for Option A false negatives, but brittle.)
- Human-written changes to sensitive files should never trigger this — how do we ensure that? (Option B handles it naturally; Option A does not.)
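
For the `Bash` question above, a naive scan might look like the sketch below. The function name and the tokenization rules are assumptions chosen for illustration; it compares the file-name component of each token in the command text against a list of sensitive file names.

```rust
/// Heuristic scan of a `Bash` command string for sensitive file names.
/// Splits on whitespace and common shell punctuation, then compares the
/// last path component of each token against `sensitive_names`.
pub fn bash_mentions_sensitive<'a>(command: &str, sensitive_names: &[&'a str]) -> Vec<&'a str> {
    let tokens: Vec<&str> = command
        .split(|c: char| {
            c.is_whitespace()
                || matches!(c, '"' | '\'' | ';' | '|' | '&' | '>' | '<' | '=' | '(' | ')')
        })
        .filter(|t| !t.is_empty())
        .collect();

    sensitive_names
        .iter()
        .copied()
        // `rsplit('/').next()` yields the text after the last `/` in the token.
        .filter(|name| tokens.iter().any(|t| t.rsplit('/').next() == Some(*name)))
        .collect()
}
```

This catches `sed -i 's/a/b/' src/auth.rs`, but misses anything that builds the path indirectly (for example `F=auth; sed -i s/a/b/ src/$F.rs`) and also fires on read-only commands such as `cat src/auth.rs`, which illustrates why the mitigation is only partial.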

## Related

- Existing `ConditionalToolCall` condition: requires that a tool was called on matching files (the opposite direction)
- Attribution engine in `tracevault-core/src/diff.rs` and `policy_eval.rs`

