
feat(policies): SensitivePath condition type — flag AI activity on protected files #166

@hashedone

Description

Idea

Add a new policy condition type that warns or blocks when AI activity touches designated sensitive files (e.g. auth.rs, encryption.rs, signing.rs). The goal is to give teams a governance guardrail: "AI should not be modifying cryptographic or authentication code without explicit review."

Two possible approaches

Option A — Tool-call based (ForbiddenToolCall)

Evaluate against session tool events: if Write or Edit was called with a file_path matching a sensitive glob pattern, fail the policy.

{
  "type": "ForbiddenToolCall",
  "tool_names": ["Write", "Edit"],
  "when_files_match": ["**/encryption.rs", "**/auth.rs", "**/signing.rs"]
}

Pros: Fast to build, since the data is already captured in events.jsonl via tool_input.file_path.

Cons — significant:

  • False positives: the agent wrote to the file during the session, but the developer reverted the change before committing. The push is blocked even though no AI-written code landed.
  • False negatives: the agent modified the file via Bash (a freeform shell command), so no structured file_path is captured.
  • It evaluates the intent to modify, not what was committed. Framing matters: this can be presented as "agent attempted to touch sensitive files", but it is not a reliable gate on committed code.
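A minimal sketch of how Option A could be evaluated, written in Rust since the project is a Rust codebase. `ToolEvent`, `ForbiddenToolCall`, `violations` and the `glob_match` helper are hypothetical names, not the existing tracevault-core API, and the event shape is only an assumption about what events.jsonl carries. The example also makes the Bash false negative concrete: an event without a `file_path` is never checked.

```rust
// Minimal glob matcher (hypothetical helper): `**/` matches zero or more whole
// directories, `**` any characters, `*` any characters except '/', `?` one non-'/' char.
fn glob_match(pattern: &str, path: &str) -> bool {
    match_bytes(pattern.as_bytes(), path.as_bytes())
}

fn match_bytes(p: &[u8], s: &[u8]) -> bool {
    if p.starts_with(b"**/") {
        let rest = &p[3..];
        // Zero directories, or resume right after any '/' in the path.
        match_bytes(rest, s)
            || s.iter().enumerate().any(|(i, &c)| c == b'/' && match_bytes(rest, &s[i + 1..]))
    } else if p.starts_with(b"**") {
        let rest = &p[2..];
        (0..=s.len()).any(|i| match_bytes(rest, &s[i..]))
    } else if let Some((&c, rest)) = p.split_first() {
        match c {
            // `*` may consume characters up to, but not across, a '/'.
            b'*' => (0..=s.len())
                .take_while(|&i| i == 0 || s[i - 1] != b'/')
                .any(|i| match_bytes(rest, &s[i..])),
            b'?' => matches!(s.first(), Some(&ch) if ch != b'/') && match_bytes(rest, &s[1..]),
            _ => s.first() == Some(&c) && match_bytes(rest, &s[1..]),
        }
    } else {
        s.is_empty()
    }
}

/// Assumed shape of one session tool event.
struct ToolEvent {
    tool_name: String,
    file_path: Option<String>, // tool_input.file_path; absent for Bash
}

/// Hypothetical in-memory form of the ForbiddenToolCall condition.
struct ForbiddenToolCall {
    tool_names: Vec<String>,
    when_files_match: Vec<String>,
}

impl ForbiddenToolCall {
    /// Returns the offending file paths; an empty result means the policy passes.
    fn violations<'a>(&self, events: &'a [ToolEvent]) -> Vec<&'a str> {
        events
            .iter()
            .filter(|e| self.tool_names.iter().any(|t| t == &e.tool_name))
            .filter_map(|e| e.file_path.as_deref())
            .filter(|path| self.when_files_match.iter().any(|p| glob_match(p, path)))
            .collect()
    }
}

fn main() {
    let policy = ForbiddenToolCall {
        tool_names: vec!["Write".into(), "Edit".into()],
        when_files_match: vec!["**/encryption.rs".into(), "**/auth.rs".into(), "**/signing.rs".into()],
    };
    let events = vec![
        ToolEvent { tool_name: "Edit".into(), file_path: Some("src/auth.rs".into()) },
        ToolEvent { tool_name: "Read".into(), file_path: Some("src/signing.rs".into()) },
        // A Bash edit carries no structured file_path, so it is never checked.
        ToolEvent { tool_name: "Bash".into(), file_path: None },
    ];
    println!("{:?}", policy.violations(&events)); // ["src/auth.rs"]
}
```

Note that the evaluator only sees tool calls: if the `Edit` on `src/auth.rs` was later reverted, this still reports it, which is exactly the false-positive case above.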

Option B — Attribution-based (SensitivePath on committed diff)

Evaluate against the commit diff + attribution data: if committed lines in the push are AI-attributed and touch a file matching the pattern, fail the policy.

{
  "type": "SensitivePath",
  "patterns": ["**/encryption.rs", "**/auth.rs"],
  "action": "warn"
}

Pros: Correct answer — only fires when AI-written code actually landed in the commit.

Cons:

  • Requires the attribution pipeline to have run (server-side clone + tree-sitter line attribution). Not always available.
  • More complex evaluation path — needs to join commit diff with attribution data rather than just inspecting session tool calls.
  • Attribution confidence scores add ambiguity: what threshold counts as "AI-written"?
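A sketch of the Option B join between the pushed diff and attribution data, under stated assumptions: `AddedLine`, `SensitivePath`, `Outcome`, the `(path, line) -> confidence` map and the `min_confidence` field are all hypothetical (the threshold is one of the open questions), and lines with no attribution record are treated as human-written so that human edits never trigger the policy.

```rust
use std::collections::HashMap;

// Minimal glob matcher (hypothetical helper): `**/` = zero or more directories,
// `**` = any characters, `*` = any characters except '/', `?` = one non-'/' char.
fn glob_match(pattern: &str, path: &str) -> bool {
    match_bytes(pattern.as_bytes(), path.as_bytes())
}

fn match_bytes(p: &[u8], s: &[u8]) -> bool {
    if p.starts_with(b"**/") {
        let rest = &p[3..];
        match_bytes(rest, s)
            || s.iter().enumerate().any(|(i, &c)| c == b'/' && match_bytes(rest, &s[i + 1..]))
    } else if p.starts_with(b"**") {
        (0..=s.len()).any(|i| match_bytes(&p[2..], &s[i..]))
    } else if let Some((&c, rest)) = p.split_first() {
        match c {
            b'*' => (0..=s.len())
                .take_while(|&i| i == 0 || s[i - 1] != b'/')
                .any(|i| match_bytes(rest, &s[i..])),
            b'?' => matches!(s.first(), Some(&ch) if ch != b'/') && match_bytes(rest, &s[1..]),
            _ => s.first() == Some(&c) && match_bytes(rest, &s[1..]),
        }
    } else {
        s.is_empty()
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Action { Warn, Block }

/// Hypothetical in-memory form of the SensitivePath condition.
struct SensitivePath {
    patterns: Vec<String>,
    action: Action,
    /// Assumed knob: minimum attribution confidence that counts as "AI-written".
    min_confidence: f64,
}

/// One line added by the pushed commits (hypothetical shape).
struct AddedLine { path: String, line: u32 }

#[derive(Debug, PartialEq)]
enum Outcome { Pass, Warn(Vec<String>), Block(Vec<String>) }

impl SensitivePath {
    /// Joins the diff with attribution scores keyed by (path, line).
    fn evaluate(&self, diff: &[AddedLine], attribution: &HashMap<(String, u32), f64>) -> Outcome {
        let mut files: Vec<String> = Vec::new();
        for added in diff {
            if !self.patterns.iter().any(|p| glob_match(p, &added.path)) {
                continue; // not a sensitive file
            }
            // No attribution record: treat the line as human-written.
            let score = attribution.get(&(added.path.clone(), added.line)).copied().unwrap_or(0.0);
            if score >= self.min_confidence && !files.contains(&added.path) {
                files.push(added.path.clone());
            }
        }
        match (files.is_empty(), self.action) {
            (true, _) => Outcome::Pass,
            (false, Action::Warn) => Outcome::Warn(files),
            (false, Action::Block) => Outcome::Block(files),
        }
    }
}

fn main() {
    let policy = SensitivePath {
        patterns: vec!["**/encryption.rs".into(), "**/auth.rs".into()],
        action: Action::Warn,
        min_confidence: 0.8,
    };
    let diff = vec![
        AddedLine { path: "src/auth.rs".into(), line: 10 },
        AddedLine { path: "src/auth.rs".into(), line: 11 },
        AddedLine { path: "src/encryption.rs".into(), line: 3 },
        AddedLine { path: "src/main.rs".into(), line: 1 },
    ];
    let mut attribution = HashMap::new();
    attribution.insert(("src/auth.rs".to_string(), 11), 0.95); // AI-written
    attribution.insert(("src/encryption.rs".to_string(), 3), 0.4); // below threshold
    attribution.insert(("src/main.rs".to_string(), 1), 0.99); // AI, but not sensitive
    println!("{:?}", policy.evaluate(&diff, &attribution)); // Warn(["src/auth.rs"])
}
```

Only line 11 of `src/auth.rs` counts here: line 10 has no attribution record, the `encryption.rs` line falls below the assumed 0.8 threshold, and `main.rs` is AI-written but not sensitive. Switching `action` to `Block` changes only the outcome variant, which keeps the warn-vs-block decision separate from detection.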

Recommendation

Option B is the right long-term answer. Option A could be shipped as a stepping stone with clear UI copy that sets expectations ("flags sessions where the agent attempted to modify these files").

Before building either, it is worth deciding:

  • Should this block push or warn only? (Blocking with false positives from Option A would be very disruptive.)
  • What's the attribution confidence threshold for Option B?
  • Should Bash calls be scanned for path patterns in their input text? (Partial mitigation for Option A false negatives, but brittle.)
  • Human-written changes to sensitive files should never trigger this — how do we ensure that? (Option B handles it naturally; Option A does not.)
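To gauge how brittle the Bash mitigation would be, here is a rough heuristic that splits the command text into tokens and glob-matches each token against the sensitive patterns. `bash_mentions_sensitive` and `glob_match` are hypothetical helpers, and real shell parsing (variables, subshells, heredocs) is deliberately out of scope.

```rust
// Minimal glob matcher (hypothetical helper): `**/` = zero or more directories,
// `**` = any characters, `*` = any characters except '/', `?` = one non-'/' char.
fn glob_match(pattern: &str, path: &str) -> bool {
    match_bytes(pattern.as_bytes(), path.as_bytes())
}

fn match_bytes(p: &[u8], s: &[u8]) -> bool {
    if p.starts_with(b"**/") {
        let rest = &p[3..];
        match_bytes(rest, s)
            || s.iter().enumerate().any(|(i, &c)| c == b'/' && match_bytes(rest, &s[i + 1..]))
    } else if p.starts_with(b"**") {
        (0..=s.len()).any(|i| match_bytes(&p[2..], &s[i..]))
    } else if let Some((&c, rest)) = p.split_first() {
        match c {
            b'*' => (0..=s.len())
                .take_while(|&i| i == 0 || s[i - 1] != b'/')
                .any(|i| match_bytes(rest, &s[i..])),
            b'?' => matches!(s.first(), Some(&ch) if ch != b'/') && match_bytes(rest, &s[1..]),
            _ => s.first() == Some(&c) && match_bytes(rest, &s[1..]),
        }
    } else {
        s.is_empty()
    }
}

/// Heuristic scan: split on whitespace and common shell operators, strip quotes,
/// and return every token that matches a sensitive pattern.
fn bash_mentions_sensitive<'a>(command: &'a str, patterns: &[&str]) -> Vec<&'a str> {
    command
        .split(|c: char| c.is_whitespace() || matches!(c, ';' | '|' | '&' | '>' | '<' | '(' | ')'))
        .map(|tok| tok.trim_matches(|c: char| c == '"' || c == '\''))
        .filter(|tok| !tok.is_empty())
        .filter(|tok| patterns.iter().any(|p| glob_match(p, tok)))
        .collect()
}

fn main() {
    let patterns = ["**/auth.rs", "**/encryption.rs", "**/signing.rs"];
    let cmd = "sed -i 's/old/new/' src/auth.rs && cat \"crypto/signing.rs\"";
    println!("{:?}", bash_mentions_sensitive(cmd, &patterns)); // ["src/auth.rs", "crypto/signing.rs"]
    // The path is assembled at runtime, so no single token matches.
    let evasive = "f=src/au; sed -i 's/a/b/' ${f}th.rs";
    println!("{:?}", bash_mentions_sensitive(evasive, &patterns)); // []
}
```

The check errs in both directions: the read-only `cat "crypto/signing.rs"` is flagged just like the `sed -i` write, while the second command edits `src/auth.rs` and slips through.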

Related

  • Existing ConditionalToolCall condition: requires that a tool was called on matching files (the opposite direction)
  • Attribution engine in tracevault-core/src/diff.rs and policy_eval.rs

Metadata

  • Assignees: none
  • Labels: enhancement (New feature or request)
  • Type, projects, milestone, relationships, development: none