feat(03-integration): alice.io WonderFence guardrails #223

Open
lior-k wants to merge 1 commit into strands-agents:main from lior-k:alice-io-wonderfence

lior-k commented Feb 8, 2026

Summary

This PR adds a new third-party guardrails integration example demonstrating how to use Alice WonderFence with Strands agents for real-time AI safety protection.

What's New

  • Alice WonderFence Integration: Complete working example of integrating WonderFence guardrails with Strands agents
  • Hook-Based Implementation: Uses Strands lifecycle hooks to intercept and evaluate content at four key points:
    • on_before_model_call: Evaluates user prompts before they reach the model
    • on_after_model_call: Evaluates model responses before they are returned to users
    • on_before_tool_call: Evaluates tool input parameters for safety
    • on_after_tool_call: Evaluates tool execution results
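
The four checkpoints listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in guardrail.py: it does not import Strands or the WonderFence client, and `CheckpointGuard`, `GuardrailViolation`, and `demo_is_safe` are hypothetical names introduced here. Only the four method names are taken from the list above.

```python
from typing import Callable, List

# Hypothetical evaluator signature: (checkpoint, content) -> True if safe.
# In the real integration this role would be played by a WonderFence call.
Evaluator = Callable[[str, str], bool]


class GuardrailViolation(Exception):
    """Raised when content is rejected at a checkpoint (hypothetical)."""


class CheckpointGuard:
    """Routes content from four lifecycle points through one evaluator."""

    def __init__(self, is_safe: Evaluator) -> None:
        self._is_safe = is_safe
        self.seen: List[str] = []  # checkpoints visited, kept for inspection

    def _check(self, checkpoint: str, content: str) -> str:
        self.seen.append(checkpoint)
        if not self._is_safe(checkpoint, content):
            raise GuardrailViolation(f"content rejected at {checkpoint}")
        return content

    def on_before_model_call(self, prompt: str) -> str:
        return self._check("prompt", prompt)

    def on_after_model_call(self, response: str) -> str:
        return self._check("response", response)

    def on_before_tool_call(self, tool_input: str) -> str:
        return self._check("tool_input", tool_input)

    def on_after_tool_call(self, tool_result: str) -> str:
        return self._check("tool_result", tool_result)


def demo_is_safe(checkpoint: str, content: str) -> bool:
    # Stand-in rule for demonstration only; not WonderFence's detection logic.
    return "ignore previous instructions" not in content.lower()
```

Keeping the evaluator as a plain callable means the same guard can be exercised with a stub in tests and with a real client in the demo, assuming the client can be wrapped to match this signature.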

Features

  • Adaptive Protection: Real-time detection and blocking of harmful prompts and outputs
  • Flexible Actions: BLOCK, MASK, or ALLOW content based on configured policies
  • Multimodal Support: Works with text, images, and other content types
  • Multilingual: Supports 20+ languages
  • Customizable Policies: Configure detection rules through the WonderFence UI
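
As a rough illustration of how the BLOCK, MASK, and ALLOW actions might be applied to a piece of content, here is a small sketch. The `Action` enum, `apply_action` function, and `REFUSAL` message are assumptions made for this example; the actual response shape returned by WonderFence is not described in this PR and may differ.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    # Action names come from the PR description; the enum itself is hypothetical.
    BLOCK = "BLOCK"
    MASK = "MASK"
    ALLOW = "ALLOW"


# Hypothetical replacement text shown when content is blocked.
REFUSAL = "This content was blocked by the guardrail."


def apply_action(action: Action, original: str, masked: Optional[str] = None) -> str:
    """Return the text that should continue through the pipeline."""
    if action is Action.BLOCK:
        return REFUSAL
    if action is Action.MASK:
        # Fail closed: if no masked text was supplied, treat it as a block.
        return masked if masked is not None else REFUSAL
    return original
```

Falling back to the refusal when a MASK verdict arrives without masked text is a design choice made for this sketch, so that unmasked content never slips through by accident.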

Example Test Cases

The integration includes demonstrations of:

  • Prompt injection attacks (system prompt override, impersonation)
  • Hate speech detection
  • PII masking (email addresses, phone numbers)
  • Abusive or harmful content filtering
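
To make the PII masking case concrete, the following sketch shows what masking email addresses and phone numbers could look like. It is a local regex stand-in written for illustration only; `mask_pii`, the patterns, and the `[EMAIL]` and `[PHONE]` placeholders are assumptions, and WonderFence's own detection and masking format are not shown in this PR.

```python
import re

# Deliberately simple patterns for demonstration; real PII detection is
# considerably more involved than these two expressions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

For example, `mask_pii("Contact jane@example.com or +1 555-123-4567.")` returns `"Contact [EMAIL] or [PHONE]."`.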

Files Added

  • 03-integrations/third-party-guardrails/04-alice-wonderfence/README.md - Documentation and setup instructions
  • 03-integrations/third-party-guardrails/04-alice-wonderfence/guardrail.py - WonderFence hook implementation
  • 03-integrations/third-party-guardrails/04-alice-wonderfence/main.py - Demo application with test cases
  • 03-integrations/third-party-guardrails/04-alice-wonderfence/requirements.txt - Python dependencies

Adds a third-party guardrails example integrating Alice WonderFence with Strands
agents for real-time AI safety protection. The integration uses Strands hooks
to evaluate prompts, responses, tool inputs, and tool outputs.

Features:
- Hook-based implementation (BeforeModelCall, AfterModelCall, BeforeTool, AfterTool)
- Support for BLOCK, MASK, and ALLOW actions
- Multimodal and multilingual detection (20+ languages)
- Example demonstrations of prompt injection, hate speech, and PII detection
- Customizable policies via WonderFence UI

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>