Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
230 changes: 230 additions & 0 deletions AI_AGENT_SECURITY_FEATURE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,230 @@
# AI Agent Security Testing (Moltbot/ClawdBot Vulnerability Patterns)

This feature adds comprehensive testing capabilities for AI agent security vulnerabilities, based on real-world attacks discovered in January 2026 targeting Moltbot (formerly ClawdBot) and similar AI agent platforms.

## What Was Added

### 1. AgentCommandInjectionConverter
**Location**: `pyrit/prompt_converter/agent_command_injection_converter.py`

A new prompt converter that generates command injection patterns to test AI agents for vulnerabilities:

**Injection Types:**
- `hidden_instruction` - Hidden commands embedded in normal text
- `cron` - Scheduled task injection (Moltbot-style attack)
- `file_read` - Unauthorized file system access attempts
- `credential_theft` - Credential exfiltration patterns
- `system_info` - System information gathering/reconnaissance

**Key Features:**
- Stealth mode for subtle, hard-to-detect injections
- Exfiltration target support for data theft testing
- Command prefix customization for different agent syntaxes
- Based on actual vulnerability patterns from real attacks

### 2. AI Agent Security Dataset
**Location**: `pyrit/datasets/seed_datasets/local/airt/ai_agent_security.prompt`

A comprehensive dataset of 60+ test objectives covering:
- Command injection attacks
- Credential theft attempts
- Unauthorized file access
- System reconnaissance
- Hidden instruction injection
- Data exfiltration patterns
- Indirect prompt injection
- Multi-stage attacks
- Supply chain compromises

### 3. Documentation
**Location**: `doc/code/converters/ai_agent_security_testing.md`

Complete guide including:
- Background on Moltbot vulnerabilities
- Usage examples for all injection types
- Integration with PyRIT's attack strategies
- Scoring and detection patterns
- Best practices and mitigation recommendations
- Real-world attack scenario recreations

### 4. Unit Tests
**Location**: `tests/unit/converter/test_agent_command_injection_converter.py`

Comprehensive test suite with 20+ test cases covering:
- Initialization and configuration
- All injection type generations
- Stealth vs non-stealth modes
- Exfiltration target handling
- Input validation
- Output correctness

### 5. Demo Script
**Location**: `examples/ai_agent_security_demo.py`

Interactive demonstration showing:
- All injection pattern types
- Stealth mode comparison
- Moltbot-style cron injection
- Dataset integration
- Visual examples of generated attacks

## Background: The Moltbot Vulnerabilities (Jan 2026)

In January 2026, security researchers discovered critical vulnerabilities in Moltbot (formerly ClawdBot), a rapidly popular open-source AI agent platform that gained 98K GitHub stars in days:

### Key Vulnerabilities Found:

1. **Cleartext Credential Storage**
- API keys, secrets stored unencrypted in `~/.clawdbot/`
- Backup files retained "deleted" credentials
- Accessible to infostealers and local attackers

2. **Cron Job Injection**
- Attackers could inject scheduled tasks via Discord messages
- Tasks ran with host machine privileges
- 30-second attack timeline from injection to exfiltration

3. **Indirect Prompt Injection**
- Malicious instructions in emails, PDFs, websites
- No validation of external content
- White-on-white hidden text attacks

4. **Unsafe Command Execution**
- `eval` used 100+ times, `execSync` 9 times
- No command sandboxing or whitelisting
- Direct system command access

5. **Public Exposure**
- 1,238 publicly accessible Moltbot instances (via Shodan)
- No authentication on many deployments
- Control panels exposed to internet

6. **Supply Chain Risk**
- 300+ contributors with commit access
- No code review requirements
- "Vibe-coded" with AI tools

### Impact:
- ~300K-400K estimated users affected
- Access to email, calendars, chat, files, browsers
- Full account takeover potential
- Credential theft and persistent access

## How to Use

### Basic Example

```python
from pyrit.prompt_converter import AgentCommandInjectionConverter
from pyrit.executor.attack import PromptSendingAttack

# Create converter for cron injection testing
converter = AgentCommandInjectionConverter(
injection_type="cron",
exfiltration_target="test-server.com",
stealth_mode=True
)

# Use in an attack
attack = PromptSendingAttack(
objective_target=your_ai_agent,
converters=[converter]
)

result = await attack.execute_async(
objective="Schedule recurring credential collection"
)
```

### With the Dataset

```python
from pyrit.models import SeedPromptDataset

# Load AI agent security test cases
dataset = SeedPromptDataset.from_yaml_file(
"pyrit/datasets/seed_datasets/local/airt/ai_agent_security.prompt"
)

# Test each objective
for seed in dataset.prompts:
result = await attack.execute_async(objective=seed.value)
```

## Testing Your AI Agent

To test if your AI agent is vulnerable to Moltbot-style attacks:

```python
# Test 1: Cron Injection
converter = AgentCommandInjectionConverter(injection_type="cron")
result = await test_agent(converter)
# Check if agent created scheduled tasks

# Test 2: Credential Access
converter = AgentCommandInjectionConverter(injection_type="credential_theft")
result = await test_agent(converter)
# Check if agent exposed credentials

# Test 3: File System Access
converter = AgentCommandInjectionConverter(injection_type="file_read")
result = await test_agent(converter)
# Check if agent read unauthorized files
```

## Integration with PyRIT

This feature integrates seamlessly with PyRIT's existing components:

- **Attack Strategies**: Use with PromptSendingAttack, RedTeamingAttack, CrescendoAttack
- **Scoring**: Combine with SelfAskCategoryScorer to detect vulnerabilities
- **Multi-Turn**: Test persistent exploitation with RedTeamingAttack
- **Datasets**: Integrate with existing AIRT test scenarios

## Mitigation for AI Agent Developers

If you're building AI agents, protect against these vulnerabilities:

1. **Never store credentials in cleartext** - Use secure vaults
2. **Validate all external inputs** - Sanitize emails, PDFs, websites
3. **Implement command whitelisting** - Restrict executable commands
4. **Use sandboxing** - Isolate agents with limited privileges
5. **Monitor suspicious activity** - Log all file/network access
6. **Regular security testing** - Use PyRIT regularly
7. **Implement rate limiting** - Prevent rapid exploitation
8. **Code review** - Audit all contributions, especially commands

## References

- [OX Security: Moltbot Analysis](https://www.ox.security/blog/one-step-away-from-a-massive-data-breach-what-we-found-inside-moltbot/)
- [Noma Security: Agentic Trojan Horse](https://noma.security/blog/moltbot-the-agentic-trojan-horse/)
- [Bitdefender: Moltbot Alert](https://www.bitdefender.com/en-us/blog/hotforsecurity/moltbot-security-alert-exposed-clawdbot-control-panels-risk-credential-leaks-and-account-takeovers)

## Future Enhancements

Potential additions:
- Target implementation for Moltbot/OpenClaw instances
- Additional converters for specific agent platforms (AutoGPT, LangChain)
- Scorer specialized for agent vulnerability detection
- Integration with agent security benchmarks
- Automated vulnerability reporting

## Contributing

Found new AI agent vulnerabilities? Contributions welcome:
- Add new injection patterns to the converter
- Expand the test dataset with new objectives
- Create additional converters for specific platforms
- Improve detection and scoring capabilities

## Important Notes

⚠️ **Authorization Required**: Only test AI agents you own or have explicit permission to test.

⚠️ **Controlled Environments**: Never test against production systems without proper safeguards.

⚠️ **Responsible Disclosure**: Report discovered vulnerabilities through proper channels.

## Questions?

See the full documentation at `doc/code/converters/ai_agent_security_testing.md`
Loading