2 changes: 1 addition & 1 deletion doc/blog/2024_12_3.md
@@ -46,7 +46,7 @@ It turns out, yes, we can. `CrescendoOrchestrator`, `PairOrchestrator`, `RedTeam

We hope these changes make orchestrators significantly easier to use. With the updated documentation, the "Red Teaming Orchestrator" has been renamed "Multi-Turn Orchestrator," emphasizing that these components are now swappable. In most scenarios, you can substitute one orchestrator for another.

See the updated documentation [here](../code/executor/attack/2_red_teaming_attack.ipynb).
See the updated documentation [here](../code/executor/2_multi_turn.ipynb#red-teaming).


## What's next?
2 changes: 1 addition & 1 deletion doc/blog/2025_01_27.md
@@ -78,7 +78,7 @@ Finally, when PyRIT gets a response from the Target LLM, it switches to another

When examining this request, you may discover that occasionally the Adversarial LLM struggles with generating the right JSON format, leading to an error in PyRIT, regardless of whether the objective was achieved or not. In such situations, it is helpful to inspect the requests to identify these types of issues. Specifically, I found a problem when the LLM response contained double quotes, which broke subsequent JSON formatting; this was fixed using the "SearchReplaceConverter"[^9] prompt converter.

[^7]: "Multi-Turn Attack - RedTeamingAttack Example", ../code/executor/attack/2_red_teaming_attack.ipynb
[^7]: "Multi-Turn Attack - RedTeamingAttack Example", ../code/executor/2_multi_turn.ipynb#red-teaming

[^8]: "PyRIT - SearchReplaceConverter", ../api/pyrit_prompt_converter.md#searchreplaceconverter

6 changes: 3 additions & 3 deletions doc/blog/2025_06_06.md
@@ -14,7 +14,7 @@ The [AI Recruiter](https://github.com/KutalVolkan/ai_recruiter) is designed to m

- Automated RAG Vulnerability Testing: Attackers can manipulate résumé content by injecting hidden text (via a [PDF converter](../code/converters/5_file_converters.ipynb#pdfconverter)) that optimizes scoring, influencing the AI Recruiter’s ranking system.

- [XPIA Attack](https://github.com/microsoft/PyRIT/blob/main/doc/code/executor/workflow/2_xpia_ai_recruiter.ipynb) Integration: PyRIT enables full automation of prompt injections, making AI vulnerability research efficient and reproducible.
- [XPIA Attack](https://github.com/microsoft/PyRIT/blob/main/doc/code/executor/5_workflow.ipynb) Integration: PyRIT enables full automation of prompt injections, making AI vulnerability research efficient and reproducible.
---

## The Exploit in Detail: Step-by-Step
@@ -84,9 +84,9 @@ As we integrate AI into more facets of our lives, it’s imperative to build sys

*Explore More:*

- [XPIA Website Attack Notebook](https://github.com/microsoft/PyRIT/blob/main/doc/code/executor/workflow/1_xpia_website.ipynb)
- [XPIA Website Attack Notebook](https://github.com/microsoft/PyRIT/blob/main/doc/code/executor/5_workflow.ipynb)

- [XPIA AI Recruiter Attack Notebook](https://github.com/microsoft/PyRIT/blob/main/doc/code/executor/workflow/2_xpia_ai_recruiter.ipynb)
- [XPIA AI Recruiter Attack Notebook](https://github.com/microsoft/PyRIT/blob/main/doc/code/executor/5_workflow.ipynb)

- [View AI Recruiter Integration Test](../../tests/integration/ai_recruiter/test_ai_recruiter.py)

2 changes: 1 addition & 1 deletion doc/code/datasets/2_seed_programming.ipynb
@@ -11,7 +11,7 @@
"\n",
"## Translating from Seeds for Attack Parameters\n",
"\n",
"Most [attacks](../executor/attack/0_attack.md) make use of several parameters.\n",
"Most [attacks](../executor/0_executor.md) make use of several parameters.\n",
"\n",
"1. An **objective** - what you're trying to achieve\n",
"2. A **next_message** (optional) - the next message to send to the target\n",
2 changes: 1 addition & 1 deletion doc/code/datasets/2_seed_programming.py
@@ -15,7 +15,7 @@
#
# ## Translating from Seeds for Attack Parameters
#
# Most [attacks](../executor/attack/0_attack.md) make use of several parameters.
# Most [attacks](../executor/0_executor.md) make use of several parameters.

Contributor

Should this be "executors"? Relatedly, should all instances of "attack" in this notebook be "executor"?

Suggested change
# Most [attacks](../executor/0_executor.md) make use of several parameters.
# Most [executors](../executor/0_executor.md) make use of several parameters.

#
# 1. An **objective** - what you're trying to achieve
# 2. A **next_message** (optional) - the next message to send to the target
148 changes: 85 additions & 63 deletions doc/code/executor/0_executor.md
@@ -1,79 +1,101 @@
# Executor

## Overview

The `pyrit/executor` module provides a flexible framework for executing various operations in PyRIT. This document explains the core components and how they are utilized across different executor categories.

## Core Components (`pyrit/executor/core`)

The core executor module contains the foundational classes and interfaces that all executor categories inherit from:

- **Strategy** (`strategy.py`): Abstract base class for strategies with enforced lifecycle management.
- **StrategyContext** (`strategy.py`): The abstract base class that manages strategy context (all data needed to successfully execute the strategy).
- **StrategyConverterConfig** (`config.py`): Configuration for prompt converters used in strategies.
- **StrategyResult** (`pyrit/models/strategy_result.py`): Base class for all strategy results.
An **executor** is an *algorithm for interacting with an objective target*. You give it an objective
and some configuration, it drives the target, and it hands back a result. That's the whole job.

The important thing to notice up front is that **not every executor is an attack**. Sending a single
adversarial prompt is an executor, but so is running a Q&A benchmark over a dataset, fuzzing to
generate new prompts, or orchestrating a cross-domain injection workflow. Attacks are the largest and
most familiar family, but every category in this section — attacks, workflows, benchmarks, and prompt
generators — is the same kind of object running the same lifecycle.

## Executor vs. attack technique

These two words get used loosely, so we pin them down:

- An **executor** (for attacks, an **attack strategy**) is the *algorithm* — e.g.
`PromptSendingAttack`, `CrescendoAttack`, `TreeOfAttacksWithPruningAttack`. It knows *how* to drive
Comment on lines +16 to +17

Contributor

i think it's confusing that we are defining executor but only listing attacks. imo attacks vs executor should be the comparison here and we only define attack technique in the scenario md because why introduce a concept here that doesn't pertain

the objective target.
- An **[attack technique](../scenarios/0_attack_techniques.ipynb)** is anything that, once configured,
generally helps move an attack toward achieving its objective — a role-play framing, a many-shot
priming set, a particular jailbreak template. A technique is **specific to an attack**: it is one
configured executor (plus its seeds) packaged so a [scenario](../scenarios/0_scenarios.ipynb) can
select it by name. The technique is the *recipe*; the executor is the *engine* that runs it.

## Executor categories

PyRIT ships several families of executor. The cleanest way to tell the two main *attack* families
apart is to **count requests to the objective target**: a single-turn attack sends exactly one; a
multi-turn attack sends more than one and adapts as it goes.
Comment on lines +27 to +29

Contributor

I think this is confusing; it reads like there are several types of executors and they are these two families of attacks. I'd just say:

Suggested change
PyRIT ships several families of executor. The cleanest way to tell the two main *attack* families
apart is to **count requests to the objective target**: a single-turn attack sends exactly one; a
multi-turn attack sends more than one and adapts as it goes.
PyRIT ships several families of executor.

and then list the executors


- **[Single-Turn](1_single_turn.ipynb)** — sends a single prompt (**one attack turn**) to the
objective target and scores the response. It may prepare that prompt elaborately (a role-play frame,
many-shot priming, a prepended conversation), but only one crafted message is the actual ask, so no
adversarial target is required to *drive* it.
- **[Multi-Turn](2_multi_turn.ipynb)** — sends **more than one** turn to the objective target,
adapting until the objective is met or a turn limit is hit. Adaptive variants use an adversarial
target to generate each next prompt from the responses; others send a fixed sequence, request the
answer in chunks, or stream input — no adversarial target needed.
- **[Attack Configuration](3_attack_configuration.ipynb)** — not an executor itself, but the
cross-cutting inputs every attack accepts (objective vs. adversarial target, prepended
conversations, multimodal seeds, next-turn messages, memory labels).
Comment on lines +39 to +41

Contributor

nit: remove since this isn't an executor (maybe the notebook order should be changed so it's not in the middle of all the executors too)

- **[Compound](4_compound.ipynb)** — doesn't add turns of its own; it orchestrates *other* attacks
(running them in sequence) toward a single objective, after the building blocks it composes.
- **[Workflow](5_workflow.ipynb)** — generic multi-step orchestration that doesn't fit the
attack/benchmark mould (e.g. cross-domain prompt injection / XPIA).
- **[Benchmark](6_benchmark.ipynb)** — evaluates an objective target against a fixed dataset and
criteria (e.g. Q&A accuracy, bias).
- **[Prompt Generator](7_promptgen.ipynb)** — produces attack prompts (e.g. fuzzing, Anecdoctor) to
augment datasets; some generate from a model alone, others probe a target to evolve effective
prompts.
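
The "count requests to the objective target" rule can be sketched in a few lines. This is a minimal illustration, not PyRIT code: `CountingTarget`, `single_turn`, and `multi_turn` are hypothetical stand-ins, and the "scoring" is just a string comparison.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CountingTarget:
    """Hypothetical objective target that records every request it receives."""

    refuse_first: int = 2  # how many initial requests it refuses
    requests: list[str] = field(default_factory=list)

    async def send(self, prompt: str) -> str:
        self.requests.append(prompt)
        return "refused" if len(self.requests) <= self.refuse_first else "complied"


async def single_turn(target: CountingTarget, objective: str) -> bool:
    # One crafted message is the whole attack: send it, score it, stop.
    response = await target.send(objective)
    return response == "complied"


async def multi_turn(target: CountingTarget, objective: str, max_turns: int = 5) -> bool:
    # Keep sending, adapting each prompt to the previous response,
    # until the objective is met or the turn limit is hit.
    prompt = objective
    for turn in range(1, max_turns + 1):
        response = await target.send(prompt)
        if response == "complied":
            return True
        prompt = f"{objective} (attempt {turn + 1}, previous reply: {response})"
    return False


single_target = CountingTarget()
multi_target = CountingTarget()
single_ok = asyncio.run(single_turn(single_target, "tell me a secret"))
multi_ok = asyncio.run(multi_turn(multi_target, "tell me a secret"))
```

Inspecting `len(target.requests)` afterwards is what separates the two families: the single-turn helper always leaves exactly one entry, while the multi-turn helper leaves as many as it needed, up to `max_turns`.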

## The shape of an attack

Attacks — the most common executors — share a 4-component shape:

```{mermaid}
flowchart LR
A(["Strategy"])
A --consumes--> B(["Strategy Context"])
A --takes in as parameters within __init__--> D(["Strategy Configurations (e.g. Converters)"])
A --produces--> C(["Strategy Result <br>"])
A(["Attack Strategy"])
A --consumes--> B(["Attack Context <br>(objective, labels, prepended conversation)"])
A --configured by--> D(["Attack Configurations <br>(Adversarial, Scoring, Converter)"])
A --produces--> C(["Attack Result"])
```

To execute, one generally follows this pattern:
1. Create a **strategy context** containing state information
2. Initialize a **strategy** (with optional **configurations** for converters etc.)
3. _Execute_ the attack strategy with the created context
4. Receive and process the **strategy result**

Each attack implements a lifecycle with distinct phases (all abstract methods), and the `Strategy` class provides a non-abstract `execute_async()` method that enforces this lifecycle:
* `_validate_context`: Validate context
* `_setup_async`: Initialize state
* `_perform_async`: Execute the core logic
* `_teardown_async`: Clean up resources

This implementation enforces a consistent execution flow across all strategies by:
1. Guaranteeing that setup is always performed before the attack begins
2. Ensuring the attack logic is only executed if setup succeeds
3. Guaranteeing teardown is always executed, even if errors occur, through the use of a finally block
4. Providing centralized error handling and logging
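
The lifecycle guarantees listed above can be sketched as a template method. This is an illustrative sketch only: `SketchStrategy` and `RecordingStrategy` are hypothetical classes, not PyRIT's `Strategy` implementation, and the choice to validate before entering the `try` block is an assumption. Only the four hook names and `execute_async` come from the text.

```python
import asyncio
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger("lifecycle_sketch")


class SketchStrategy(ABC):
    """Hypothetical base class that enforces validate -> setup -> perform -> teardown."""

    @abstractmethod
    def _validate_context(self, context: dict) -> None: ...

    @abstractmethod
    async def _setup_async(self, context: dict) -> None: ...

    @abstractmethod
    async def _perform_async(self, context: dict) -> str: ...

    @abstractmethod
    async def _teardown_async(self, context: dict) -> None: ...

    async def execute_async(self, context: dict) -> str:
        # Assumption: an invalid context fails fast, before any setup or teardown.
        self._validate_context(context)
        try:
            await self._setup_async(context)
            # Only reached if setup did not raise.
            return await self._perform_async(context)
        except Exception:
            # One place for error logging instead of one per subclass.
            logger.exception("strategy execution failed")
            raise
        finally:
            # Runs whether setup/perform succeeded or raised.
            await self._teardown_async(context)


class RecordingStrategy(SketchStrategy):
    """Records which hooks ran so the ordering can be inspected."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.calls: list[str] = []

    def _validate_context(self, context: dict) -> None:
        self.calls.append("validate")
        if "objective" not in context:
            raise ValueError("context needs an objective")

    async def _setup_async(self, context: dict) -> None:
        self.calls.append("setup")

    async def _perform_async(self, context: dict) -> str:
        self.calls.append("perform")
        if self.fail:
            raise RuntimeError("target error")
        return f"done: {context['objective']}"

    async def _teardown_async(self, context: dict) -> None:
        self.calls.append("teardown")


ok = RecordingStrategy()
result = asyncio.run(ok.execute_async({"objective": "demo"}))

failing = RecordingStrategy(fail=True)
try:
    asyncio.run(failing.execute_async({"objective": "demo"}))
except RuntimeError:
    pass  # the error propagates, but teardown has already run
```

In both runs `calls` ends with `"teardown"`, which is the property the `finally` block is there to provide.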

## Executor Categories

All of these categories follow the flow of control described above.

### Attack (`pyrit/executor/attack`)

Attacks implement various adversarial testing strategies to send prompts to a target endpoint, evaluate the responses, and report on the success of the attack.

- **Single-Turn Attacks**: Single-turn attacks typically send prompts to a target endpoint to try to achieve a specific objective within a single turn. These attack strategies evaluate the target response using optional scorers to determine if the objective has been met.
- **Multi-Turn Attacks**: Multi-turn attacks introduce an iterative attack process where an adversarial chat model generates prompts to send to a target system, attempting to achieve a specified objective over multiple turns. This strategy also evaluates the response using a scorer to determine if the objective has been met. These attacks continue iterating until the objective is met or a maximum number of turns is reached. These types of attacks tend to work better than single-turn attacks in eliciting harm if a target endpoint keeps track of conversation history.

Read more about the Attack architecture [here](../executor/attack/0_attack.md)

### Prompt Generator (`pyrit/executor/promptgen`)

Prompt generators create various types of prompts using different strategies. Some examples are:

- **Fuzzer Generator**: Generates diverse jailbreak prompts by systematically exploring and generating prompt templates using the Monte Carlo Tree Search to balance exploration of new templates with exploitation of promising ones.
- **Anecdoctor Generator**: Generates misinformation content by using few-shot examples directly or by extracting a knowledge graph from examples, then using it.

Read more about Prompt Generators [here](../executor/promptgen/0_promptgen.md)
To run one:

### Workflow (`pyrit/executor/workflow`)
1. Initialize a **strategy** with optional **configurations** (converters, scorers, adversarial target).
2. Call `execute_async(...)` with an **objective** (and optional prepended conversation / next message).
3. Receive an **`AttackResult`** describing what happened and whether the objective was met.

Workflows orchestrate complex multi-step operations. Examples include:
The context is created for you from the `execute_async` arguments — you rarely build one by hand.
See [Attack Configuration](3_attack_configuration.ipynb) for what you can put in the context and
configs (prepended conversations, multimodal seeds, next-turn messages, memory labels).

- **XPIA Workflow**: This workflow orchestrates a cross-domain prompt injection attack (XPIA), where one might hide a prompt injection within a website or PDF and ask a target system to evaluate the contents to trigger the prompt injection.
The category pages above each walk through their executors with short runnable examples.

Read more about Workflows [here](../executor/workflow/0_workflow.md)
## When do you actually need a new executor class?

Most of an executor's behavior comes from its *configuration and data*, not from new code. So before
writing a new executor class, ask whether the algorithm is genuinely new — or whether an existing
executor with different primitives would do.

### Benchmark (`pyrit/executor/benchmark`)
For attacks specifically, the durable value of a new class is **adaptive decision-making**: branching
and backtracking based on the objective target's feedback, like searching a graph for a path that
works. Crescendo and TAP are the clearest examples — and you can reshape them substantially just by
swapping their *primitives* (system prompt, converters, scorers, prepended/simulated conversations)
rather than writing a new class.

Benchmarks evaluate model performance and safety based off of specific criteria. Examples include:
A lot of what *looks* like a distinct executor isn't a new algorithm at all:

- **Question Answering Benchmark**: This benchmark strategy evaluates target models by sending multiple choice questions as prompts and seeing how accurately the model answers those questions. The responses are evaluated for benchmark reporting.
- **Pure prompt transformations** — obfuscating, or deconstructing-and-reconstructing a prompt — are
better expressed as [converters](../converters/0_converters.ipynb) than as attack classes.
- **Fixed framings** — a role-play wrapper, a primed Q&A history — are really a prepended conversation
plus seeds, i.e. an [attack technique](../scenarios/0_attack_techniques.ipynb) over an existing
attack like `PromptSendingAttack`.
- **New datasets or criteria** — a different benchmark question set or a different scorer is data and
configuration for an existing executor, not a new class.

Read more about Benchmarks [here](../executor/benchmark/0_benchmark.md)
Several of the single-turn attacks in this section predate this guidance and remain as classes for
compatibility. When you are building something new, prefer configuration, a converter, or a technique —
reach for a new executor class only when you genuinely need a new algorithm (most often a
feedback-driven loop).
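
As a rough illustration of "configuration, not a new class": in the sketch below, an obfuscated variant of a single-turn send is obtained by passing a different list of pure string functions rather than by subclassing anything. All names here (`Converter`, `send_once`, `echo_target`, `rot13`, `reverse`) are hypothetical and deliberately simpler than PyRIT's converter and attack APIs.

```python
import codecs
from typing import Callable, Sequence

# Hypothetical: a converter is just a pure prompt-to-prompt transformation.
Converter = Callable[[str], str]


def rot13(prompt: str) -> str:
    """Obfuscate the prompt with ROT13."""
    return codecs.encode(prompt, "rot13")


def reverse(prompt: str) -> str:
    """Reverse the prompt text."""
    return prompt[::-1]


def send_once(
    target: Callable[[str], str],
    objective: str,
    converters: Sequence[Converter] = (),
) -> str:
    """One generic single-turn send; variants differ only in their converters."""
    prompt = objective
    for convert in converters:
        prompt = convert(prompt)
    return target(prompt)


def echo_target(prompt: str) -> str:
    """Stand-in target that returns exactly what it was sent."""
    return prompt


plain = send_once(echo_target, "hello")
obfuscated = send_once(echo_target, "hello", converters=[rot13])
stacked = send_once(echo_target, "hello", converters=[rot13, reverse])
```

The algorithm in `send_once` never changes across the three calls; only the data handed to it does, which is the case where a converter or technique is usually a better fit than a new executor class.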