DOC: Executor docs rewrite #1979
base: main
Changes from all commits
ff346ea
f453bce
f95a918
3b3f966
367dee9
994ff42
f7600ae
7012b72
942255d
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,79 +1,101 @@ | ||||||||||
| # Executor | ||||||||||
|
|
||||||||||
| ## Overview | ||||||||||
|
|
||||||||||
| The `pyrit/executor` module provides a flexible framework for executing various operations in PyRIT. This document explains the core components and how they are utilized across different executor categories. | ||||||||||
|
|
||||||||||
| ## Core Components (`pyrit/executor/core`) | ||||||||||
|
|
||||||||||
| The core executor module contains the foundational classes and interfaces that all executor categories inherit from: | ||||||||||
|
|
||||||||||
| - **Strategy** (`strategy.py`): Abstract base class for strategies with enforced lifecycle management. | ||||||||||
| - **StrategyContext** (`strategy.py`): The abstract base class that manages strategy context (all data needed to successfully execute the strategy). | ||||||||||
| - **StrategyConverterConfig** (`config.py`): Configuration for prompt converters used in strategies. | ||||||||||
| - **StrategyResult** (`pyrit/models/strategy_result.py`): Base class for all strategy results. | ||||||||||
| An **executor** is an *algorithm for interacting with an objective target*. You give it an objective | ||||||||||
| and some configuration, it drives the target, and it hands back a result. That's the whole job. | ||||||||||
|
|
||||||||||
| The important thing to notice up front is that **not every executor is an attack**. Sending a single | ||||||||||
| adversarial prompt is an executor, but so is running a Q&A benchmark over a dataset, fuzzing to | ||||||||||
| generate new prompts, or orchestrating a cross-domain injection workflow. Attacks are the largest and | ||||||||||
| most familiar family, but every category in this section — attacks, workflows, benchmarks, and prompt | ||||||||||
| generators — is the same kind of object running the same lifecycle. | ||||||||||
|
|
||||||||||
| ## Executor vs. attack technique | ||||||||||
|
|
||||||||||
| These two words get used loosely, so we pin them down: | ||||||||||
|
|
||||||||||
| - An **executor** (for attacks, an **attack strategy**) is the *algorithm* — e.g. | ||||||||||
| `PromptSendingAttack`, `CrescendoAttack`, `TreeOfAttacksWithPruningAttack`. It knows *how* to drive | ||||||||||
|
Comment on lines +16 to +17, Contributor:
I think it's confusing that we are defining executor but only listing attacks. IMO attacks vs. executor should be the comparison here, and we should only define attack technique in the scenario md, because why introduce a concept here that doesn't pertain? |
||||||||||
| the objective target. | ||||||||||
| - An **[attack technique](../scenarios/0_attack_techniques.ipynb)** is anything that, once configured, | ||||||||||
| generally helps move an attack toward achieving its objective — a role-play framing, a many-shot | ||||||||||
| priming set, a particular jailbreak template. A technique is **specific to an attack**: it is one | ||||||||||
| configured executor (plus its seeds) packaged so a [scenario](../scenarios/0_scenarios.ipynb) can | ||||||||||
| select it by name. The technique is the *recipe*; the executor is the *engine* that runs it. | ||||||||||
|
|
||||||||||
| ## Executor categories | ||||||||||
|
|
||||||||||
| PyRIT ships several families of executor. The cleanest way to tell the two main *attack* families | ||||||||||
| apart is to **count requests to the objective target**: a single-turn attack sends exactly one; a | ||||||||||
| multi-turn attack sends more than one and adapts as it goes. | ||||||||||
|
Comment on lines +27 to +29, Contributor:
I think this is confusing; it reads like there are several types of executors and they are these two families of attacks. I'd just say:
Suggested change
and then list the executors |
||||||||||
|
|
||||||||||
| - **[Single-Turn](1_single_turn.ipynb)** — sends a single prompt (**one attack turn**) to the | ||||||||||
| objective target and scores the response. It may prepare that prompt elaborately (a role-play frame, | ||||||||||
| many-shot priming, a prepended conversation), but only one crafted message is the actual ask, so no | ||||||||||
| adversarial target is required to *drive* it. | ||||||||||
| - **[Multi-Turn](2_multi_turn.ipynb)** — sends **more than one** turn to the objective target, | ||||||||||
| adapting until the objective is met or a turn limit is hit. Adaptive variants use an adversarial | ||||||||||
| target to generate each next prompt from the responses; others send a fixed sequence, request the | ||||||||||
| answer in chunks, or stream input — no adversarial target needed. | ||||||||||
| - **[Attack Configuration](3_attack_configuration.ipynb)** — not an executor itself, but the | ||||||||||
| cross-cutting inputs every attack accepts (objective vs. adversarial target, prepended | ||||||||||
| conversations, multimodal seeds, next-turn messages, memory labels). | ||||||||||
|
Comment on lines +39 to +41, Contributor:
nit: remove since this isn't an executor (maybe the notebook order should be changed so it's not in the middle of all the executors too) |
||||||||||
| - **[Compound](4_compound.ipynb)** — doesn't add turns of its own; it orchestrates *other* attacks | ||||||||||
| (running them in sequence) toward a single objective, so it is listed after the building blocks it composes. | ||||||||||
| - **[Workflow](5_workflow.ipynb)** — generic multi-step orchestration that doesn't fit the | ||||||||||
| attack/benchmark mould (e.g. cross-domain prompt injection / XPIA). | ||||||||||
| - **[Benchmark](6_benchmark.ipynb)** — evaluates an objective target against a fixed dataset and | ||||||||||
| criteria (e.g. Q&A accuracy, bias). | ||||||||||
| - **[Prompt Generator](7_promptgen.ipynb)** — produces attack prompts (e.g. fuzzing, Anecdoctor) to | ||||||||||
| augment datasets; some generate from a model alone, others probe a target to evolve effective | ||||||||||
| prompts. | ||||||||||
|
|
||||||||||
| ## The shape of an attack | ||||||||||
|
|
||||||||||
| Attacks — the most common executors — share a 4-component shape: | ||||||||||
|
|
||||||||||
| ```{mermaid} | ||||||||||
| flowchart LR | ||||||||||
| A(["Strategy"]) | ||||||||||
| A --consumes--> B(["Strategy Context"]) | ||||||||||
| A --takes in as parameters within __init__--> D(["Strategy Configurations (e.g. Converters)"]) | ||||||||||
| A --produces--> C(["Strategy Result <br>"]) | ||||||||||
| A(["Attack Strategy"]) | ||||||||||
| A --consumes--> B(["Attack Context <br>(objective, labels, prepended conversation)"]) | ||||||||||
| A --configured by--> D(["Attack Configurations <br>(Adversarial, Scoring, Converter)"]) | ||||||||||
| A --produces--> C(["Attack Result"]) | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| To execute, one generally follows this pattern: | ||||||||||
| 1. Create a **strategy context** containing state information | ||||||||||
| 2. Initialize a **strategy** (with optional **configurations** for converters etc.) | ||||||||||
| 3. _Execute_ the attack strategy with the created context | ||||||||||
| 4. Receive and process the **strategy result** | ||||||||||
|
|
||||||||||
| Each attack implements a lifecycle with distinct phases (all abstract methods), and the `Strategy` class provides a non-abstract `execute_async()` method that enforces this lifecycle: | ||||||||||
| * `_validate_context`: Validate context | ||||||||||
| * `_setup_async`: Initialize state | ||||||||||
| * `_perform_async`: Execute the core logic | ||||||||||
| * `_teardown_async`: Clean up resources | ||||||||||
|
|
||||||||||
| This implementation enforces a consistent execution flow across all strategies by: | ||||||||||
| 1. Guaranteeing that setup is always performed before the attack begins | ||||||||||
| 2. Ensuring the attack logic is only executed if setup succeeds | ||||||||||
| 3. Guaranteeing teardown is always executed, even if errors occur, through the use of a finally block | ||||||||||
| 4. Providing centralized error handling and logging | ||||||||||
|
|
||||||||||
| ## Executor Categories | ||||||||||
|
|
||||||||||
| All of these categories follow the flow of control described above. | ||||||||||
|
|
||||||||||
| ### Attack (`pyrit/executor/attack`) | ||||||||||
|
|
||||||||||
| Attacks implement various adversarial testing strategies to send prompts to a target endpoint, evaluate the responses, and report on the success of the attack. | ||||||||||
|
|
||||||||||
| - **Single-Turn Attacks**: Single-turn attacks typically send prompts to a target endpoint to try to achieve a specific objective within a single turn. These attack strategies evaluate the target response using optional scorers to determine if the objective has been met. | ||||||||||
| - **Multi-Turn Attacks**: Multi-turn attacks introduce an iterative attack process where an adversarial chat model generates prompts to send to a target system, attempting to achieve a specified objective over multiple turns. This strategy also evaluates the response using a scorer to determine if the objective has been met. These attacks continue iterating until the objective is met or a maximum number of turns is reached. These types of attacks tend to work better than single-turn attacks in eliciting harm if a target endpoint keeps track of conversation history. | ||||||||||
|
|
||||||||||
| Read more about the Attack architecture [here](../executor/attack/0_attack.md) | ||||||||||
|
|
||||||||||
| ### Prompt Generator (`pyrit/executor/promptgen`) | ||||||||||
|
|
||||||||||
| Prompt generators create various types of prompts using different strategies. Some examples are: | ||||||||||
|
|
||||||||||
| - **Fuzzer Generator**: Generates diverse jailbreak prompts by systematically exploring and generating prompt templates using the Monte Carlo Tree Search to balance exploration of new templates with exploitation of promising ones. | ||||||||||
| - **Anecdoctor Generator**: Generates misinformation content by using few-shot examples directly or by extracting a knowledge graph from examples, then using it. | ||||||||||
|
|
||||||||||
| Read more about Prompt Generators [here](../executor/promptgen/0_promptgen.md) | ||||||||||
| To run one: | ||||||||||
|
|
||||||||||
| ### Workflow (`pyrit/executor/workflow`) | ||||||||||
| 1. Initialize a **strategy** with optional **configurations** (converters, scorers, adversarial target). | ||||||||||
| 2. Call `execute_async(...)` with an **objective** (and optional prepended conversation / next message). | ||||||||||
| 3. Receive an **`AttackResult`** describing what happened and whether the objective was met. | ||||||||||
|
|
||||||||||
| Workflows orchestrate complex multi-step operations. Examples include: | ||||||||||
| The context is created for you from the `execute_async` arguments — you rarely build one by hand. | ||||||||||
| See [Attack Configuration](3_attack_configuration.ipynb) for what you can put in the context and | ||||||||||
| configs (prepended conversations, multimodal seeds, next-turn messages, memory labels). | ||||||||||
|
|
||||||||||
| - **XPIA Workflow**: This workflow orchestrates a cross-domain prompt injection attack (XPIA), where one might hide a prompt injection within a website or PDF and ask a target system to evaluate the contents to trigger the prompt injection. | ||||||||||
| The category pages above each walk through their executors with short runnable examples. | ||||||||||
|
|
||||||||||
| Read more about Workflows [here](../executor/workflow/0_workflow.md) | ||||||||||
| ## When do you actually need a new executor class? | ||||||||||
|
|
||||||||||
| Most of an executor's behavior comes from its *configuration and data*, not from new code. So before | ||||||||||
| writing a new executor class, ask whether the algorithm is genuinely new — or whether an existing | ||||||||||
| executor with different primitives would do. | ||||||||||
|
|
||||||||||
| ### Benchmark (`pyrit/executor/benchmark`) | ||||||||||
| For attacks specifically, the durable value of a new class is **adaptive decision-making**: branching | ||||||||||
| and backtracking based on the objective target's feedback, like searching a graph for a path that | ||||||||||
| works. Crescendo and TAP are the clearest examples — and you can reshape them substantially just by | ||||||||||
| swapping their *primitives* (system prompt, converters, scorers, prepended/simulated conversations) | ||||||||||
| rather than writing a new class. | ||||||||||
|
|
||||||||||
| Benchmarks evaluate model performance and safety based on specific criteria. Examples include: | ||||||||||
| A lot of what *looks* like a distinct executor isn't a new algorithm at all: | ||||||||||
|
|
||||||||||
| - **Question Answering Benchmark**: This benchmark strategy evaluates target models by sending multiple choice questions as prompts and seeing how accurately the model answers those questions. The responses are evaluated for benchmark reporting. | ||||||||||
| - **Pure prompt transformations** — obfuscating, or deconstructing-and-reconstructing a prompt — are | ||||||||||
| better expressed as [converters](../converters/0_converters.ipynb) than as attack classes. | ||||||||||
| - **Fixed framings** — a role-play wrapper, a primed Q&A history — are really a prepended conversation | ||||||||||
| plus seeds, i.e. an [attack technique](../scenarios/0_attack_techniques.ipynb) over an existing | ||||||||||
| attack like `PromptSendingAttack`. | ||||||||||
| - **New datasets or criteria** — a different benchmark question set or a different scorer is data and | ||||||||||
| configuration for an existing executor, not a new class. | ||||||||||
|
|
||||||||||
| Read more about Benchmarks [here](../executor/benchmark/0_benchmark.md) | ||||||||||
| Several of the single-turn attacks in this section predate this guidance and remain as classes for | ||||||||||
| compatibility. When you are building something new, prefer configuration, a converter, or a technique — | ||||||||||
| reach for a new executor class only when you genuinely need a new algorithm (most often a | ||||||||||
| feedback-driven loop). | ||||||||||
should this be ? relatedly should all instances of attack in this notebook be executor ?
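For reference while reviewing: the earlier version of this page described a lifecycle (`_validate_context`, `_setup_async`, `_perform_async`, `_teardown_async`) that `execute_async()` enforces, with teardown guaranteed by a `finally` block. The rewrite drops that description, so here is a minimal, standalone sketch of the pattern in plain Python. Only the four method names and `execute_async` come from the doc; `SketchContext`, `SketchStrategy`, `EchoStrategy`, and `run` are hypothetical names, and the exact placement of validation relative to the `try` block is an assumption, not a claim about PyRIT's actual implementation.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SketchContext:
    """Hypothetical stand-in for a strategy context (not a PyRIT class)."""
    objective: str
    events: list = field(default_factory=list)


class SketchStrategy(ABC):
    """Hypothetical base class illustrating the enforced lifecycle."""

    async def execute_async(self, context):
        # Assumption: validation happens before setup, outside the try block.
        self._validate_context(context)
        try:
            await self._setup_async(context)
            # Only reached if setup did not raise.
            return await self._perform_async(context)
        finally:
            # Runs whether setup/perform succeeded or raised.
            await self._teardown_async(context)

    @abstractmethod
    def _validate_context(self, context): ...

    @abstractmethod
    async def _setup_async(self, context): ...

    @abstractmethod
    async def _perform_async(self, context): ...

    @abstractmethod
    async def _teardown_async(self, context): ...


class EchoStrategy(SketchStrategy):
    """Toy strategy that records which lifecycle phases ran."""

    def __init__(self, fail=False):
        self._fail = fail

    def _validate_context(self, context):
        if not context.objective:
            raise ValueError("objective must not be empty")

    async def _setup_async(self, context):
        context.events.append("setup")

    async def _perform_async(self, context):
        context.events.append("perform")
        if self._fail:
            raise RuntimeError("simulated failure")
        return f"handled: {context.objective}"

    async def _teardown_async(self, context):
        context.events.append("teardown")


def run(objective, fail=False):
    """Run one execution and return (result_or_None, recorded_events)."""
    context = SketchContext(objective=objective)
    try:
        result = asyncio.run(EchoStrategy(fail=fail).execute_async(context))
    except RuntimeError:
        result = None
    return result, context.events
```

Calling `run("demo objective", fail=True)` still records a teardown event even though the perform phase raised, which is the guarantee the old text attributed to the `finally` block. If the new page intends readers to rely on that guarantee, it may be worth keeping a sentence about it somewhere in the rewritten docs.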