FEAT: Validate button for live target capability checks in CoPyRIT #1996
Open
varunj-msft wants to merge 17 commits into
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5s caused false-negative mismatches against cold-started Azure targets (multi_turn flakes, cascading to editable_history). 15s gives enough headroom for cold starts while remaining interactive. Verified live: 5s flaked multi_turn=False, 15s probe returns the correct multi_turn=True against azure_openai_responses.
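A minimal sketch of why a short per-probe timeout can produce a false negative against a slow-starting target. `fake_cold_start_probe` and `probe_with_timeout` are hypothetical stand-ins, not the engine's API, and treating a timeout as "not supported" is an assumption made for illustration. Delays are scaled down so the demo runs quickly.

```python
import asyncio


async def fake_cold_start_probe(startup_delay_s: float) -> bool:
    """Hypothetical stand-in for one live probe against a cold-started target."""
    await asyncio.sleep(startup_delay_s)
    return True  # the target really does support the capability


async def probe_with_timeout(startup_delay_s: float, per_probe_timeout_s: float) -> bool:
    """Return the observed capability; a timeout is recorded as 'not supported' (assumption)."""
    try:
        return await asyncio.wait_for(
            fake_cold_start_probe(startup_delay_s), timeout=per_probe_timeout_s
        )
    except asyncio.TimeoutError:
        return False  # false negative when the target was merely slow


async def main() -> tuple:
    # 0.08s stands in for a slow cold start; the two timeouts bracket it.
    short = await probe_with_timeout(startup_delay_s=0.08, per_probe_timeout_s=0.05)
    long = await probe_with_timeout(startup_delay_s=0.08, per_probe_timeout_s=0.5)
    return short, long


short_result, long_result = asyncio.run(main())
print(short_result, long_result)
```

The same target yields different observations depending only on the timeout, which is the flake pattern the commit describes.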
The engine ORs non-probeable combinations back into observed.input_modalities (discover_target_capabilities.py:778), which made the dialog show the same types in both the Observed cell and the 'Not probed (no asset)' row — a contradiction (claims confirmed AND claims not-probed for the same type). Filter the non-probeable types out of both Declared and Observed cells in the Input modalities row so the cells show only what was actually probed. The 'Not probed' row below already lists them separately — no info lost. Regression-guarded by an updated F6 test that asserts function_call appears exactly twice on screen (Not-probed row + warning text), not three times (which would mean it leaked back into the Input modalities cells).
Validation surfaces request-acceptance (not enforcement) AND inherits any bugs in the probe engine. Two known bugs cause false negatives today: the packaged probe_image.png is corrupt, and OpenAI Responses API image payloads have a known engine format mismatch. Both make image_path show 'observed=no' on targets that actually support image input. Add a prominent warning banner at the top of every result so users know not to treat the diff as ground truth. When the listed engine bugs are fixed, drop the parenthetical.
Previous banner was bulky and read like a self-own — surfacing our own engine bugs at the top of the dialog before the user had even seen the data. Replaced with: nothing in the result header (removed banner), plus a short addition to the existing 'request acceptance, not semantic enforcement' warning to mention image probes may currently false-negative. This keeps the engine-bug context where it belongs (one warning among several, framed as a property of probing rather than a flaw of this UI) without making the dialog look unconfident on first impression.
…mbos
ValidateCapabilitiesDialog.tsx flattened non_probeable_input_modalities
by splitting each combo string on '+' and unioning the pieces. For a
target declaring both {text} (probeable) and {text, function_call}
(non-probeable), the resulting set stripped both text and function_call
from the Input modalities cells — making confirmed text invisible and
the row render as '— / — / green match' despite text having been probed
and confirmed.
No in-tree target currently declares such a mixed combo, so this bug
was latent. It would surface the moment any non-OpenAI multi-piece
target lands.
Fix: backend computes and emits non_probeable_only_types — the types
that appear ONLY in non-probeable combos (never in any probeable one).
Frontend uses that for the cell-hide set. non_probeable_input_modalities
is unchanged and continues to drive the 'Not probed (no asset)' row
display.
Regression tests on both sides:
- test_non_probeable_only_types_excludes_types_confirmed_via_probeable_combo
asserts a target with both a probeable singleton and a non-probeable
mixed combo reports the bridging type as confirmed-probeable.
- A frontend test asserts the Input modalities row keeps 'text' and
excludes 'function_call' when given mixed-combo data.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
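The difference between the old flattening and the `non_probeable_only_types` fix can be sketched with plain sets. This is an illustration, not the code in the PR: the function names are hypothetical, and it assumes a combo counts as probeable only when every type in it is one of `text`, `image_path`, `audio_path`.

```python
PROBEABLE_TYPES = frozenset({"text", "image_path", "audio_path"})


def split_combos(declared_combos):
    """Partition declared combos into probeable and non-probeable lists (assumed rule)."""
    probeable, non_probeable = [], []
    for combo in declared_combos:
        (probeable if combo <= PROBEABLE_TYPES else non_probeable).append(combo)
    return probeable, non_probeable


def naive_hide_set(declared_combos):
    """Old behavior: split each non-probeable combo string on '+' and union the pieces."""
    _, non_probeable = split_combos(declared_combos)
    combo_strings = ["+".join(sorted(combo)) for combo in non_probeable]
    return {piece for s in combo_strings for piece in s.split("+")}


def non_probeable_only_types(declared_combos):
    """Fixed behavior: types that appear only in non-probeable combos."""
    probeable, non_probeable = split_combos(declared_combos)
    probeable_types = set().union(*probeable)
    non_probeable_types = set().union(*non_probeable)
    return non_probeable_types - probeable_types


# A target declaring both {text} and {text, function_call}.
declared = [frozenset({"text"}), frozenset({"text", "function_call"})]
naive = naive_hide_set(declared)
fixed = non_probeable_only_types(declared)
print(sorted(naive), sorted(fixed))
```

With the naive set, `text` is hidden from the Input modalities cells even though it was probed; with the fixed set, only `function_call` is hidden.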
…ster
Per Roman's feedback: putting Validate next to Set Active in the leftmost
cell stacked two unrelated actions and crowded the row left edge. The
button belongs next to the data it inspects.
Changes:
- New 'Validate' column inserted between 'Outputs' and 'Multi-turn', with
a header tooltip explaining what the action does.
- Each top-level row gets a subtle icon button (BeakerRegular) in the new
column with aria-label='Validate capabilities for {target_registry_name}',
wrapped in a Tooltip carrying the same description.
- Leftmost cell now contains only Set Active / Active badge, restoring
the row to a single-line action cell.
- Updated 5 F5 tests to find buttons via the new aria-label regex.
- Updated doc/gui/0_gui.md to describe the new column placement.
No behavior change: the same dialog opens with the same payload; the
disable-during-active-dialog and inner-target exclusion rules still hold.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r feedback
Roman flagged the subtle (icon-only) variant as visually indistinguishable from the modality icons one column over; readers couldn't tell it was a button.
Switched to appearance='secondary' so it has a clear border and hover state, matching the affordance of 'Set Active' while staying gray to differentiate from the blue primary action. Widened the column from 70px to 90px to give the bordered button breathing room.
No behavior change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI lints with the merged state of (PR branch + origin/main), and main upgraded eslint-plugin-react-hooks to v7 (PR microsoft#1984), which adds the `set-state-in-effect` rule. My branch's package.json still pins v5 locally so this wasn't caught pre-push, but the merge resolution at CI time gives v7 and the rule fires on the synchronous setLoading(true)/setError(null)/setResult(null) calls at the top of my useEffect body.
Refactor: tag the cached result+error with the target name they were requested for (`requestedFor` state), then derive `loading`, `displayResult`, and `displayError` from current target vs that tag. Switching targets makes the prior tag no longer match, so the display reverts to the spinner without any synchronous state mutation inside the effect. Same-target reopen still needs the explicit reset in handleClose to re-fire the effect (the [open, name] deps tuple is identical across close→reopen).
All 39 dialog+table tests still pass; full frontend suite 669/669 still green.
Verified the fix against v7 locally by temporarily upgrading the lockfile and running eslint on just my changed files, which came back clean. Reverted the lockfile bump because the upgrade belongs to PR microsoft#1984, not this PR; the merge will combine main's v7 plugin with my source-level fix.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Adds a Validate column to the CoPyRIT target table. Clicking the beaker button on a row runs PyRIT's existing `discover_target_capabilities_async` engine against that target and opens a modal showing a declared-vs-observed capability diff (boolean flags + input modalities).

The engine already shipped but had no user-facing surface, so users only discovered capability drift (e.g., an Azure OpenAI gateway stripping JSON-schema, a multimodal class pointed at a text-only deployment) when an attack failed mid-run. This makes it a one-click check.

Read-only: no apply, no persistence beyond the modal session. Out of scope: applying observed capabilities back to the target, drift history, scheduled validation, memory-row filtering for probe writes, per-inner-target validation on composite targets.
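As a rough illustration of what a declared-vs-observed diff over boolean flags looks like, here is a small sketch. `DiffRow` and `diff_flags` are hypothetical names and not part of the PR; `multi_turn` and `editable_history` are flags mentioned in the commits, while `json_schema` is an assumed flag name standing in for the JSON-schema drift example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffRow:
    """One row of the declared / observed / match table (hypothetical shape)."""

    capability: str
    declared: bool
    observed: bool

    @property
    def match(self) -> bool:
        return self.declared == self.observed


def diff_flags(declared: dict, observed: dict) -> list:
    """Build one row per flag; a flag missing on either side is treated as False."""
    names = sorted(declared.keys() | observed.keys())
    return [DiffRow(n, declared.get(n, False), observed.get(n, False)) for n in names]


# A gateway that strips JSON-schema support shows up as a single mismatched row.
declared = {"multi_turn": True, "json_schema": True, "editable_history": True}
observed = {"multi_turn": True, "json_schema": False, "editable_history": True}

rows = diff_flags(declared, observed)
drifted = [row.capability for row in rows if not row.match]
print(drifted)
```

A mismatch here only means the probe and the declaration disagree; as the warnings in this PR note, the probe reflects request acceptance, not semantic enforcement.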
Backend:
- `ValidateCapabilitiesResponse` model and `POST /api/targets/{name}/validate` route.
- `TargetService.validate_target_capabilities_async` method that filters declared input modalities to the probeable subset (`text`, `image_path`, `audio_path`) before invoking the engine, caps `per_probe_timeout_s` at 15s for GUI use, and holds a per-target `asyncio.Lock` so concurrent clicks on the same target serialize cleanly.
- `_target_capabilities_to_info` promoted to public (now has two consumers).

Frontend:
- `ValidateCapabilitiesDialog` showing a declared/observed/match table, a "Not probed (no asset)" row for modalities the engine has no test asset for (`function_call`, `tool_call`, `reasoning`, etc.), and warnings about live calls, memory writes, and validate-vs-active-attack races.

Known limitation: image-modality probing currently false-negatives on many targets because the packaged `probe_image.png` asset (`pyrit/datasets/prompt_target/target_capabilities/probe_image.png`) is a 68-byte file that fails `PIL.Image.verify()`. Not introduced by this PR (the file is unchanged), but worth flagging. The dialog surfaces this in a warning so users know to verify image results manually.

Tests and Documentation
Backend:
- `tests/unit/backend/test_target_service.py` (15 new tests) covering: probeable-modality filtering, the empty-set path, per-target lock serialization, cross-target non-serialization, exception propagation, GUI default timeout, the mixed-combo `non_probeable_only_types` regression guard, and warning contents.
- `tests/unit/backend/test_api_routes.py` for the 200/404 paths.

Frontend:
- `ValidateCapabilitiesDialog.test.tsx` (17 tests) and additions to `TargetTable.test.tsx` (5 tests) covering button placement, inner-target exclusion, dialog open/close, state reset across targets, the mixed-combo cell-filter regression, the composite-target warning, and the "Not probed" row.

Documentation:
- `doc/gui/0_gui.md` describing the column placement, what the dialog shows, and the limitations users should know about.
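The per-target locking and timeout cap described under Backend can be sketched roughly as follows. `ValidationGate` is a hypothetical class, not the `TargetService` implementation; reading "caps at 15s" as a `min()` is an assumption, and the `asyncio.sleep` call stands in for the real engine run.

```python
import asyncio
from collections import defaultdict

GUI_PROBE_TIMEOUT_CAP_S = 15.0  # value from the PR description


class ValidationGate:
    """Serializes validations per target name; different targets run concurrently."""

    def __init__(self) -> None:
        self._locks = defaultdict(asyncio.Lock)  # one lock per target name
        self._running = defaultdict(int)
        self.peak_per_target = defaultdict(int)
        self.peak_total = 0

    async def validate(self, target_name: str, requested_timeout_s: float) -> float:
        timeout_s = min(requested_timeout_s, GUI_PROBE_TIMEOUT_CAP_S)  # assumed cap rule
        async with self._locks[target_name]:
            self._running[target_name] += 1
            self.peak_per_target[target_name] = max(
                self.peak_per_target[target_name], self._running[target_name]
            )
            self.peak_total = max(self.peak_total, sum(self._running.values()))
            await asyncio.sleep(0.01)  # stand-in for the probe engine run
            self._running[target_name] -= 1
        return timeout_s


async def main():
    gate = ValidationGate()
    # Two clicks on target "a" plus one click on target "b", all at once.
    timeouts = await asyncio.gather(
        gate.validate("a", 60.0), gate.validate("a", 60.0), gate.validate("b", 5.0)
    )
    return gate, timeouts


gate, timeouts = asyncio.run(main())
print(gate.peak_per_target["a"], gate.peak_total, timeouts)
```

The two requests for the same target never overlap, while the request for a different target proceeds in parallel, matching the "per-target lock serialization" and "cross-target non-serialization" cases listed in the tests.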