FEAT: Validate button for live target capability checks in CoPyRIT #1996

Open
varunj-msft wants to merge 17 commits into microsoft:main from varunj-msft:varunj-msft/Hackathon-GUI-Validate-Function
@varunj-msft

Description

Adds a Validate column to the CoPyRIT target table. Clicking the beaker button on a row runs PyRIT's existing discover_target_capabilities_async engine against that target and opens a modal showing a declared-vs-observed capability diff (boolean flags + input modalities).

The engine already shipped but had no user-facing surface, so users only discovered capability drift (e.g., an Azure OpenAI gateway stripping JSON-schema, a multimodal class pointed at a text-only deployment) when an attack failed mid-run. This makes it a one-click check.

Read-only — no apply, no persistence beyond the modal session. Out of scope: applying observed capabilities back to the target, drift history, scheduled validation, memory-row filtering for probe writes, per-inner-target validation on composite targets.

Backend:

  • New ValidateCapabilitiesResponse model and POST /api/targets/{name}/validate route.
  • New TargetService.validate_target_capabilities_async method that filters declared input modalities to the probeable subset (text, image_path, audio_path) before invoking the engine, caps per_probe_timeout_s at 15s for GUI use, and holds a per-target asyncio.Lock so concurrent clicks on the same target serialize cleanly.
  • Promoted _target_capabilities_to_info to public (now has two consumers).
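
The three behaviors described for the service method (probeable-subset filtering, the timeout cap, and per-target serialization) can be sketched as follows. This is a minimal illustration, not the PR's code: `ValidationGate`, `probe`, and the constant names are hypothetical, and it assumes a declared combination counts as probeable only when every type in it is one of text, image_path, or audio_path.

```python
import asyncio
from collections import defaultdict

# Assumption: names below are illustrative, not taken from the PyRIT source.
PROBEABLE_TYPES = frozenset({"text", "image_path", "audio_path"})
GUI_MAX_PROBE_TIMEOUT_S = 15.0


class ValidationGate:
    """Hypothetical helper holding one asyncio.Lock per target name."""

    def __init__(self):
        # A lock is created lazily the first time a target name is seen.
        self._locks = defaultdict(asyncio.Lock)

    @staticmethod
    def probeable_subset(declared_combos):
        # Keep only combinations made up entirely of probeable types.
        return {combo for combo in declared_combos if combo <= PROBEABLE_TYPES}

    async def validate(self, target_name, declared_combos, probe,
                       per_probe_timeout_s=GUI_MAX_PROBE_TIMEOUT_S):
        # Cap the caller-supplied timeout so the GUI stays interactive.
        timeout = min(per_probe_timeout_s, GUI_MAX_PROBE_TIMEOUT_S)
        combos = self.probeable_subset(declared_combos)
        # Concurrent calls for the same target wait here; calls for
        # different targets use different locks and do not block each other.
        async with self._locks[target_name]:
            return await probe(target_name, combos, timeout)
```

Here `probe` stands in for the call into the discovery engine; any coroutine function accepting `(target_name, combos, timeout)` works for experimentation.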

Frontend:

  • New ValidateCapabilitiesDialog showing a declared/observed/match table, a "Not probed (no asset)" row for modalities the engine has no test asset for (function_call, tool_call, reasoning, etc.), and warnings about live calls, memory writes, and validate-vs-active-attack races.
  • Validate column placed next to the capability flags rather than in the leftmost cell, with a bordered icon button so it reads clearly as an action.

Known limitation: image-modality probing currently false-negatives on many targets because the packaged probe_image.png asset (pyrit/datasets/prompt_target/target_capabilities/probe_image.png) is a 68-byte file that fails PIL.Image.verify(). Not introduced by this PR — file unchanged — but worth flagging. The dialog surfaces this in a warning so users know to verify image results manually.

Tests and Documentation

Backend:

  • New unit tests in tests/unit/backend/test_target_service.py (15 new tests) covering: probeable-modality filtering, the empty-set path, per-target lock serialization, cross-target non-serialization, exception propagation, GUI default timeout, the mixed-combo non_probeable_only_types regression guard, and warning contents.
  • New route tests in tests/unit/backend/test_api_routes.py for the 200/404 paths.
  • Full backend suite: 9638 passed, 43 skipped, project coverage 90.24% (gate 78%). Diff-cover 98% on the 262 changed lines (gate 90%).

Frontend:

  • New ValidateCapabilitiesDialog.test.tsx (17 tests) and additions to TargetTable.test.tsx (5 tests) covering button placement, inner-target exclusion, dialog open/close, state reset across targets, the mixed-combo cell-filter regression, the composite-target warning, and the "Not probed" row.
  • Full frontend suite: 669 passed across 29 suites, Jest coverage gate met.

Documentation:

  • Added a short "Validating Targets" section to doc/gui/0_gui.md describing the column placement, what the dialog shows, and the limitations users should know about.
  • JupyText: no notebook changes in this PR, so no JupyText regeneration needed.

varunj-msft and others added 17 commits June 10, 2026 21:22
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

5s caused false-negative mismatches against cold-started Azure
targets (multi_turn flakes, cascading to editable_history). 15s
gives enough headroom for cold starts while remaining interactive.
Verified live: 5s flaked multi_turn=False, 15s probe returns the
correct multi_turn=True against azure_openai_responses.
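
To see why a short timeout reads as a capability mismatch, consider this small sketch. It assumes (the source does not show the engine internals) that a probe which times out is reported as "not observed"; `probe_flag` and `send_probe` are hypothetical names.

```python
import asyncio


async def probe_flag(send_probe, timeout_s):
    """Return True only if the probe call completes within timeout_s.

    Assumption: a timed-out probe is treated as 'capability not observed',
    which is how a slow cold start can turn into a false negative.
    """
    try:
        await asyncio.wait_for(send_probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return False
    return True
```

Under that assumption, a target that needs longer than the timeout to answer its first request looks identical to one that rejects the request, so a larger budget trades some latency for fewer spurious mismatches.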
The engine ORs non-probeable combinations back into observed.input_modalities
(discover_target_capabilities.py:778), which made the dialog show the same
types in both the Observed cell and the 'Not probed (no asset)' row —
a contradiction (claims confirmed AND claims not-probed for the same type).

Filter the non-probeable types out of both Declared and Observed cells in
the Input modalities row so the cells show only what was actually probed.
The 'Not probed' row below already lists them separately — no info lost.

Regression-guarded by an updated F6 test that asserts function_call appears
exactly twice on screen (Not-probed row + warning text), not three times
(which would mean it leaked back into the Input modalities cells).

Validation surfaces request-acceptance (not enforcement) AND inherits any
bugs in the probe engine. Two known bugs cause false negatives today:
the packaged probe_image.png is corrupt, and OpenAI Responses API image
payloads have a known engine format mismatch. Both make image_path show
'observed=no' on targets that actually support image input.

Add a prominent warning banner at the top of every result so users know
not to treat the diff as ground truth. When the listed engine bugs are
fixed, drop the parenthetical.

Previous banner was bulky and read like a self-own — surfacing our own
engine bugs at the top of the dialog before the user had even seen the
data. Replaced with: nothing in the result header (removed banner), plus
a short addition to the existing 'request acceptance, not semantic
enforcement' warning to mention image probes may currently false-negative.

This keeps the engine-bug context where it belongs (one warning among
several, framed as a property of probing rather than a flaw of this UI)
without making the dialog look unconfident on first impression.

…mbos

ValidateCapabilitiesDialog.tsx flattened non_probeable_input_modalities
by splitting each combo string on '+' and unioning the pieces. For a
target declaring both {text} (probeable) and {text, function_call}
(non-probeable), the resulting set stripped both text and function_call
from the Input modalities cells — making confirmed text invisible and
the row render as '— / — / green match' despite text having been probed
and confirmed.

No in-tree target currently declares such a mixed combo, so this bug
was latent. It would surface the moment any non-OpenAI multi-piece
target lands.

Fix: backend computes and emits non_probeable_only_types — the types
that appear ONLY in non-probeable combos (never in any probeable one).
Frontend uses that for the cell-hide set. non_probeable_input_modalities
is unchanged and continues to drive the 'Not probed (no asset)' row
display.
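
The difference between the naive flattening and the fix can be sketched with plain set operations. This is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, combinations are modeled as frozensets, and a combination is treated as probeable only when all of its types are text, image_path, or audio_path.

```python
# Assumption: same probeable set as named in the PR description.
PROBEABLE_TYPES = frozenset({"text", "image_path", "audio_path"})


def split_combos(declared):
    """Partition declared combinations into probeable and non-probeable."""
    probeable = {c for c in declared if c <= PROBEABLE_TYPES}
    return probeable, set(declared) - probeable


def naive_hidden_types(declared):
    """The buggy behavior: union every type from every non-probeable combo."""
    _, non_probeable = split_combos(declared)
    return set().union(*non_probeable)


def non_probeable_only_types(declared):
    """Types that appear only in non-probeable combos, never in a probeable one."""
    probeable, non_probeable = split_combos(declared)
    seen_in_probeable = set().union(*probeable)
    return set().union(*non_probeable) - seen_in_probeable
```

For a target declaring both `{text}` and `{text, function_call}`, the naive version hides `text` and `function_call`, while the fixed version hides only `function_call`, so the confirmed `text` entry stays visible in the Input modalities cells.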

Regression tests on both sides:
- test_non_probeable_only_types_excludes_types_confirmed_via_probeable_combo
  asserts a target with both a probeable singleton and a non-probeable
  mixed combo reports the bridging type as confirmed-probeable.
- A frontend test asserts the Input modalities row keeps 'text' and
  excludes 'function_call' when given mixed-combo data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ster

Per Roman's feedback: putting Validate next to Set Active in the leftmost
cell stacked two unrelated actions and crowded the row left edge. The
button belongs next to the data it inspects.

Changes:
- New 'Validate' column inserted between 'Outputs' and 'Multi-turn', with
  a header tooltip explaining what the action does.
- Each top-level row gets a subtle icon button (BeakerRegular) in the new
  column with aria-label='Validate capabilities for {target_registry_name}',
  wrapped in a Tooltip carrying the same description.
- Leftmost cell now contains only Set Active / Active badge, restoring
  the row to a single-line action cell.
- Updated 5 F5 tests to find buttons via the new aria-label regex.
- Updated doc/gui/0_gui.md to describe the new column placement.

No behavior change: the same dialog opens with the same payload; the
disable-during-active-dialog and inner-target exclusion rules still hold.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…r feedback

Roman flagged the subtle (icon-only) variant as visually indistinguishable
from the modality icons one column over — readers couldn't tell it was a
button. Switched to appearance='secondary' so it has a clear border and
hover state, matching the affordance of 'Set Active' while staying gray
to differentiate from the blue primary action. Widened the column from
70px to 90px to give the bordered button breathing room.

No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

CI lints with the merged state of (PR branch + origin/main), and main
upgraded eslint-plugin-react-hooks to v7 (PR microsoft#1984) which adds the
`set-state-in-effect` rule. My branch's package.json still pins v5
locally so this wasn't caught pre-push, but the merge resolution at
CI time gives v7 and the rule fires on the synchronous
setLoading(true)/setError(null)/setResult(null) calls at the top of
my useEffect body.

Refactor: tag the cached result+error with the target name they were
requested for (`requestedFor` state), then derive `loading`,
`displayResult`, and `displayError` from current target vs that tag.
Switching targets makes the prior tag no longer match, so the display
reverts to the spinner without any synchronous state mutation inside
the effect. Same-target reopen still needs the explicit reset in
handleClose to re-fire the effect (the [open, name] deps tuple is
identical across close→reopen).
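
The derive-instead-of-reset idea is independent of React, so here is a language-neutral sketch of it in Python rather than the component's TSX. `DialogState` and `derive_view` are hypothetical names; only the `requestedFor` tag and the three derived values come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DialogState:
    """Cached outcome, tagged with the target it was requested for."""
    requested_for: Optional[str] = None
    result: Optional[dict] = None
    error: Optional[str] = None


def derive_view(state, is_open, current_name):
    """Compute (loading, display_result, display_error) without mutating state."""
    # A cached outcome is shown only if it belongs to the current target.
    matches = is_open and state.requested_for == current_name
    display_result = state.result if matches else None
    display_error = state.error if matches else None
    # Anything open with no matching outcome is, by definition, still loading.
    loading = is_open and display_result is None and display_error is None
    return loading, display_result, display_error
```

Because a stale tag simply fails the comparison, switching targets shows the spinner without any explicit "clear" step, which is the property the lint rule is pushing toward.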

All 39 dialog+table tests still pass; full frontend suite 669/669
still green. Verified the fix against v7 locally by temporarily
upgrading the lockfile and running eslint on just my changed files —
clean. Reverted the lockfile bump because the upgrade belongs to
PR microsoft#1984, not this PR; the merge will combine main's v7 plugin with
my source-level fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>