
ACP: Claude Code Usage Policy refusals cause instant failures on 8 GAIA instances #495

@simonrosenberg

Description


Parent issue: #494

Bug

All 8 GAIA instances that errored fail within about 2 seconds with a Usage Policy refusal from Claude Code. The agent never begins working on the task.

ACP error: Internal error: API Error: Claude Code is unable to respond to this
request, which appears to violate our Usage Policy
(https://www.anthropic.com/legal/aup). Try rephrasing the request or attempting
a different approach.
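The refusal arrives as a fixed message, so it can be recognized by its text. Below is a minimal Python sketch that assumes the harness surfaces the ACP error as a plain string. `classify_acp_error` and the `aup_refusal` label are hypothetical names. The matched phrase is copied from the message above, so the check stops working if the refusal wording changes.

```python
import re

# Phrase copied from the ACP error text quoted in this issue (assumption:
# the refusal wording stays the same across Claude Code versions).
AUP_REFUSAL_PATTERN = re.compile(r"appears to violate our Usage Policy", re.IGNORECASE)


def classify_acp_error(message: str) -> str:
    """Return a coarse error type for an ACP error message (hypothetical helper)."""
    # Collapse newlines and repeated spaces so a wrapped message still matches.
    normalized = " ".join(message.split())
    if AUP_REFUSAL_PATTERN.search(normalized):
        return "aup_refusal"
    return "other"
```

Passing the full error text quoted above returns `"aup_refusal"`, while the generic "Remote conversation ended with error" string returns `"other"`.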

Details

  • Run: 22836395400-claude-son (GAIA)
  • Model: litellm_proxy/claude-sonnet-4-5-20250929
  • Agent type: acp-claude
  • Failure pattern: Instant (~2s), deterministic, identical across all 4 retries

Timing evidence (instance 2a649bb1):

03:07:01 - run() triggered successfully: <Response [200 OK]>
03:07:03 - runtime init failure ... error=Conversation run failed
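The roughly 2-second gap between these two log lines separates this failure from a run that actually did work. The sketch below computes that gap from the HH:MM:SS timestamps and assumes both lines come from the same run. `time_to_failure`, `is_instant_failure`, and the 10-second threshold are illustrative choices and are not part of the evaluation harness.

```python
from datetime import datetime, timedelta

# Hypothetical cut-off: a failure this fast suggests the request was rejected
# before the agent did any work. Not a value taken from the harness.
INSTANT_FAILURE_THRESHOLD = timedelta(seconds=10)


def time_to_failure(started: str, failed: str) -> timedelta:
    """Elapsed time between two HH:MM:SS log timestamps from the same run."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(failed, fmt) - datetime.strptime(started, fmt)
    # The log lines carry no date, so a negative delta is assumed to cross midnight.
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


def is_instant_failure(started: str, failed: str) -> bool:
    """True when the run failed within the (hypothetical) instant-failure threshold."""
    return time_to_failure(started, failed) <= INSTANT_FAILURE_THRESHOLD
```

For instance 2a649bb1, `time_to_failure("03:07:01", "03:07:03")` gives 2 seconds, which falls well under the threshold.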

The GAIA prompt includes wording such as "Failure or 'I cannot answer' will not be tolerated, success will be rewarded", which may be what triggers Claude Code's safety filters.

Affected Instance IDs

  • 2a649bb1-795f-4a01-b3be-9a01868dae73
  • 983bba7c-c092-455f-b6c9-7857003d48fc
  • 2d83110e-a098-4ebb-9987-066c06fa42d0
  • ed58682d-bc52-4baa-9eb0-4eb81e1edacc
  • 624cbf11-6a41-4692-af9c-36b3e5ca3130
  • 384d0dd8-e8a4-4cfe-963c-d37f256e7662
  • 8b3379c0-0981-4f5b-8407-6444610cb212
  • 6359a0b1-8f7b-499b-9336-840f9ab90688

Note: Some instances also hit the 1800-second (30-minute) timeout on earlier retries. For example, 624cbf11 timed out on retry 1, while retries 2-4 returned the AUP refusal.
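Because a single instance can fail in different ways on different retries, summarizing its whole retry history is more reliable than checking one retry. The sketch below assumes each retry's outcome has already been reduced to an error-type string. `summarize_retries` and the type labels are hypothetical.

```python
from collections import Counter
from typing import List


def summarize_retries(outcomes: List[str]) -> dict:
    """Summarize per-retry error types for one instance (hypothetical helper).

    `outcomes` is ordered by retry number, e.g. ["timeout", "aup_refusal", ...].
    """
    counts = Counter(outcomes)
    return {
        "counts": dict(counts),
        # True only when every retry failed with the same error type.
        "deterministic": len(counts) == 1,
        "last": outcomes[-1] if outcomes else None,
    }
```

For 624cbf11 (one timeout followed by three AUP refusals), `deterministic` is `False` even though the last three retries are identical. For that reason, a rule for skipping retries is probably better keyed on the refusal type itself than on whether every retry matched.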

Suggestions

  1. Adjust the GAIA prompt to avoid triggering safety filters
  2. Detect AUP refusal errors and skip the remaining retries (the failure is deterministic, so retrying only wastes time)
  3. Propagate the specific error type to output_errors.jsonl instead of the generic "Remote conversation ended with error"
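Suggestions 2 and 3 can be sketched together: stop retrying once a non-retryable error type appears, and carry that type into the error record. This assumes errors are already classified into types such as `aup_refusal`. `run_with_retries`, `error_line`, the `attempt` callable, and the JSON field names are hypothetical and do not reflect the actual output_errors.jsonl schema.

```python
import json
from typing import Callable, Tuple

# Error types treated as deterministic, so retrying cannot change the outcome.
# "aup_refusal" is a hypothetical label, not an existing harness constant.
NON_RETRYABLE = {"aup_refusal"}


def run_with_retries(attempt: Callable[[], Tuple[bool, str, str]], max_attempts: int = 4) -> dict:
    """Call `attempt` until it succeeds, hits a non-retryable error, or runs out of attempts.

    `attempt` is a hypothetical callable returning (ok, error_type, message).
    """
    errors = []
    for n in range(1, max_attempts + 1):
        ok, error_type, message = attempt()
        if ok:
            return {"status": "success", "attempts": n, "errors": errors}
        errors.append({"attempt": n, "error_type": error_type, "message": message})
        if error_type in NON_RETRYABLE:
            break  # deterministic failure: skip the remaining attempts
    return {"status": "error", "attempts": len(errors), "errors": errors}


def error_line(instance_id: str, result: dict) -> str:
    """One JSON line for an output_errors.jsonl-style file (expects status 'error')."""
    last = result["errors"][-1]
    return json.dumps({
        "instance_id": instance_id,
        "error_type": last["error_type"],  # specific type, not a generic message
        "error": last["message"],
        "attempts": result["attempts"],
    })
```

With the 624cbf11 history (a timeout, then an AUP refusal), the loop stops after the second attempt instead of running all four, and the written record carries `aup_refusal` as its error type.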

Reproduction

gs://openhands-evaluation-results/gaia/litellm_proxy-claude-sonnet-4-5-20250929/22836395400/results.tar.gz

Log: logs/instance_2a649bb1-*.output.log
