Parent issue: #494
Bug
All 8 Gaia error instances fail with an instant Usage Policy refusal (~2 seconds) from Claude Code. The agent never starts working.
ACP error: Internal error: API Error: Claude Code is unable to respond to this
request, which appears to violate our Usage Policy
(https://www.anthropic.com/legal/aup). Try rephrasing the request or attempting
a different approach.
Details
- Run: 22836395400-claude-son (Gaia)
- Model: litellm_proxy/claude-sonnet-4-5-20250929
- Agent type: acp-claude
- Failure pattern: Instant (~2s), deterministic, identical across all 4 retries
Timing evidence (instance 2a649bb1):
03:07:01 - run() triggered successfully: <Response [200 OK]>
03:07:03 - runtime init failure ... error=Conversation run failed
The GAIA prompt contains language like "Failure or 'I cannot answer' will not be tolerated, success will be rewarded" which may trigger Claude Code's safety filters.
Affected Instance IDs
2a649bb1-795f-4a01-b3be-9a01868dae73
983bba7c-c092-455f-b6c9-7857003d48fc
2d83110e-a098-4ebb-9987-066c06fa42d0
ed58682d-bc52-4baa-9eb0-4eb81e1edacc
624cbf11-6a41-4692-af9c-36b3e5ca3130
384d0dd8-e8a4-4cfe-963c-d37f256e7662
8b3379c0-0981-4f5b-8407-6444610cb212
6359a0b1-8f7b-499b-9336-840f9ab90688
Note: Some instances also exhibit the 1800s timeout pattern on earlier retries (e.g., 624cbf11 retry 1 timed out at 30min, while retries 2-4 showed the AUP refusal).
Suggestions
- Adjust the GAIA prompt to avoid triggering safety filters
- Detect AUP refusal errors and skip retries (deterministic failure, retrying is wasted)
- Propagate the specific error type to output_errors.jsonl instead of the generic "Remote conversation ended with error"
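A minimal sketch of the second and third suggestions, assuming the refusal can be recognized by the phrase "appears to violate our Usage Policy" from the error text quoted above. The names `classify_error`, `should_retry`, and `error_record`, the `aup_refusal` label, and the record fields are hypothetical and do not reflect the harness's actual code or the real schema of output_errors.jsonl.

```python
import json
import re

# Assumption: the phrase below is taken from the error quoted in this issue;
# the real refusal text may vary between Claude Code versions.
AUP_PATTERN = re.compile(r"appears to violate our Usage Policy", re.IGNORECASE)

# Hypothetical error-type labels; deterministic failures are not worth retrying.
NON_RETRYABLE = {"aup_refusal"}


def classify_error(message: str) -> str:
    """Return a coarse error type for a failed conversation."""
    # Collapse whitespace so the match survives line wrapping in logs.
    normalized = " ".join(message.split())
    if AUP_PATTERN.search(normalized):
        return "aup_refusal"
    return "unknown"


def should_retry(error_type: str) -> bool:
    """Skip retries for error types known to be deterministic."""
    return error_type not in NON_RETRYABLE


def error_record(instance_id: str, message: str) -> str:
    """Build one JSON line carrying the specific error type (hypothetical schema)."""
    return json.dumps(
        {
            "instance_id": instance_id,
            "error_type": classify_error(message),
            "error": " ".join(message.split()),
        }
    )
```

With a check like this in the retry loop, the runner could stop after the first refusal and write the classified record, rather than retrying and logging the generic message.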
Reproduction
gs://openhands-evaluation-results/gaia/litellm_proxy-claude-sonnet-4-5-20250929/22836395400/results.tar.gz
Log: logs/instance_2a649bb1-*.output.log