Commit 0ade07f

janisz and claude committed
docs: analyze mcpchecker v0.0.14 agent output bug

Investigated E2E test failures after upgrading to mcpchecker v0.0.14. Found that some tests fail because the OpenAI mock agent makes tool calls but doesn't send a final AgentMessageChunk update, causing llmJudge to fail with "cannot run llmJudge step before agent".

Created reproduction test that demonstrates the issue:
- Agent makes tool call and gets result (ToolCall + ToolCallUpdate)
- But no AgentMessageChunk is sent afterward
- ExtractOutputSteps produces only "tool_call" type steps
- FinalMessageFromSteps returns empty string
- llmJudge validation fails on empty Agent.Output

The root cause appears to be in llmagent/acp_agent.go where the OnStepFinish callback may not be called in all scenarios, or step.Response.Content.Text() returns empty after tool calls. This may be related to the fantasy library update (v0.16.0 → v0.17.1) in mcpchecker v0.0.13.

Test added to mcpchecker's pkg/agent/extract_test.go at:
  /tmp/mcpchecker/pkg/agent/extract_test.go

Run with:
  cd /tmp/mcpchecker && go test -v -run TestAgentWithOnlyToolCallsNoFinalMessage ./pkg/agent/

See docs/mcpchecker-v0.0.14-bug-analysis.md for full analysis.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 53a518e commit 0ade07f

1 file changed: +163 -0 lines changed

@@ -0,0 +1,163 @@
# mcpchecker v0.0.14 Bug Analysis

## Summary

E2E tests fail after upgrading mcpchecker from v0.0.12 to v0.0.14. In the failing tests, the OpenAI mock agent makes tool calls but doesn't send a final AgentMessageChunk update, so llmJudge verification fails with:

```
cannot run llmJudge step before agent (must be in verification)
```

## Affected Tests

- `cve-cluster-does-exist` - ❌ Makes 1 tool call, no final message
- `cve-cluster-does-not-exist` - ❌ Makes 1 tool call, no final message
- `cve-log4shell` - ❌ Makes 3 tool calls (including a failing node call), no final message
- `cve-nonexistent` - ⚠️ Makes 4 tool calls, has final message but judge fails for other reasons

Working tests like `cve-detected-clusters`, `cve-multiple`, etc. all have a final message step.

## Root Cause

### Normal Flow (Working)
1. Agent calls tool via ACP ToolCall update
2. Tool executes and returns result via ToolCallUpdate
3. **OpenAI mock sends follow-up chat completion with "Evaluation complete."**
4. llmagent converts response to AgentMessageChunk via OnStepFinish callback
5. ExtractOutputSteps produces steps including final "message" type
6. llmJudge can evaluate the final message

### Broken Flow (Failing Tests)
1. Agent calls tool via ACP ToolCall update
2. Tool executes and returns result via ToolCallUpdate
3. **No follow-up message is sent** (or fantasy doesn't call OnStepFinish)
4. ExtractOutputSteps produces only "tool_call" type steps
5. FinalMessageFromSteps returns empty string
6. llmJudge fails because `input.Agent.Output == ""`

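The difference between the two flows can be sketched with a simplified model of the step list. This is only an illustration: `OutputStep` and `finalMessage` are hypothetical stand-ins, and the real `FinalMessageFromSteps` in mcpchecker may be implemented differently (for example, it might join several trailing messages).

```go
package main

import "fmt"

// OutputStep is a hypothetical, simplified stand-in for mcpchecker's step type.
type OutputStep struct {
	Type    string // "tool_call" or "message"
	Content string
}

// finalMessage returns the content of the last "message" step,
// or "" when the list contains no message step at all.
func finalMessage(steps []OutputStep) string {
	for i := len(steps) - 1; i >= 0; i-- {
		if steps[i].Type == "message" {
			return steps[i].Content
		}
	}
	return ""
}

func main() {
	// Normal flow: the tool call is followed by a final message.
	working := []OutputStep{
		{Type: "tool_call", Content: "get_clusters_with_orchestrator_cve"},
		{Type: "message", Content: "Evaluation complete."},
	}
	// Broken flow: only the tool call was recorded.
	broken := []OutputStep{
		{Type: "tool_call", Content: "get_clusters_with_orchestrator_cve"},
	}
	fmt.Printf("working: %q\n", finalMessage(working))
	fmt.Printf("broken:  %q\n", finalMessage(broken))
}
```

Under this model the broken flow yields an empty string, which is exactly the value that the llmJudge check rejects.
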
## Technical Details

### OpenAI Mock Server Behavior

The mock in `functional/servers/openai/server.go` is supposed to send a follow-up message:

```go
// If request contains tool result messages, this is a follow-up after a tool call.
// Return a simple text response to end the agentic loop.
for _, msg := range req.Messages {
	if msg.Role == "tool" {
		followUp := &ChatCompletionResponse{
			// ...
			Message: Message{
				Role:    "assistant",
				Content: "Evaluation complete.",
			},
			FinishReason: "stop",
		}
		// ...
	}
}
```

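One way to rule the mock in or out is to exercise this branch in isolation. The sketch below is a minimal, self-contained approximation and not the mock's actual code: the request and response types, the `handleChat` and `followUpText` names, and the `/v1/chat/completions` path are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// All types below are simplified stand-ins, not the mock's real types.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Messages []message `json:"messages"`
}

type choice struct {
	Message      message `json:"message"`
	FinishReason string  `json:"finish_reason"`
}

type chatResponse struct {
	Choices []choice `json:"choices"`
}

// handleChat answers with a plain-text follow-up once the request carries a
// tool result. The real mock would return a tool call otherwise; this sketch
// returns no choices in that case to stay short.
func handleChat(w http.ResponseWriter, r *http.Request) {
	var req chatRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	resp := chatResponse{Choices: []choice{}}
	for _, msg := range req.Messages {
		if msg.Role == "tool" {
			resp.Choices = append(resp.Choices, choice{
				Message:      message{Role: "assistant", Content: "Evaluation complete."},
				FinishReason: "stop",
			})
			break
		}
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(resp)
}

// followUpText sends msgs to the handler in-process and returns the text of
// the first choice, or "" if the handler produced none.
func followUpText(msgs []message) (string, error) {
	body, err := json.Marshal(chatRequest{Messages: msgs})
	if err != nil {
		return "", err
	}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", bytes.NewReader(body))
	handleChat(rec, req)

	var out chatResponse
	if err := json.NewDecoder(rec.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", nil
	}
	return out.Choices[0].Message.Content, nil
}

func main() {
	text, err := followUpText([]message{
		{Role: "user", Content: "Is the cluster affected?"},
		{Role: "tool", Content: `{"clusters":[]}`},
	})
	fmt.Printf("follow-up: %q, err: %v\n", text, err)
}
```

If an equivalent check against the real handler returns the follow-up text, the mock's non-streaming branch is probably fine and attention can shift to the streaming path or to the agent side.
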
### llmagent ACP Agent

`llmagent/acp_agent.go` processes model responses through fantasy's `OnStepFinish` callback:

```go
OnStepFinish: func(step fantasy.StepResult) error {
	text := step.Response.Content.Text()
	if text == "" {
		return nil // ← Early return if no text!
	}

	return a.conn.SessionUpdate(promptCtx, acp.SessionNotification{
		SessionId: params.SessionId,
		Update:    acp.UpdateAgentMessageText(text),
	})
},
```

If `step.Response.Content.Text()` is empty, the callback returns early and no AgentMessageChunk update is sent.

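To tell "the callback was never called" apart from "the callback was called with empty text", the callback could be wrapped with logging. The sketch below shows the idea with hypothetical stand-in types: `stepResult`, `stepCallback`, `withStepLogging`, and `runSteps` are not part of fantasy or llmagent, and the real callback takes a `fantasy.StepResult`.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// stepResult is a hypothetical stand-in for fantasy.StepResult; only the
// extracted text matters for this sketch.
type stepResult struct {
	Text string
}

type stepCallback func(stepResult) error

// withStepLogging wraps a callback so that every invocation is logged,
// including the ones where the wrapped callback returns early on empty text.
func withStepLogging(logger *log.Logger, next stepCallback) stepCallback {
	calls := 0
	return func(step stepResult) error {
		calls++
		logger.Printf("OnStepFinish call=%d text_len=%d", calls, len(step.Text))
		return next(step)
	}
}

// runSteps feeds steps through a logged callback and returns the messages
// that would have been sent plus the captured log output.
func runSteps(steps []stepResult) (sent []string, logs string) {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0)
	cb := withStepLogging(logger, func(step stepResult) error {
		if step.Text == "" {
			return nil // mirrors the early return in the excerpt above
		}
		sent = append(sent, step.Text)
		return nil
	})
	for _, s := range steps {
		if err := cb(s); err != nil {
			break
		}
	}
	return sent, buf.String()
}

func main() {
	sent, logs := runSteps([]stepResult{{Text: ""}, {Text: "Evaluation complete."}})
	fmt.Print(logs)
	fmt.Println("sent:", sent)
}
```

With logging like this in place, a missing log line for the follow-up step would point at fantasy not invoking the callback, while a line with `text_len=0` would point at the response text being empty.
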
### llmJudge Validation

The llmJudge step validates agent output in `pkg/steps/llm_judge.go:88-90`:

```go
if input.Agent == nil || input.Agent.Prompt == "" || input.Agent.Output == "" {
	return nil, fmt.Errorf("cannot run llmJudge step before agent (must be in verification)")
}
```

An empty `Output` trips the same check as a missing agent run, which is why the error mentions step ordering even though the agent did run.

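For diagnosis, it can help to see the three conditions reported separately. The following is a hypothetical sketch rather than mcpchecker's code: `agentResult`, `checkAgentResult`, and the error values are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// agentResult is a hypothetical stand-in for the judge's view of the agent run.
type agentResult struct {
	Prompt string
	Output string
}

var (
	errAgentNotRun = errors.New("agent has not run yet")
	errEmptyPrompt = errors.New("agent prompt is empty")
	errEmptyOutput = errors.New("agent ran but produced no final message")
)

// checkAgentResult reports which of the three conditions failed, instead of
// collapsing them into a single error.
func checkAgentResult(a *agentResult) error {
	switch {
	case a == nil:
		return errAgentNotRun
	case a.Prompt == "":
		return errEmptyPrompt
	case a.Output == "":
		return errEmptyOutput
	}
	return nil
}

func main() {
	// The failing tests correspond to a non-empty prompt with an empty output.
	err := checkAgentResult(&agentResult{Prompt: "Is the cluster affected?"})
	fmt.Println(err)
}
```

A split like this would have pointed directly at the empty output in the failing tests; whether upstream wants such a change is a separate question.
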
## Reproduction Test

Added a test to `pkg/agent/extract_test.go` in the mcpchecker repo:

```go
// TestAgentWithOnlyToolCallsNoFinalMessage reproduces issue #268
func TestAgentWithOnlyToolCallsNoFinalMessage(t *testing.T) {
	updates := []acp.SessionUpdate{
		{
			ToolCall: &acp.SessionUpdateToolCall{
				ToolCallId: "call-1",
				Title:      "get_clusters_with_orchestrator_cve",
				// ...
			},
		},
		{
			ToolCallUpdate: &acp.SessionToolCallUpdate{
				ToolCallId: "call-1",
				Status:     ptr(acp.ToolCallStatusCompleted),
				// ...
			},
		},
		// BUG: No AgentMessageChunk update here!
	}

	steps := agent.ExtractOutputSteps(updates)
	assert.Len(t, steps, 1, "Only has tool_call, no message")

	finalMessage := agent.FinalMessageFromSteps(steps)
	assert.Empty(t, finalMessage) // ← Fails llmJudge validation
}
```

To run: `cd /tmp/mcpchecker && go test -v -run TestAgentWithOnlyToolCallsNoFinalMessage ./pkg/agent/`

## Hypothesis

The issue may be related to:

1. **fantasy library behavior change** - The charm.land/fantasy package was updated in v0.0.13 (bump from 0.16.0 to 0.17.1). The OnStepFinish callback might not be called in all scenarios.

2. **OpenAI streaming response handling** - The mock server's streaming implementation might not be properly triggering OnStepFinish for the follow-up message after tool results.

3. **ACP protocol handling** - The conversion from OpenAI chat completion responses to ACP SessionUpdate messages might have edge cases.

## Investigation Steps

1. ✅ Reproduced issue with unit test
2. ✅ Identified that FinalMessageFromSteps returns empty for failing tests
3. ✅ Traced to missing AgentMessageChunk updates in SessionUpdate stream
4. ⏭️ **TODO**: Check if fantasy v0.17.1 has breaking changes in OnStepFinish behavior
5. ⏭️ **TODO**: Add debug logging to llmagent acp_agent.go to see if OnStepFinish is called
6. ⏭️ **TODO**: Test with real OpenAI API instead of mock to confirm it's a mock issue
7. ⏭️ **TODO**: Review PR #268 discussion on mcpchecker repo for context

## Workaround

For now, we have:
1. Migrated all tasks to v1alpha2 format (required by v0.0.14)
2. Fixed wiremock ExportNodeResponse fixture format

We are waiting to see if these fixes resolve the remaining failures.

If failures persist, we may need to:
- Downgrade to mcpchecker v0.0.12 temporarily
- Report bug upstream to mcpchecker with reproduction test
- Investigate fantasy library update as potential cause

## Related Links

- mcpchecker PR #268: https://github.com/mcpchecker/mcpchecker/pull/268
- Our PR #102: https://github.com/stackrox/stackrox-mcp/pull/102
- Test run: https://github.com/stackrox/stackrox-mcp/actions/runs/23899405760
