Summary
When docker agent serve chat is used with X-Conversation-Id, continuation requests mutate the cached *session.Session before execution. If the run later fails (timeout, 500, stream error, etc.), the mutated session remains the canonical conversation state, so a retry can see duplicated or half-failed history.
Why this looks wrong
On current main (0aa4436d), the continuation path is:
pkg/chatserver/server.go: handleChatCompletions calls resolveSession(...)
pkg/chatserver/conversations.go: Get(...) returns the live cached *session.Session
pkg/chatserver/server.go: resolveSession(...) reuses that pointer and calls appendLatestUser(existing, msgs)
pkg/chatserver/agent.go: appendLatestUser(...) mutates the session in place before execution
pkg/chatserver/server.go: after the run returns, maybeStoreConversation(...) is still called
That means the conversation has already advanced before the request outcome is known, and there is no rollback path if the run fails.
History context
This appears to date back to the feature introduction:
d0c9985f introduced X-Conversation-Id and the in-memory conversation cache
2ee80aff later changed the logic to always store the conversation after request completion so evicted conversations are restored
047167a4 then simplified away the old isNew flag
The eviction fix makes sense on its own, but it also means failed turns are reinserted after eviction as well. More importantly, even without Put(...), the cached session pointer has already been mutated in place.
Repro shape
- Start
serve chat with --conversations-max enabled.
- Send a request with
X-Conversation-Id: conv-1 and a valid user message.
- Send a second request with the same conversation id and a new user message, but force the run to fail after the message has been appended to the session (for example via provider timeout or an injected runtime error).
- Retry that second request with the same conversation id.
- The retry now runs against a session that already contains the failed turn, so history is no longer equivalent to "retry the same request against the last successful conversation state".
Depending on where the failure happens, this can show up as duplicated user turns or other half-failed conversation state surviving into the next attempt.
Expected behavior
A failed continuation request should not advance the cached conversation state.
Possible fixes
A couple of directions that seem reasonable:
- Clone/snapshot the cached session before
appendLatestUser(...), and only commit it back to the cache on success.
- Or mutate a working copy during the request and leave the cached session untouched until the run succeeds.
- If streaming failures are intended to preserve some state, that likely needs an explicit policy rather than the current in-place mutation behavior.
Tests that seem worth adding
I could not find coverage for either of these cases:
- failed continuation requests do not advance cached history
- continuation requests with no new user message are rejected instead of reusing the prior session state
There is coverage for eviction restore, appendLatestUser, and same-conversation locking, but not for transactional behavior on failure.
Summary
When
docker agent serve chatis used withX-Conversation-Id, continuation requests mutate the cached*session.Sessionbefore execution. If the run later fails (timeout, 500, stream error, etc.), the mutated session remains the canonical conversation state, so a retry can see duplicated or half-failed history.Why this looks wrong
On current
main(0aa4436d), the continuation path is:pkg/chatserver/server.go:handleChatCompletionscallsresolveSession(...)pkg/chatserver/conversations.go:Get(...)returns the live cached*session.Sessionpkg/chatserver/server.go:resolveSession(...)reuses that pointer and callsappendLatestUser(existing, msgs)pkg/chatserver/agent.go:appendLatestUser(...)mutates the session in place before executionpkg/chatserver/server.go: after the run returns,maybeStoreConversation(...)is still calledThat means the conversation has already advanced before the request outcome is known, and there is no rollback path if the run fails.
History context
This appears to date back to the feature introduction:
d0c9985fintroducedX-Conversation-Idand the in-memory conversation cache2ee80afflater changed the logic to always store the conversation after request completion so evicted conversations are restored047167a4then simplified away the oldisNewflagThe eviction fix makes sense on its own, but it also means failed turns are reinserted after eviction as well. More importantly, even without
Put(...), the cached session pointer has already been mutated in place.Repro shape
serve chatwith--conversations-maxenabled.X-Conversation-Id: conv-1and a valid user message.Depending on where the failure happens, this can show up as duplicated user turns or other half-failed conversation state surviving into the next attempt.
Expected behavior
A failed continuation request should not advance the cached conversation state.
Possible fixes
A couple of directions that seem reasonable:
appendLatestUser(...), and only commit it back to the cache on success.Tests that seem worth adding
I could not find coverage for either of these cases:
There is coverage for eviction restore,
appendLatestUser, and same-conversation locking, but not for transactional behavior on failure.