
feat(agents): enable conservative retries by default for transient errors #449

Open
MarioCadenas wants to merge 1 commit into main from feat/agent-default-retries


@MarioCadenas

Collaborator

What

Agents previously disabled retries entirely — agentStreamDefaults set retry: { enabled: false }. A single transient serving error (5xx, 429, connection reset) would fail the whole turn.

This enables a conservative default that reuses the existing RetryInterceptor (exponential backoff with full jitter, only retries transient/retryable errors):

retry: { enabled: true, attempts: 2, initialDelay: 500, maxDelay: 4_000 }

  • Old default: retries off, so any transient blip failed the turn.
  • New default: at most one extra attempt, 500ms→4s jittered backoff. Non-retryable errors (4xx other than 429, AppKitError with isRetryable=false) are still never retried; see isRetryableError in interceptors/retry.ts.
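To make the numbers concrete, here is a minimal sketch of exponential backoff with full jitter under the new default. It is not the RetryInterceptor source: the `RetryConfig` interface, `backoffCeiling`, and `fullJitterDelay` are hypothetical names, and the doubling factor and 0-based retry index are assumptions. With `attempts: 2` there is only one retry, so under these assumptions only the first ceiling (500 ms) is ever used and `maxDelay` acts purely as a cap.

```typescript
// Hypothetical config shape mirroring the default in this PR.
interface RetryConfig {
  enabled: boolean;
  attempts: number; // total attempts, including the first one
  initialDelay: number; // milliseconds
  maxDelay: number; // milliseconds
}

const agentRetryDefault: RetryConfig = {
  enabled: true,
  attempts: 2,
  initialDelay: 500,
  maxDelay: 4_000,
};

// Upper bound of the wait before retry number `retryIndex` (0-based).
// Assumption: the base delay doubles per retry and is capped at maxDelay.
function backoffCeiling(cfg: RetryConfig, retryIndex: number): number {
  return Math.min(cfg.maxDelay, cfg.initialDelay * 2 ** retryIndex);
}

// Full jitter: pick a delay uniformly from [0, ceiling).
// `random` is injectable so the function can be tested deterministically.
function fullJitterDelay(
  cfg: RetryConfig,
  retryIndex: number,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffCeiling(cfg, retryIndex));
}
```

Full jitter spreads retries from many concurrent turns across the whole interval instead of having them all fire at the same instant, which is why the actual wait can be much shorter than the nominal 500 ms.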

No new retry logic was added; only the default config value changed.

Streaming / non-idempotency safety (the scope applied)

The agents plugin consumes retry in exactly one place: the streaming /chat path via executeStream (agents.ts:1162). This change is streaming-replay-safe by construction, not by configuration:

  • In executeStream, the interceptor chain wraps wrappedFn = async () => fn(signal), where fn is the async generator function for the turn.
  • Calling a generator function returns the generator object synchronously without running its body. So wrappedFn resolves immediately and the RetryInterceptor sees success on attempt 1.
  • Token emission and tool dispatch happen later, during yield* iteration — which runs outside the interceptor chain.
  • Therefore a transient error thrown after the first streamed event surfaces during iteration and is never seen by the retry loop. There is no path by which retry can re-emit tokens or re-run a tool side-effect (no double-charge / duplicate writes).
  • The only thing retried is a transient failure during generator setup, before any output — which is safe and idempotent.
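The reasoning above can be reproduced in isolation. The sketch below is a simplified stand-in, not the real `executeStream` or `RetryInterceptor`: `withRetry` and `demoMidStreamFailure` are hypothetical names, and the retry loop omits backoff and error classification. It shows that the wrapped call succeeds on the first attempt because the generator body has not started yet, and that an error thrown after the first yield surfaces to the consumer without any replay.

```typescript
// Simplified stand-in for a retry interceptor: call fn up to `attempts` times.
async function withRetry<T>(fn: () => Promise<T>, attempts: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

async function demoMidStreamFailure() {
  const counts = { wrapperCalls: 0, bodyRuns: 0, toolCalls: 0 };

  // Stand-in for the turn: one token, one tool side-effect, then a 5xx-like error.
  async function* turn(): AsyncGenerator<string> {
    counts.bodyRuns++;
    yield "token-1";
    counts.toolCalls++; // simulated tool side-effect
    throw new Error("503 from serving endpoint");
  }

  // Mirrors `wrappedFn = async () => fn(signal)`: only generator creation is wrapped.
  const stream = await withRetry(async () => {
    counts.wrapperCalls++;
    return turn(); // returns the generator object; the body has not run yet
  }, 2);

  const received: string[] = [];
  let surfaced = "";
  try {
    // Iteration happens here, outside the retry wrapper.
    for await (const token of stream) received.push(token);
  } catch (err) {
    surfaced = err instanceof Error ? err.message : String(err);
  }
  return { ...counts, received, surfaced };
}
```

Calling `demoMidStreamFailure()` resolves with every counter at 1, a single received token, and the error message captured by the consumer's own `catch`, which is the same invariant the new test asserts.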

The non-streaming /invocations and /responses path (_runAgentNonStreaming) does not route through execute()/executeStream() at all, so this default does not affect it.

Tests

Added packages/appkit/src/plugins/agents/tests/defaults.test.ts:

  • asserts the conservative default values (enabled, attempts === 2, bounded backoff caps).
  • proves no mid-stream replay: drives the real RetryInterceptor exactly as executeStream does, throws a 5xx after the first yielded token, and asserts the generator body ran once and the tool side-effect fired once (no replay).
  • proves a transient error during generator setup (before output) is retried.
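The third case, a failure before any output, can be sketched as follows. This is not the content of defaults.test.ts: `TransientError`, `retrySetup`, `makeFlakySetup`, and `collectAfterFlakySetup` are hypothetical names, the `isRetryable` flag is only modeled on the AppKitError property mentioned above, and backoff delays are left out to keep the example short.

```typescript
// Hypothetical error type carrying a retryable flag.
class TransientError extends Error {
  readonly isRetryable = true;
}

// Minimal retry loop: retries only errors flagged as retryable, up to `attempts` calls.
async function retrySetup<T>(
  setup: () => Promise<T>,
  attempts: number,
): Promise<{ value: T; calls: number }> {
  let calls = 0;
  for (;;) {
    calls++;
    try {
      return { value: await setup(), calls };
    } catch (err) {
      const retryable = (err as { isRetryable?: boolean } | null)?.isRetryable === true;
      if (!retryable || calls >= attempts) throw err;
    }
  }
}

// Setup that fails once before producing a stream, e.g. a connection reset.
function makeFlakySetup(): () => Promise<AsyncGenerator<string>> {
  let first = true;
  return async () => {
    if (first) {
      first = false;
      throw new TransientError("connection reset");
    }
    async function* body(): AsyncGenerator<string> {
      yield "token-1";
    }
    return body();
  };
}

async function collectAfterFlakySetup() {
  const { value: stream, calls } = await retrySetup(makeFlakySetup(), 2);
  const tokens: string[] = [];
  for await (const token of stream) tokens.push(token);
  return { calls, tokens };
}
```

Because the first failure happens before any token is produced, the second attempt starts from a clean slate and the consumer sees each token exactly once.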

Verification

  • pnpm --filter=@databricks/appkit typecheck — clean.
  • agents suite: 120 passed; retry.test.ts: 19 passed; new defaults.test.ts: 3 passed.
  • biome check on changed files — no fixes/errors.

This pull request and its description were written by Isaac.

…ent errors

Agent stream defaults previously set retry { enabled: false }. Enable a
conservative default (attempts: 2, 500ms..4s backoff) so a transient serving
error (5xx / 429 / connection reset) doesn't fail the whole turn. Non-retryable
errors (4xx, AppKitError isRetryable=false) are still never retried.

Streaming safety: in executeStream the RetryInterceptor wraps only the
synchronous creation of the adapter async generator; token emission and tool
dispatch run during yield* iteration, outside the interceptor chain. A transient
error thrown after the first streamed event therefore cannot be retried, so
retries can never re-emit tokens or re-run a tool side-effect. Only a failure
during generator setup (before any output) is retried. Added defaults.test.ts
asserting the conservative values and proving no mid-stream replay.

Co-authored-by: Isaac
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
@MarioCadenas MarioCadenas requested a review from a team as a code owner June 13, 2026 19:26
@MarioCadenas MarioCadenas requested a review from pkosiec June 13, 2026 19:26
