Problem
Some LLM providers send large text deltas per SSE event instead of token-by-token streaming. Anthropic's thinking-enabled models (e.g. Claude Opus 4.6) batch ~250 characters per `content_block_delta` event — this is documented server-side behavior, not a client bug. The result is a chunky, jarring streaming UX where large blocks of text appear at once instead of the smooth word-by-word flow users expect.
This affects any consumer of TanStack AI that uses providers with batched deltas.
Precedent
The Vercel AI SDK solved this with `smoothStream`, a composable `TransformStream` that buffers incoming text deltas and re-emits them word-by-word with a configurable delay (exposed via the `experimental_transform` option on `streamText`). It's a ~40-line transform that:
- Buffers `text-delta` chunks and extracts words via regex (`/\S+\s+/m`)
- Emits each word as a separate `text-delta` chunk with a 10ms delay
- Passes non-text chunks through immediately (flushing any buffered text first)
- Supports multiple chunking modes: `"word"` (default), `"line"`, `RegExp`, `Intl.Segmenter`
Proposal
Add a `smoothStream` utility and a `transform` option to TanStack AI's stream processing pipeline, operating on `StreamChunk` before chunks reach consumers. This would allow any app using `useChat` or the lower-level stream APIs to opt into smooth text delivery without implementing its own transform.
Suggested API
```ts
import { smoothStream } from '@tanstack/ai'

const stream = chat({
  adapter,
  messages,
  transform: smoothStream({ chunking: 'word', delayInMs: 10 }),
})
```
The `transform` option accepts a function that wraps the internal `StreamChunk` iterable, keeping it generic enough for other transforms in the future (e.g. token counting, logging, rate limiting).
`smoothStream()` with no arguments would use sensible defaults (`chunking: 'word'`, `delayInMs: 10`).
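Since the proposed `transform` wraps an iterable rather than a web stream, the utility could be an async-generator factory. A minimal sketch, assuming a `StreamChunk` discriminated union with a `TEXT_DELTA` text type (the actual TanStack AI types may differ):

```ts
// Sketch only: StreamChunk shape and type tags are assumptions.
type StreamChunk = { type: 'TEXT_DELTA'; text: string } | { type: string }

type StreamTransform = (
  chunks: AsyncIterable<StreamChunk>,
) => AsyncIterable<StreamChunk>

const CHUNKERS: Record<'word' | 'line', RegExp> = {
  word: /\S+\s+/m,
  line: /[^\n]*\n/m,
}

interface SmoothStreamOptions {
  chunking?: 'word' | 'line' | RegExp
  delayInMs?: number
}

function smoothStream(options: SmoothStreamOptions = {}): StreamTransform {
  const { chunking = 'word', delayInMs = 10 } = options
  const pattern = typeof chunking === 'string' ? CHUNKERS[chunking] : chunking
  const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

  return async function* (chunks) {
    let buffer = ''
    for await (const chunk of chunks) {
      if (chunk.type !== 'TEXT_DELTA') {
        // Flush buffered text before a non-text chunk to preserve ordering.
        if (buffer) {
          yield { type: 'TEXT_DELTA', text: buffer }
          buffer = ''
        }
        yield chunk
        continue
      }
      buffer += (chunk as { type: 'TEXT_DELTA'; text: string }).text
      let match: RegExpExecArray | null
      while ((match = pattern.exec(buffer))) {
        const end = match.index + match[0].length
        yield { type: 'TEXT_DELTA', text: buffer.slice(0, end) }
        buffer = buffer.slice(end)
        if (delayInMs > 0) await sleep(delayInMs)
      }
    }
    // Trailing partial word at stream end.
    if (buffer) yield { type: 'TEXT_DELTA', text: buffer }
  }
}
```

An async generator keeps the transform composable: multiple transforms are just function composition over the same `AsyncIterable<StreamChunk>` signature.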
Scope
- Default chunking mode: `"word"` via `/\S+\s+/m` regex
- Default delay: 10ms between word emissions
- Non-text chunks (`TOOL_CALL_START`, `RUN_FINISHED`, etc.) flush the buffer and pass through immediately
- Provider-agnostic: benefits any provider that batches text deltas
- Optional `Intl.Segmenter` support for CJK languages (where whitespace-based splitting doesn't work)
Reference
I implemented a local version in OpenWaggle operating on our domain-typed `AgentStreamChunk`, modeled directly after Vercel's implementation. Happy to contribute a PR adapting it to `StreamChunk` if this direction is accepted.
Environment
- `@tanstack/ai` version: latest
- Affected providers: Anthropic (Opus 4.6 with thinking), potentially others with batched SSE
- Not affected: OpenAI and smaller Claude models (Sonnet, Haiku), which send token-sized deltas