feat: add smoothStream transform for word-by-word text delivery #439

@DiegoGBrisa

Description

Problem

Some LLM providers send large text deltas per SSE event instead of token-by-token streaming. Anthropic's thinking-enabled models (e.g. Claude Opus 4.6) batch ~250 characters per content_block_delta event — this is documented server-side behavior, not a client bug. The result is a chunky, jarring streaming UX where large blocks of text appear at once instead of the smooth word-by-word flow users expect.

This affects any consumer of TanStack AI that uses providers with batched deltas.

Precedent

The Vercel AI SDK solved this with smoothStream — a composable TransformStream that buffers incoming text deltas and re-emits them word-by-word with a configurable delay (exposed via experimental_transform option on streamText). It's a ~40-line transform that:

  • Buffers text-delta chunks and extracts words via regex (/\S+\s+/m)
  • Emits each word as a separate text-delta chunk with a 10ms delay
  • Passes non-text chunks through immediately (flushing any buffered text first)
  • Supports multiple chunking modes: "word" (default), "line", RegExp, Intl.Segmenter
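The behavior described above can be sketched as an async-generator transform. This is a minimal illustration modeled on the description, not Vercel's actual code; the `Chunk` type is an assumption standing in for the SDK's real chunk shape.

```typescript
// Hypothetical chunk shape for illustration; the real SDK type differs.
type Chunk = { type: 'text-delta'; text: string } | { type: string };

const CHUNKING_REGEX = /\S+\s+/m; // one word plus its trailing whitespace

function smoothStream({ delayInMs = 10 } = {}) {
  return async function* (source: AsyncIterable<Chunk>): AsyncGenerator<Chunk> {
    let buffer = '';
    for await (const chunk of source) {
      if (chunk.type !== 'text-delta') {
        // Non-text chunks flush any buffered text, then pass through.
        if (buffer) {
          yield { type: 'text-delta', text: buffer };
          buffer = '';
        }
        yield chunk;
        continue;
      }
      buffer += (chunk as { type: 'text-delta'; text: string }).text;
      let match: RegExpExecArray | null;
      while ((match = CHUNKING_REGEX.exec(buffer)) !== null) {
        const word = buffer.slice(0, match.index + match[0].length);
        yield { type: 'text-delta', text: word };
        buffer = buffer.slice(word.length);
        if (delayInMs > 0) await new Promise((r) => setTimeout(r, delayInMs));
      }
    }
    // Flush any trailing partial word once the source ends.
    if (buffer) yield { type: 'text-delta', text: buffer };
  };
}
```

Note that a word is only emitted once its trailing whitespace arrives, so a partial word at the end of one delta is held until the next delta (or end of stream) completes it.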

Proposal

Add a smoothStream utility and a transform option to TanStack AI's stream processing pipeline, operating on StreamChunk before chunks reach consumers. This would allow any app using useChat or the lower-level stream APIs to opt into smooth text delivery without implementing their own transform.

Suggested API

import { smoothStream } from '@tanstack/ai'

const stream = chat({
  adapter,
  messages,
  transform: smoothStream({ chunking: 'word', delayInMs: 10 }),
})

The transform option accepts a function that wraps the internal StreamChunk iterable, keeping it generic enough for other transforms in the future (e.g. token counting, logging, rate limiting).

smoothStream() with no arguments would use sensible defaults (chunking: 'word', delayInMs: 10).
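Because `transform` is just a function over the chunk iterable, other transforms fit the same slot without any API changes. As a sketch, a hypothetical chunk-counting transform (all names here are assumptions, not proposed API) would look like:

```typescript
// Simplified stand-in for TanStack AI's StreamChunk type.
type StreamChunk = { type: string; text?: string };

// Counts text deltas as they flow past, passing every chunk through unchanged.
function countTextChunks(onDone: (count: number) => void) {
  return async function* (
    source: AsyncIterable<StreamChunk>,
  ): AsyncGenerator<StreamChunk> {
    let count = 0;
    for await (const chunk of source) {
      if (chunk.type === 'text-delta') count++;
      yield chunk; // never mutates or reorders the stream
    }
    onDone(count);
  };
}
```

Transforms of this shape also compose by simple nesting, e.g. `countTextChunks(log)(smoothStream()(chunks))`.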

Scope

  • Default chunking mode: "word" via /\S+\s+/m regex
  • Default delay: 10ms between word emissions
  • Non-text chunks (TOOL_CALL_START, RUN_FINISHED, etc.) flush the buffer and pass through immediately
  • Provider-agnostic — benefits any provider that batches text deltas
  • Optional Intl.Segmenter support for CJK languages (where whitespace-based splitting doesn't work)

Reference

I implemented a local version in OpenWaggle operating on our domain-typed AgentStreamChunk, modeled directly after Vercel's implementation. Happy to contribute a PR adapting it to StreamChunk if this direction is accepted.

Environment

  • @tanstack/ai version: latest
  • Affected providers: Anthropic (Opus 4.6 with thinking), potentially others with batched SSE
  • Not affected: OpenAI, smaller Claude models (Sonnet, Haiku) which send token-sized deltas
