tui: per-tick re-render of active streaming message allocates 100s of GB during long sessions #2886

@aheritier

Summary

A 30+ minute streaming session captured with --memprofile shows the TUI allocates 890 GB cumulatively (yes, GB) over its lifetime, of which 76% (677 GB) flows through messages.(*model).ensureAllItemsRendered. Live heap (inuse_space) stayed at a healthy 44 MB, so this isn't a leak — it's an allocation rate / GC pressure problem distinct from #2861 (per-message retention) and #2884 (notification stacking).

Repro

  1. Build with the --memprofile flag plumbed in pkg/profiling.
  2. Run a normal interactive session that includes long-form streaming responses (10s of KB of markdown + reasoning blocks + animated tool calls).
  3. After ~30 min of typical usage, exit cleanly so pprof.WriteHeapProfile runs.
  4. go tool pprof -top -alloc_space heap.pprof.

Profile (anonymized)

File: docker-agent
Type: alloc_space
Showing nodes accounting for 858946.90MB, 96.43% of 890731.07MB total
      flat  flat%   sum%        cum   cum%
425369.83MB 47.76% 47.76% 425369.83MB 47.76%  strings.(*Builder).WriteString (partial-inline)
 81999.61MB  9.21% 56.96%  81999.61MB  9.21%  bytes.growSlice
 55231.66MB  6.20% 63.16%  55231.66MB  6.20%  github.com/charmbracelet/ultraviolet.NewBuffer
 …
 12293.53MB  1.38% 87.37% 454751.76MB 51.05%  pkg/tui/components/message.(*messageModel).render
  4966.89MB  0.56% 93.71% 677505.05MB 76.06%  pkg/tui/components/messages.(*model).ensureAllItemsRendered

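The -cum view confirms that this call chain dominates the program:

                                         cum     cum%
ensureAllItemsRendered                   677 GB  76.06%
└─ renderItem                            673 GB  75.61%
   └─ messageModel.render                454 GB  51.05%
      └─ lipgloss.Style.Render           566 GB  63.55%
         └─ strings.Builder.WriteString  425 GB  47.76%

pprof aggregates cum per function across all of its callers, so lipgloss.Style.Render (566 GB) can exceed its parent in this chain: it is also reached through callers other than messageModel.render.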

For reference the same profile's inuse_space was only 44 MB — nothing is being retained. All those bytes are GC churn.
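
The gap between alloc_space and inuse_space can be reproduced in isolation. The following is a small sketch, unrelated to the docker-agent code base (the function churn and its parameters are made up for illustration), that builds and discards strings in a loop and compares the cumulative allocation counter with the live heap after a collection.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// sink keeps the compiler from discarding the built strings as dead code.
var sink int

// churn builds and immediately drops n strings of roughly size bytes each.
// It returns the cumulative bytes allocated during the loop (what alloc_space
// measures) and the live heap after a GC (roughly what inuse_space measures).
func churn(n, size int) (allocated, live uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		var b strings.Builder
		for b.Len() < size {
			b.WriteString("0123456789abcdef")
		}
		sink = len(b.String())
	}
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	allocated, live := churn(1000, 64<<10)
	fmt.Printf("allocated during loop: %d MB\n", allocated>>20)
	fmt.Printf("live heap after GC:    %d KB\n", live>>10)
}
```

The cumulative counter grows with every iteration while the live heap stays flat, which is the same shape as the profile above at a much smaller scale.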

Mechanism

pkg/tui/components/messages/messages.go, Update:

case animation.TickMsg:
    if m.hasAnimatedContent() {
        m.renderDirty = true
    }

for i, view := range m.views {
    updatedView, cmd := view.Update(msg)
    m.views[i] = updatedView
    if cmd != nil {
        cmds = append(cmds, cmd)
        m.renderDirty = true
    }
}

Any animation tick (spinner, fade, pulse) or any child view emitting a non-nil cmd dirties the entire message list. ensureAllItemsRendered then walks every view and rebuilds renderedLines. The per-item LRU (renderedItems) absorbs most of the cost for finalized messages, but three categories bypass the cache:

  • The currently-streaming assistant message (shouldCacheMessage returns false while Content is empty/whitespace, and even when populated, the streaming target is invalidated on every chunk).
  • Reasoning blocks (MessageTypeAssistantReasoningBlock is hard-coded to return false in shouldCacheMessage because of embedded spinners).
  • The selected/hovered message.

So during streaming, every animation tick (~10–60 Hz depending on platform) re-renders the active message and any reasoning block at full width through markdown → lipgloss → ANSI styling. With long messages and a multi-hour session, total churn easily reaches hundreds of GB allocated.

Why this matters even though inuse_space is small

  • CPU. The 119% CPU I observed during a streaming turn (Activity Monitor) is consistent with one core saturated in markdown/ANSI rendering on top of model decode.
  • Possible jetsam co-contributor. macOS jetsam sums VM-compressor activity into its "largest process" decision. A high allocation rate means many pages cycle through the compressor; combined with the per-message retention described in #2861 (tui: per-message render caches leak across long sessions, triggering macOS jetsam kills), this can push docker-agent over the threshold faster than retention alone would. Once #2866 (fix(#2861): release per-message render caches when streaming completes) lands, the retention growth stops, but the per-tick re-render storm remains.
  • GC pressure. runtime.mallocgc shows up as the 4th-largest inuse_space consumer (4.6 MB) — small, but a constant fraction of bookkeeping for a lot of churn.

Suggested fix axes

  1. Per-item invalidation. Most m.renderDirty = true sites know which view changed (the for i, view := range m.views loop has the index). Only that view needs re-rendering; the surrounding lines in renderedLines can be patched at lineOffsets[i] rather than rebuilt.

  2. Stream coalescing. During an active stream, the assistant message receives many chunks per second but the user can't perceive >~30 fps. Throttle re-renders of the streaming target to a fixed rate via a tea.Tick-driven flush, independent of chunk arrival.

  3. Reuse buffers across re-renders of the same message. strings.(*Builder).WriteString is 48% of alloc_space, much of it inside lipgloss.Style.Render. A sync.Pool of reusable buffers (bytes.Buffer rather than strings.Builder, because strings.Builder.Reset drops its backing array and so gains nothing from pooling), or a buffer pool wired through IncrementalRenderer, wouldn't change correctness but would dramatically cut allocations. Lipgloss is upstream so this would need a small wrapper.
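
A rough illustration of axis 3, assuming a wrapper that assembles output itself rather than going through lipgloss: the names bufPool and renderLines are hypothetical, and the sketch pools bytes.Buffer because bytes.Buffer.Reset keeps the backing array while strings.Builder.Reset releases it.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out scratch buffers whose backing arrays survive between uses.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// renderLines is a stand-in for a render step: it prefixes every line and
// joins them with newlines, using a pooled buffer as scratch space.
func renderLines(prefix string, lines []string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // empties the buffer but retains its capacity
	defer bufPool.Put(buf)

	for i, line := range lines {
		if i > 0 {
			buf.WriteByte('\n')
		}
		buf.WriteString(prefix)
		buf.WriteString(line)
	}
	// String copies the bytes out once; the growth reallocations that would
	// otherwise happen on every render are avoided after the first few calls.
	return buf.String()
}

func main() {
	fmt.Println(renderLines("> ", []string{"first line", "second line"}))
}
```

The final copy in buf.String() is still an allocation per render, so this reduces churn from repeated buffer growth rather than eliminating it.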

(1) alone should kill ≥80% of the churn and is the cheapest.

Repro environment

  • macOS 26.x, Apple Silicon, 64 GB
  • docker-agent HEAD as of 2026-05-22
  • Multi-agent config; long streaming responses (~10 KB markdown + reasoning blocks + animated tool calls)
  • Multi-hour interactive session ending in clean exit (so pprof.WriteHeapProfile runs — note that jetsam SIGKILL bypasses this, so capturing requires exiting before the kill).
