Skip to content

renderer: Fix the EnablePainting viewport desync behind #20269#20366

Open
yuu61 wants to merge 1 commit into
microsoft:mainfrom
yuu61:fix-renderer-enablepainting-viewport-resync
Open

renderer: Fix the EnablePainting viewport desync behind #20269#20366
yuu61 wants to merge 1 commit into
microsoft:mainfrom
yuu61:fix-renderer-enablepainting-viewport-resync

Conversation

@yuu61

@yuu61 yuu61 commented Jun 26, 2026

Copy link
Copy Markdown

Summary

A fix for the renderer-viewport / engine-viewport desync behind #20269 (AtlasEngine::PaintCursor access violation). It addresses @DHowett's point when closing the clamp in #20363 — that the backing-buffer resize should happen under the same lock as the cursor move rather than being clamped at the read site.

_viewport has exactly two writers. One — _CheckViewportAndScroll() — always drives UpdateViewport(), so it keeps the engine viewport (and thus the row buffer) in sync. The other — EnablePainting() — assigns _viewport directly and does neither, which is the only no-throw way the renderer viewport can outrun the engine's. That is exactly the state a full-heap dump shows at the crash, and the fix makes EnablePainting() force the resync.

Scope, honestly: this closes that specific desync, and the dump confirms the crash is an out-of-range row index (not an in-bounds bad slot). I can't prove from a single dump that it's behind every #20269 report — there's an open frequency question under Notes.

Root cause

coordCursor.y is derived from the renderer's _viewport; the engine's row buffer (AtlasEngine::_p.rows) is sized from _api.s->viewportCellCount, which only UpdateViewport() writes. In a normal paint frame these stay consistent — _CheckViewportAndScroll() sets _viewport and calls UpdateViewport() together, and StartPaint() rebuilds _p.rows to match, all before PaintCursor():

_CheckViewportAndScroll()   // _viewport + UpdateViewport() -> viewportCellCount
_updateCursorInfo()         // coordCursor.y from _viewport
StartPaint()                // rebuild _p.rows to viewportCellCount
...
_PaintCursor()              // _p.rows[coordCursor.y]

The one path that breaks this is Renderer::EnablePainting(). It assigns _viewport = _pData->GetViewport() directly, but — unlike _CheckViewportAndScroll(), the only other writer of _viewport — it neither calls UpdateViewport() nor sets _forceUpdateViewport. If the viewport grew while painting was disabled (render-failure recovery, ResumeRendering, init/relayout), EnablePainting() latches _viewport to the new size. The next _CheckViewportAndScroll() then early-returns (srOldViewport == srNewViewport, both already the new size, and _forceUpdateViewport == false), skipping UpdateViewport() — so the engine viewport and _p.rows stay at the old, smaller size while coordCursor.y comes from the larger _viewport. _PaintCursor still treats the cursor as in-viewport and indexes _p.rows past its end → out-of-bounds read → AV.

Evidence

A full-heap crash dump (stock 1.24.2605.12001) shows the mechanism at the fault:

top (= coordCursor.y) = 67
_p.rows._size         = 65          // 67 >= 65 -> out of bounds
_p.rows[0..64]        = valid ShapedRow*   (every in-bounds slot populated)
_p.rows[67]           = read past the array end -> NULL -> AV at +0xC0 (lineRendition)

Every in-bounds slot is valid, so this is an out-of-range index against a stale row buffer, not an in-bounds NULL/unpopulated slot. And because top=67 passed _PaintCursor's inViewport gate (computed from the renderer _viewport), the renderer viewport was >= 68 rows while the engine's was 65 — a real two-viewport desync. (More detail in #20269.)

Fix

Set _forceUpdateViewport = true in EnablePainting() so the next _CheckViewportAndScroll() runs UpdateViewport() and resizes the backing buffer to match _viewport before the cursor is painted. This mirrors what AddRenderEngine() already does, so it adds no new state-access pattern.

Notes

  • Open question (frequency): EnablePainting() fires on render-failure recovery, ResumeRendering, and init/relayout — relatively infrequent — whereas AtlasEngine::PaintCursor crashes #20269 is reported as frequent under heavy output. So either those triggers coincide with a viewport grow more often than I'd expect (e.g. GPU device-loss under sustained repaint, then a resize during the recovery window), or there's an additional path to the same desync. This fix closes the path I can demonstrate; I'm flagging the frequency angle so it stays on the table rather than claiming it accounts for every report.
  • This supersedes the clamp in AtlasEngine: Fix out-of-bounds row access in PaintCursor (#20269) #20363 (closed). That clamp is a reasonable crash-site backstop — it's the same idiom every other Paint*() already uses — but it masks the desync rather than fixing it. Happy to add it back as defense-in-depth if you'd like both.
  • There's a separate, orthogonal exception-safety issue (_handleSettingsUpdate advances _p.s before rebuilding _p.rows, so a throw mid-rebuild can leave an in-bounds garbage slot). It isn't exercised by this dump; I can send it as its own PR.
  • Compiles clean (renderer base lib, x64).

EnablePainting() assigns _viewport = _pData->GetViewport() directly, but
unlike _CheckViewportAndScroll() - the only other writer of _viewport - it
neither calls UpdateViewport() nor sets _forceUpdateViewport. UpdateViewport()
is the sole writer of _api.s->viewportCellCount, which sizes the engine's row
buffer (AtlasEngine's _p.rows).

If the viewport grows while painting is disabled (e.g. the render-failure
recovery path, ResumeRendering, or init/relayout), EnablePainting() latches
_viewport to the new size. The next _CheckViewportAndScroll() then early-returns
because srOldViewport == srNewViewport (both already the new size) and
_forceUpdateViewport is false, so UpdateViewport() is skipped and the engine
viewport - and _p.rows - stay at the old, smaller size. _updateCursorInfo()
derives coordCursor.y from the larger _viewport and still reports the cursor as
in-viewport, so PaintCursor() indexes _p.rows past its end -> out-of-bounds read
and an access violation (GH#20269).

A full-heap crash dump confirms the mechanism: at the fault, coordCursor.y = 67
while _p.rows holds 65 fully-valid entries (every in-bounds slot points into
_p.unorderedRows), and _p.rows[67] reads past the array into adjacent heap.

Set _forceUpdateViewport = true in EnablePainting() so the next
_CheckViewportAndScroll() runs UpdateViewport() and resizes the backing buffer to
match _viewport before the cursor is painted - keeping the resize on the same
render-thread, lock-held path as the cursor move rather than masking the symptom
at the read site.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant