renderer: Fix the EnablePainting viewport desync behind #20269#20366
Open
yuu61 wants to merge 1 commit into
Open
renderer: Fix the EnablePainting viewport desync behind #20269#20366yuu61 wants to merge 1 commit into
yuu61 wants to merge 1 commit into
Conversation
EnablePainting() assigns _viewport = _pData->GetViewport() directly, but unlike _CheckViewportAndScroll() - the only other writer of _viewport - it neither calls UpdateViewport() nor sets _forceUpdateViewport. UpdateViewport() is the sole writer of _api.s->viewportCellCount, which sizes the engine's row buffer (AtlasEngine's _p.rows). If the viewport grows while painting is disabled (e.g. the render-failure recovery path, ResumeRendering, or init/relayout), EnablePainting() latches _viewport to the new size. The next _CheckViewportAndScroll() then early-returns because srOldViewport == srNewViewport (both already the new size) and _forceUpdateViewport is false, so UpdateViewport() is skipped and the engine viewport - and _p.rows - stay at the old, smaller size. _updateCursorInfo() derives coordCursor.y from the larger _viewport and still reports the cursor as in-viewport, so PaintCursor() indexes _p.rows past its end -> out-of-bounds read and an access violation (GH#20269). A full-heap crash dump confirms the mechanism: at the fault, coordCursor.y = 67 while _p.rows holds 65 fully-valid entries (every in-bounds slot points into _p.unorderedRows), and _p.rows[67] reads past the array into adjacent heap. Set _forceUpdateViewport = true in EnablePainting() so the next _CheckViewportAndScroll() runs UpdateViewport() and resizes the backing buffer to match _viewport before the cursor is painted - keeping the resize on the same render-thread, lock-held path as the cursor move rather than masking the symptom at the read site. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A fix for the renderer-viewport / engine-viewport desync behind #20269 (
AtlasEngine::PaintCursoraccess violation). It addresses @DHowett's point when closing the clamp in #20363 — that the backing-buffer resize should happen under the same lock as the cursor move rather than being clamped at the read site._viewporthas exactly two writers. One —_CheckViewportAndScroll()— always drivesUpdateViewport(), so it keeps the engine viewport (and thus the row buffer) in sync. The other —EnablePainting()— assigns_viewportdirectly and does neither, which is the only no-throw way the renderer viewport can outrun the engine's. That is exactly the state a full-heap dump shows at the crash, and the fix makesEnablePainting()force the resync.Scope, honestly: this closes that specific desync, and the dump confirms the crash is an out-of-range row index (not an in-bounds bad slot). I can't prove from a single dump that it's behind every #20269 report — there's an open frequency question under Notes.
Root cause
coordCursor.yis derived from the renderer's_viewport; the engine's row buffer (AtlasEngine::_p.rows) is sized from_api.s->viewportCellCount, which onlyUpdateViewport()writes. In a normal paint frame these stay consistent —_CheckViewportAndScroll()sets_viewportand callsUpdateViewport()together, andStartPaint()rebuilds_p.rowsto match, all beforePaintCursor():The one path that breaks this is
Renderer::EnablePainting(). It assigns_viewport = _pData->GetViewport()directly, but — unlike_CheckViewportAndScroll(), the only other writer of_viewport— it neither callsUpdateViewport()nor sets_forceUpdateViewport. If the viewport grew while painting was disabled (render-failure recovery,ResumeRendering, init/relayout),EnablePainting()latches_viewportto the new size. The next_CheckViewportAndScroll()then early-returns (srOldViewport == srNewViewport, both already the new size, and_forceUpdateViewport == false), skippingUpdateViewport()— so the engine viewport and_p.rowsstay at the old, smaller size whilecoordCursor.ycomes from the larger_viewport._PaintCursorstill treats the cursor as in-viewport and indexes_p.rowspast its end → out-of-bounds read → AV.Evidence
A full-heap crash dump (stock 1.24.2605.12001) shows the mechanism at the fault:
Every in-bounds slot is valid, so this is an out-of-range index against a stale row buffer, not an in-bounds NULL/unpopulated slot. And because
top=67passed_PaintCursor'sinViewportgate (computed from the renderer_viewport), the renderer viewport was>= 68rows while the engine's was65— a real two-viewport desync. (More detail in #20269.)Fix
Set
_forceUpdateViewport = trueinEnablePainting()so the next_CheckViewportAndScroll()runsUpdateViewport()and resizes the backing buffer to match_viewportbefore the cursor is painted. This mirrors whatAddRenderEngine()already does, so it adds no new state-access pattern.Notes
EnablePainting()fires on render-failure recovery,ResumeRendering, and init/relayout — relatively infrequent — whereas AtlasEngine::PaintCursor crashes #20269 is reported as frequent under heavy output. So either those triggers coincide with a viewport grow more often than I'd expect (e.g. GPU device-loss under sustained repaint, then a resize during the recovery window), or there's an additional path to the same desync. This fix closes the path I can demonstrate; I'm flagging the frequency angle so it stays on the table rather than claiming it accounts for every report.Paint*()already uses — but it masks the desync rather than fixing it. Happy to add it back as defense-in-depth if you'd like both._handleSettingsUpdateadvances_p.sbefore rebuilding_p.rows, so a throw mid-rebuild can leave an in-bounds garbage slot). It isn't exercised by this dump; I can send it as its own PR.