Skip to content

Size streaming GeoTIFF write budget from source chunks, not output tiles#3008

Open
brendancol wants to merge 2 commits into
mainfrom
issue-3007
Open

Size streaming GeoTIFF write budget from source chunks, not output tiles#3008
brendancol wants to merge 2 commits into
mainfrom
issue-3007

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #3007

The streaming to_geotiff writer bounds peak memory with streaming_buffer_bytes, but sized its horizontal-segment budget from the output tile geometry (tile_h * tile_w). That undercounts a map_overlap source: slicing one tile-row out of open_geotiff(chunks=...).xrs.slope() materializes every source chunk-row the band touches plus a one-chunk halo. A source chunked taller than the tile (e.g. chunks=512, tile_size=256) on a wide raster pulled several full-width source chunk-rows per compute and OOMed, despite the buffer.

  • Add _max_streaming_row_span: the worst-case source rows a tile-band materializes (touched source chunk-rows plus a one-chunk halo each side; map_overlap depth never exceeds one chunk, so that is an upper bound).
  • Size bytes_per_tile_col from that span instead of th. Falls back to the tile height for numpy-from-dask or unknown-chunk arrays. The column halo of a 2D overlap adds a bounded couple of source chunk-columns, so streaming_buffer_bytes stays a soft cap.

On the repro (2048x8192 float32, chunks=512, slope, 8 MB cap) peak source bytes per compute drop from 32 MB to 12.6 MB.

Backend coverage: writer path is backend-agnostic (numpy / cupy / dask+numpy / dask+cupy all reach the same streaming writer); the fix affects the dask streaming branch.

Test plan:

  • test_overlap_source_respects_buffer_3007 -- instruments source-chunk materialization and asserts peak per compute stays within 2x the buffer (fails pre-fix at 4x)
  • test_overlap_source_roundtrip_small_buffer_3007 -- lazy slope under a tight cap matches eager slope
  • test_max_streaming_row_span_3007 -- helper unit test
  • existing TestStreamingBufferBudget + dask integration suite still green (146 passed, 1 skipped)

…3007)

The streaming writer's horizontal-segment budget was computed from the
output tile geometry (tile_h * tile_w). For a map_overlap source such as
open_geotiff(chunks=...).xrs.slope() that undercounts: slicing one
tile-row pulls every source chunk-row the band touches plus a one-chunk
halo. A source chunked taller than the tile on a wide raster therefore
pulled several full-width source chunk-rows per compute and OOMed,
despite streaming_buffer_bytes.

Size the budget from the source chunk-row span (touched chunk-rows + a
one-chunk halo each side) so peak per-compute memory tracks the soft
cap. Falls back to the tile height for numpy-from-dask / unknown-chunk
arrays. On the repro (2048x8192 f32, chunks=512, slope, 8 MB cap) peak
source bytes per compute drop from 32 MB to 12.6 MB.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 7, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: source-aware streaming write budget

Reviewed the full diff and the surrounding streaming-writer context. The change is narrow and the logic holds up.

Blockers

None.

Suggestions

  • The one-chunk halo is added unconditionally, so the budget treats every source as if it had a map_overlap halo. A genuinely non-overlap wide read (plain windowed read, no slope/aspect) will therefore segment about one step earlier than it strictly needs to. The writer can't cheaply tell overlap from non-overlap, and over-segmenting only costs an extra compute call near the buffer ceiling -- the safe direction. Worth a one-line note in the code comment that the non-overlap case is intentionally conservative, but not worth detection machinery.

Nits

  • _writer.py: import bisect sits inside the helper rather than at module top. That matches the file's existing style (os / tempfile are imported inside _write_streaming), so it's consistent -- leaving as-is is fine.
  • Strip layout (tiled=False) still has no horizontal segmentation, so an overlap source in strip mode keeps its old unbounded peak. The issue scopes that out and the docstring now says so; flagging only so it isn't forgotten.

What looks good

  • The budget is sized from the real driver. _max_streaming_row_span accounts for the source chunk-rows a tile-band touches plus the one-chunk halo, and the one-chunk bound is correct because map_overlap requires depth <= chunk size.
  • The fix only resizes column segments; the slices stay tile-aligned and cover the full raster, so output bytes are unchanged. The 146-test streaming + dask-integration suite staying green backs that up.
  • test_overlap_source_respects_buffer_3007 instruments actual source-chunk materialization rather than asserting on an internal counter, and it fails on the pre-fix writer (32 MB vs 8 MB cap). That's a real regression guard, not a smoke test.
  • Helper has direct unit coverage for the three interesting shapes (band inside a tall chunk, middle band with halo on both sides, single full-height chunk).
  • Unknown-chunk / numpy-from-dask inputs fall back to the tile height via the getattr + try/except, so the new path can't raise on a chunkless array.

Checklist

  • Output unchanged (slices still tile-aligned and complete)
  • All streaming backends reach the same path; fix is backend-agnostic
  • NaN handling untouched (no change to pixel transforms)
  • Edge cases covered (tall chunk, ragged halo, single chunk, fallback)
  • Dask chunk boundaries handled (halo bound is provably tight)
  • No premature materialization added (helper is pure arithmetic)
  • Benchmark not needed (memory-budget fix, no hot loop touched)
  • README matrix N/A (no new function)
  • Docstrings updated in both _write_streaming and public to_geotiff

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after aeec97f)

Disposition of the first-pass findings:

  • Fixed -- the suggestion to flag the non-overlap conservatism in-code. Added a comment at the budget block (aeec97fc) explaining that a non-overlap read carries no halo, so the one-chunk halo is intentionally conservative there and may segment one step early.
  • Dismissed (style match) -- import bisect inside the helper. The file already imports os / tempfile inside _write_streaming, so the local import is consistent with the surrounding code.
  • Dismissed (out of scope, documented) -- strip-layout (tiled=False) overlap peak. The issue scopes strip mode out and the docstring now states it does no horizontal segmentation.

No new findings. The one-line follow-up is comment-only; the streaming budget + helper tests stay green (7 passed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Streaming GeoTIFF writer budgets memory from output tiles, not source chunks — OOMs on slope/overlap pipelines

1 participant