
Fast cell rendering: render_shapes/labels(as_points=True) + squidpy centroid caching #703

Open
timtreis wants to merge 7 commits into main from feat/centroid-scatter-helper


timtreis (Member) commented Jun 8, 2026

Summary

Adds a fast rendering mode that draws each cell as a dot at its centroid instead of its full geometry / rasterized mask — a large speedup when only cell location matters (the squidpy spatial_scatter idea, integrated into spatialdata-plot's API).

sdata.pl.render_shapes("cells", color="cell_type", as_points=True, size=20).pl.show()
sdata.pl.render_labels("cells", color="leiden", as_points=True).pl.show()

What's in this PR

1. Centroid extraction + squidpy obsm["spatial"] caching (utils.py)

  • Coordinates stored intrinsic (element-native) and transformed to the render coordinate system on demand → one cache serves every coordinate system (proven equivalent to per-CS; locked by a test). The transform is a numpy affine matmul (no dask round-trip on the hot path).
  • _compute_element_centroids: shapes → shapely .centroid; 2D labels → regionprops (≈750–1860× faster than get_centroids on rasters); other types → NotImplementedError.
  • _get_or_compute_centroids: reuses a pre-existing obsm["spatial"] (loader/user-provided, trusted as intrinsic) or computes + writes it back with {n, scale_level} provenance in uns; invalidated on instance-count change. Never clobbers an incompatible array.
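
The reuse/compute/write-back rules above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: plain dicts stand in for the table's `obsm` and `uns`, the function name and the `"centroid_provenance"` key are hypothetical, and the choice to recompute (without writing) when an unmarked array has the wrong shape is an assumption about what "never clobbers" means.

```python
import numpy as np

PROVENANCE_KEY = "centroid_provenance"  # hypothetical key name, not from the PR


def get_or_compute_centroids(obsm, uns, n_instances, compute, scale_level="scale0"):
    """Return (n_instances, 2) intrinsic centroids, reusing obsm["spatial"] when valid.

    `obsm` and `uns` are plain dicts standing in for a table's .obsm/.uns.
    `compute` is a zero-argument callable that returns fresh centroids.
    """
    existing = obsm.get("spatial")
    marker = uns.get(PROVENANCE_KEY)

    if existing is not None:
        arr = np.asarray(existing)
        compatible = arr.ndim == 2 and arr.shape == (n_instances, 2)
        if marker is None:
            if compatible:
                # Loader/user-provided array: trusted as intrinsic coordinates.
                return arr.astype(float)
            # Foreign array with an unexpected shape: compute, but do not overwrite it.
            return np.asarray(compute(), dtype=float)
        if compatible and marker == {"n": n_instances, "scale_level": scale_level}:
            # Our own cache and the instance count still matches: cache hit.
            return arr.astype(float)
        # Our own cache is stale (instance count changed): fall through and recompute.

    centroids = np.asarray(compute(), dtype=float)
    # Write the array and its provenance together so they cannot get out of sync.
    obsm["spatial"] = centroids
    uns[PROVENANCE_KEY] = {"n": n_instances, "scale_level": scale_level}
    return centroids
```

A second call with the same instance count returns the cached array without invoking `compute`; changing the count triggers a recompute and a fresh provenance record.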

2. as_points=True rendering (render.py, basic.py, render_params.py)

  • New as_points: bool and size parameters on render_shapes and render_labels.
  • Shared _render_centroids_as_points (built on the extracted _scatter_points) draws the dots + legend/colorbar. The per-cell color vector is the same one the geometry/raster path computes, so colors match the full rendering exactly — only the apply step differs (scatter vs patches/imshow).
  • Shapes: centroids from the filtered geometry (aligned to the color vector); labels: from _get_or_compute_centroids reindexed to instance_id. Positions verified identical to sd.get_centroids.
  • outline_*/shape (shapes) and contour_px/outline_* (labels) are ignored with an info log.

Verification

  • Centroid positions land exactly on get_centroids for shapes and labels.
  • Default (as_points=False) output is byte-identical to main.
  • Phase 0 _scatter_points extraction byte-identical; coordinate-system-independence, cache round-trip/provenance, staleness, unsupported-type rejection all unit-tested.
  • pre-commit ruff + mypy clean.

Known follow-ups (deliberately not in this PR — flagged for holistic review)

  • datashader backend for as_points at very large cell counts (currently always matplotlib scatter; method= is not yet honored for the dot backend). Needs a _datashader_points extraction mirroring _scatter_points.
  • Cache persistence: show() operates on self._copy(), so the obsm["spatial"] write doesn't reach the user's original object across separate .show() calls. Making the "magic cache" persist needs writing to the source object — a show()-architecture decision.
  • Out-of-core (chunked bincount) labels centroids for 100k²-scale masks.
  • Visual test_plot_* baselines for as_points (functional/position tests included here; baselines to be generated from CI).

Design + the 12 locked decisions: plans/fast-render-cells.md.

timtreis added 5 commits June 8, 2026 14:36
… helper

Infrastructure for an upcoming "render cells as centroid points" fast mode
(no user-facing render option yet).

Phase 0 — shared scatter primitive:
- Extract `_scatter_points(ax, x, y, color_vector, ...)` from `_render_points`'s
  matplotlib branch; `_render_points` now calls it. Byte-identical output
  (verified vs main on categorical and continuous point renders). This is the
  reuse seam the fast mode will draw through.

Phase 1 — centroid + caching core (headless, fully unit-tested):
- `_compute_element_centroids` / `_compute_label_centroids`: per-instance
  centroids in a coordinate system. Shapes use spatialdata's vectorized
  `get_centroids`; labels use skimage `regionprops` (the per-label reduction is
  orders of magnitude faster than `get_centroids` on rasters), mapped onto the
  raster's intrinsic coordinate arrays so it reproduces `get_centroids` exactly
  (incl. the pixel-center 0.5 offset) then transformed to the target CS.
- `_get_or_compute_centroids`: reuses/persists centroids via the squidpy
  convention. A pre-existing `obsm["spatial"]` (loader/user-provided) is trusted
  as the cells' locations; otherwise centroids are computed and written back into
  the annotating table's `obsm["spatial"]` with a coordinate-system provenance
  marker in `uns`, so later renders are instant. Reads run before writes, so a
  valid existing cache is reused rather than clobbered; an incompatible existing
  `obsm["spatial"]` is never overwritten; the cache is invalidated when the
  requested coordinate system differs.
- Tests: shapes/labels centroids match `get_centroids`; cache round-trip +
  provenance; CS invalidation; pre-existing obsm trusted; no-table compute path;
  `cache=False` writes nothing.

… provenance

Cleanup from /simplify (no behavioral change):
- Extract `_region_mask_and_keys(table, element)` used by both read and write,
  removing the duplicated `get_table_keys` + O(n_obs) `region_key`-string-cast
  mask that was computed twice per cold call.
- Read path: validate shape on the raw obsm array and cast only the masked
  subset to float, instead of casting the whole `obsm["spatial"]` on every
  cache hit (the hot path).
- Write path: coerce a non-dict `uns["spatialdata_plot"]` instead of early
  returning after `obsm` was already mutated, so obsm and the provenance marker
  are always written together (no half-write).
- Drop the dead `"key"` provenance field (constant, never read back).
- Rename the misleading `table` local (held a table *name*) in
  `_get_or_compute_centroids`.

Refactor the centroid cache to store element-*intrinsic* coordinates and
transform to the render coordinate system on demand, instead of caching
coords already mapped into one coordinate system. Decisions from design pass:

- Intrinsic storage: one `obsm["spatial"]` cache serves every coordinate
  system (proven equivalent to per-CS computation). `_compute_element_centroids`
  returns intrinsic coords (shapes via shapely `.centroid`, labels via
  `regionprops`); `_centroids_to_coordinate_system` maps them to the requested
  CS via the element's transform; `_get_or_compute_centroids` reads/computes
  intrinsic then transforms on return.
- Provenance records `{n, scale_level}` (no coordinate system). Cache is
  invalidated when the region's instance count changes (cells added/removed),
  not on CS change.
- A pre-existing `obsm["spatial"]` (loader/user-provided) is trusted as the
  cells' intrinsic locations and transformed to the render CS.
- Exhaustive model dispatch: shapes and 2D labels supported; other element
  types raise NotImplementedError.
- Labels are reduced at full resolution (scale0).

Drops the now-unused `get_centroids` import; adds `ShapesModel`. Tests updated:
parametrized shapes/labels match `get_centroids`; new coordinate-system-
independence test (one cache, two CS); staleness-by-instance-count; trusted
pre-existing obsm; unsupported-type rejection.
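
A short derivation of why one intrinsic cache can serve every coordinate system, assuming the element-to-coordinate-system transform is affine, $T(p) = A p + t$ (the symbols $A$, $t$, $p_i$, $c$ are introduced here for illustration). For a label whose $N$ pixel centers are $p_1, \dots, p_N$ with intrinsic centroid $c$:

$$
\frac{1}{N}\sum_{i=1}^{N} T(p_i)
= \frac{1}{N}\sum_{i=1}^{N} \left(A p_i + t\right)
= A\left(\frac{1}{N}\sum_{i=1}^{N} p_i\right) + t
= T(c),
\qquad c = \frac{1}{N}\sum_{i=1}^{N} p_i .
$$

Computing the centroid in the target coordinate system therefore gives the same point as transforming the intrinsic centroid. The area centroid of a polygon behaves the same way under an invertible affine map. The argument does not carry over to non-affine transforms.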
…e import, shared obsm gate

Cleanup from /simplify:
- `_centroids_to_coordinate_system` ran `PointsModel.parse` + a dask
  `transform(...).compute()` round trip on every call, including cache hits
  (~19 ms fixed floor, ~100 ms at 1M cells) — defeating the cache. Replace with
  `to_affine_matrix` + a plain numpy matmul: numerically identical (verified
  against the dask path for multiple coordinate systems), ~80-140x faster, and
  it removes the private-API import `spatialdata._core.operations.transform`
  (a repo non-negotiable) plus the now-unused `PointsModel`.
- Widen `_transformable_raster` -> `_transform_carrier` to accept any element
  (rasters -> scale0, others as-is), dropping the `isinstance` branch in
  `_centroids_to_coordinate_system`.
- Extract `_valid_spatial_obsm(arr, n_obs)` shared by the read and write paths,
  reconciling their previously divergent obsm-shape checks (read accepted >=2
  columns, write required exactly 2) so they cannot drift.
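
As an illustration of the matmul replacement described above, here is a minimal sketch. It assumes the element's transform has already been expressed as a 3x3 homogeneous matrix over the x/y axes (which the commit obtains via `to_affine_matrix`); `apply_affine` is a hypothetical helper name, not the PR's function.

```python
import numpy as np


def apply_affine(points_xy, matrix):
    """Apply a 3x3 homogeneous affine matrix to an (n, 2) array of x/y points."""
    pts = np.asarray(points_xy, dtype=float)
    m = np.asarray(matrix, dtype=float)
    # Upper-left 2x2 block is the linear part; the last column holds the translation.
    return pts @ m[:2, :2].T + m[:2, 2]


# Example matrix (made up): scale x by 2 and y by 3, then translate by (10, 20).
M = np.array(
    [
        [2.0, 0.0, 10.0],
        [0.0, 3.0, 20.0],
        [0.0, 0.0, 1.0],
    ]
)
centroids = np.array([[0.5, 0.5], [1.0, 2.0]])
mapped = apply_affine(centroids, M)
```

Because this is a single dense matrix product on an in-memory array, its cost on a cache hit is proportional to the number of cells, with no graph construction or scheduling overhead.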
`render_shapes(..., as_points=True)` and `render_labels(..., as_points=True)`
draw one dot per cell at its centroid instead of the full geometry / rasterized
mask — a large speedup when only cell location matters. New `size=` controls the
marker size.

- Shared `_render_centroids_as_points` draws the scatter (via `_scatter_points`)
  and the legend/colorbar. The per-cell color vector is the *same* one the
  geometry/raster path computes (`_set_color_source_vec`), so colors match the
  full rendering exactly; only the apply step (scatter vs patches/imshow) differs.
- Shapes: centroids from shapely `.centroid` of the (filtered) geometry,
  positionally aligned to the color vector, drawn in intrinsic coords via the
  element transform. Labels: centroids from `_get_or_compute_centroids`
  (regionprops, fast) reindexed to `instance_id`. Positions verified identical to
  `sd.get_centroids`.
- `as_points` short-circuits before the geometry/raster path; outline_*, shape
  (shapes) and contour_px, outline_* (labels) are ignored with an info log.
- Default (`as_points=False`) output is byte-identical to main.

Tests: non-visual checks that centroids land exactly on `get_centroids` for both
element types and that outline/shape are ignored without error.

Note: as_points currently always uses the matplotlib scatter backend; routing
through datashader for very large cell counts (and persisting the obsm cache to
the user's object rather than show()'s working copy) are follow-ups.
timtreis changed the title from 'Centroid extraction + squidpy obsm["spatial"] caching; shared scatter helper' to 'Fast cell rendering: render_shapes/labels(as_points=True) + squidpy centroid caching' on Jun 8, 2026

codecov-commenter commented Jun 8, 2026


Codecov Report

❌ Patch coverage is 87.75510% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.45%. Comparing base (34c23b4) to head (4678297).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spatialdata_plot/pl/utils.py | 86.20% | 9 Missing and 7 partials ⚠️ |
| src/spatialdata_plot/pl/render.py | 92.59% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #703      +/-   ##
==========================================
+ Coverage   75.96%   76.45%   +0.49%     
==========================================
  Files          14       14              
  Lines        4156     4332     +176     
  Branches      964      996      +32     
==========================================
+ Hits         3157     3312     +155     
- Misses        647      664      +17     
- Partials      352      356       +4     

| Files with missing lines | Coverage Δ |
|---|---|
| src/spatialdata_plot/pl/basic.py | 79.41% <ø> (+0.37%) ⬆️ |
| src/spatialdata_plot/pl/render_params.py | 88.97% <100.00%> (+0.27%) ⬆️ |
| src/spatialdata_plot/pl/render.py | 87.21% <92.59%> (+0.16%) ⬆️ |
| src/spatialdata_plot/pl/utils.py | 68.88% <86.20%> (+1.10%) ⬆️ |

timtreis added 2 commits June 8, 2026 18:10
`render_labels(element, as_points=True)` with no color crashed:
`instance_id` (the raster's unique values) includes the background label `0`,
which has no centroid, and the literal/no-color color vector is sized to the
raster (not per-instance), so `ax.scatter` got mismatched `c` vs `x`/`y`.

Drop the background label from the rendered instances and align the per-cell
color: for data-driven color the vector is already per-instance and is subset to
match; for the literal/no-color path it is replaced with one na/literal color
per centroid. Data-driven (categorical/continuous) renders are unchanged and
still land exactly on `get_centroids`.

Adds a regression test for the no-color labels case.
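
The alignment fix can be pictured with a small sketch. All names here are hypothetical. It assumes the data-driven color vector has one entry per `instance_id` (background included) and that the caller knows which color path is active. The fallback color is a placeholder rather than the library's actual NA color.

```python
import numpy as np


def align_for_scatter(instance_ids, centroids_by_id, color_vector, data_driven,
                      fallback_color="lightgray"):
    """Drop background label 0 and return (xy, colors) of equal length."""
    ids = np.asarray(instance_ids)
    keep = ids != 0  # the background label has no centroid
    kept_ids = ids[keep]
    xy = np.array([centroids_by_id[int(i)] for i in kept_ids], dtype=float).reshape(-1, 2)

    if data_driven:
        # Per-instance vector: subset it with the same mask as the ids.
        colors = np.asarray(color_vector)[keep]
    else:
        # Literal/no-color vector is not per-instance: use one color per centroid.
        colors = np.full(len(kept_ids), fallback_color, dtype=object)
    return xy, colors


ids = [0, 1, 2]
cents = {1: (0.5, 0.5), 2: (2.5, 1.5)}
xy_a, col_a = align_for_scatter(ids, cents, ["bg", "red", "blue"], data_driven=True)
xy_b, col_b = align_for_scatter(ids, cents, ["gray"] * 12, data_driven=False)
```

In both cases `xy` and `colors` have the same length, which is the invariant `ax.scatter` needs for its `x`/`y` and `c` arguments.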
…ator

Replace the `regionprops` reduction in `_compute_label_centroids` with an
additive bincount aggregator that streams the labels raster block by block —
one dask chunk (or bounded numpy row-block) in memory at a time — accumulating
per-label `count`/`sum_x`/`sum_y`. This is what makes the feature usable at
Xenium scale:

- Out-of-core: peak memory is one chunk + O(n_labels) accumulators, NOT the
  whole raster (measured: 9 MB peak streaming a 268 MB mask). `regionprops`
  needs the full array materialized and OOMs on large morphology masks.
- Scales in cell count: 500k+ labels are just array indexing (562k labels in
  ~1.4 s with 13.5 MB of accumulators); `regionprops`' per-label table does not.
- Faster than `regionprops` (~1.3-1.6x) on in-memory rasters.
- Exact across chunk boundaries (additive reduction) — verified numpy ==
  dask-chunked, and identical to `sd.get_centroids`.
- `count` is the cell area, a free by-product (ready for footprint-based dot
  sizing).

Drops the `regionprops_table` import; adds `slices_from_chunks`. Adds a unit
test locking the chunk-exact, out-of-core, area-correct behavior.

Note: the chunk loop is currently sequential; parallelizing the per-chunk
partials (dask map_blocks + tree-reduce) is a future speedup.
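
A minimal sketch of the additive bincount idea, using the bounded numpy row-block variant rather than dask chunks. The function name is hypothetical, the accumulators grow on demand because the largest label is not known up front, and the `+ 0.5` assumes that pixel (row r, column c) has its center at (c + 0.5, r + 0.5) in intrinsic coordinates.

```python
import numpy as np


def streaming_label_centroids(labels, block_rows=256):
    """Per-label (x, y) centroids and areas, reading one row-block at a time."""
    n_rows, n_cols = labels.shape
    count = np.zeros(1, dtype=np.int64)
    sum_x = np.zeros(1, dtype=np.float64)
    sum_y = np.zeros(1, dtype=np.float64)
    cols = np.arange(n_cols, dtype=np.float64)

    for start in range(0, n_rows, block_rows):
        block = np.asarray(labels[start:start + block_rows])
        flat = block.ravel().astype(np.intp)
        # Column index repeats per row; row index is offset by the block start.
        x = np.tile(cols, block.shape[0])
        y = np.repeat(np.arange(start, start + block.shape[0], dtype=np.float64), n_cols)

        # Grow the accumulators if this block contains a larger label id.
        size = max(count.size, int(flat.max()) + 1)
        if size > count.size:
            pad = (0, size - count.size)
            count, sum_x, sum_y = np.pad(count, pad), np.pad(sum_x, pad), np.pad(sum_y, pad)

        # Additive partials: sums over blocks equal sums over the whole raster.
        count += np.bincount(flat, minlength=size)
        sum_x += np.bincount(flat, weights=x, minlength=size)
        sum_y += np.bincount(flat, weights=y, minlength=size)

    ids = np.flatnonzero(count)
    ids = ids[ids != 0]  # drop the background label
    area = count[ids]
    centroids = np.column_stack([sum_x[ids], sum_y[ids]]) / area[:, None] + 0.5
    return ids, centroids, area


mask = np.array(
    [
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [2, 2, 0, 0],
        [2, 2, 0, 0],
    ]
)
ids, centroids, area = streaming_label_centroids(mask, block_rows=3)
```

With `block_rows=3` label 2 is split across two blocks, yet the result matches a single-pass reduction because only sums and counts are carried between blocks. The per-label `area` falls out of the same pass.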