feat: unified chunk grid with rectilinear chunk/shard support#3802
Open
maxrjones wants to merge 114 commits intozarr-developers:mainfrom
Open
feat: unified chunk grid with rectilinear chunk/shard support#3802maxrjones wants to merge 114 commits intozarr-developers:mainfrom
maxrjones wants to merge 114 commits intozarr-developers:mainfrom
Conversation
This reverts commit 9c0f582.
Member
Author
|
I added a real-world example based on the data that @tinaok shared in #3534 at https://zarr--3802.org.readthedocs.build/en/3802/user-guide/examples/rectilinear_chunks/. I could adjust the source branches for the juv-compliant notebook before we merge. @d-v-b would you have any time for a review on this PR? |
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
| return _shards_out, _chunks_out | ||
|
|
||
|
|
||
| class _RegularChunkGridMeta(type): |
Contributor
There was a problem hiding this comment.
what's this for? I'm guessing it would not be necessary if we had a clear abstraction boundary between the in-memory ChunkGrid and the metadata chunk grids.
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
d-v-b
reviewed
Mar 30, 2026
…metadata (#7) * chore: simplify sharding codec validation against varying chunk grid metadata * test: restore test strength
| | Fully variable | Arbitrary per-chunk sizes | `[5, 12, 3, 20]` | | ||
|
|
||
| A registry-based plugin system adds complexity without clear benefit. | ||
| Prior iterations on the chunk grid design were based on the Zarr V3 spec's definition of chunk grids as an extension point alongside codecs, dtypes, etc. Therefore, we started designing the chunk grid implementation following a similar registry based approach. However, in practice chunk grids are fundamentally different than codecs. Codecs are independent; supporting `zstd` tells you nothing about `gzip`. Chunk grids are not: every regular grid is a valid rectilinear grid. A registry-based plugin system makes sense for codecs but adds complexity without clear benefit for chunk grids. Here we start from some basic goals and propose a more fitting design for supporting different chunk grids in zarr-python. |
There was a problem hiding this comment.
👍🏽
Nit: registry based --> registry-based
…chunk grid (#8) * refactor: allow regular-style chunk grid declaration for rectilinear chunk grid The rectilinear chunk grid spec allows bare integers per dimension (meaning "regular step size"), distinct from explicit single-element edge lists. This commit widens `RectilinearChunkGrid.chunk_shapes` to `tuple[int | tuple[int, ...], ...]` so bare ints are preserved for faithful JSON round-tripping. Additionally: - unifies `_validate_chunk_shapes` to handle both regular and rectilinear validation; `_parse_chunk_shape` now delegates to it - adds `from_sizes` method to `ChunkGrid`, accepting `int | Sequence[int]` per dimension - removes `from_regular` and `from_rectilinear` methods from `ChunkGrid` - removes `parse_chunk_grid` from `chunk_grids.py` (JSON → ChunkGrid shortcut that bypassed the metadata layer) - removes `serialize_chunk_grid`, `_infer_chunk_grid_name`, and serialization helpers from `chunk_grids.py` (ChunkGrid never needs to be serialized; metadata DTOs handle it) - renames `parse_chunk_grid` in `v3.py` to `parse_chunk_grid_metadata` to disambiguate - moves the rectilinear feature flag to `RectilinearChunkGrid.__post_init__` - simplifies sharding codec validation into a single divisibility check for both regular and rectilinear grids - updates `validate_rectilinear_edges` to skip bare-int dimensions - refactors chunk grid tests to functional style with parametrization - adds docstrings to all test functions * chore: remove .claude * refactor: rename chunk_grid parsing function --------- Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR contains an alternative implementation of the rectilinear chunk grid extension, building on the work in #3534 (RLE helpers, validation logic, and test cases were directly adopted). While the core feature of variable-sized chunks is the same, the internal architecture differs in ways that impact extensibility, performance, and release safety.
I appreciate the patience of those who contributed to #3534, and everyone who's been waiting on this feature. I know it's frustrating to see a new PR after #3534 was so close. That PR provided fundamental components, and I hope people will see the value here. I really believe it is worth the churn for the following reasons:
Key differences from #3534
DimensionGridprotocol (FixedDimension,VaryingDimension). Adding a new dimension type (e.g.TiledDimensionfor periodic patterns like days-per-month) requires implementing that protocol — no changes to indexing, codecs, or theChunkGridclass. A prototype was built to verify this.VaryingDimensionuses precomputed prefix sums for O(log n) lookups via binary search. See https://github.com/maxrjones/zarr-chunk-grid-tests for a performance comparison.zarr.config.set({'array.rectilinear_chunks': True})(orZARR_ARRAY__RECTILINEAR_CHUNKS=True), disabled by default. This gives downstream libraries time to adapt before the API is finalized, and us an opportunity to gracefully finalize the API.Design document:
docs/design/chunk-grid.mdcovers the full design, rationale, and a suggested PR sequence for splitting this into reviewable increments, if needed.Downstream POCs (all passing):
TODO:
docs/user-guide/*.mdchanges/