Skip to content

First cut at adding some parallelism in pyfive#209

Draft
bnlawrence wants to merge 2 commits intomainfrom
parallel
Draft

First cut at adding some parallelism in pyfive#209
bnlawrence wants to merge 2 commits intomainfrom
parallel

Conversation

@bnlawrence
Copy link
Collaborator

@bnlawrence bnlawrence commented Mar 25, 2026

Description

This pull request addresses issue: #208 by introducing a new mixin class for parallel access to chunks.

Checklist

  • This pull request has a descriptive title and labels
  • This pull request has a minimal description (most was discussed in the issue, but a two-liner description is still desirable)
  • Unit tests have been added (if codecov test fails)
  • Any changed dependencies have been added or removed correctly (if need be)
  • If you are working on the documentation, please ensure the current build passes
  • All tests pass

@bnlawrence bnlawrence requested a review from zequihg50 March 25, 2026 13:59
@zequihg50
Copy link
Contributor

My two main concerns with this implementation are:

  • fsspec cat_ranges - I’m not fully convinced about the efficiency of this approach. From the source, it appears to return a list of bytes objects to the caller, which likely introduces unnecessary allocations and copies. I think the rationale of this is to leverage fsspec’s concurrency (e.g., via an async HTTPStore), but I suspect this may still introduce blocking at some point in the pipeline. A lower-level approach might be more efficient, for example, working with memoryview and implementing custom concurrency. That said, this comes at the cost of increased implementation complexity...

  • Use of inheritance - I think this design would benefit from composition over inheritance. In particular, it feels unintuitive for a Dataset to inherit from ChunkRead, especially since HDF5 datasets can also be contiguous. A structure where something like ChunkedDataset extends Dataset (or composes chunk-reading behavior) would likely be more appropriate. More broadly, the API of this is far from trivial I would say, since this design could/should support different concurrency models (e.g., async vs threads).

@bnlawrence
Copy link
Collaborator Author

  • fsspec cat_ranges - I’m not fully convinced about the efficiency of this approach. From the source, it appears to return a list of bytes objects to the caller, which likely introduces unnecessary allocations and copies. I think the rationale of this is to leverage fsspec’s concurrency (e.g., via an async HTTPStore), but I suspect this may still introduce blocking at some point in the pipeline. A lower-level approach might be more efficient, for example, working with memoryview and implementing custom concurrency. That said, this comes at the cost of increased implementation complexity...

It does seem to be the only way to safely exploit asyncio in the context of fsspec, but yes, there is a lot to investigate. I've not actually tested this in anger yet.

@bnlawrence
Copy link
Collaborator Author

  • Use of inheritance - I think this design would benefit from composition over inheritance. In particular, it feels unintuitive for a Dataset to inherit from ChunkRead, especially since HDF5 datasets can also be contiguous. A structure where something like ChunkedDataset extends Dataset (or composes chunk-reading behavior) would likely be more appropriate. More broadly, the API of this is far from trivial I would say, since this design could/should support different concurrency models (e.g., async vs threads).

Yes, I agree. I started out by wanting to do this. I am not quite sure how I ended up with this. I'm minded to persevere with it until we have sorted the performance issues out, then refactor it.

@bnlawrence
Copy link
Collaborator Author

The other thing where we may get benefit is threading around the uncompress, which should also be embarassingly parallel, and certainly not optimal for async. You'll note that at the moment that's still serial ...

@zequihg50
Copy link
Contributor

zequihg50 commented Mar 26, 2026

The other thing where we may get benefit is threading around the uncompress, which should also be embarassingly parallel, and certainly not optimal for async. You'll note that at the moment that's still serial ...

Absolutely, just as a remainder for my future self, this might good to implement using two thread pools or async threads that communicate via some queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants