feat: add StorageAdapter plugin system for third-party storage protocols#1432
feat: add StorageAdapter plugin system for third-party storage protocols#1432kushalbakshi wants to merge 5 commits intodatajoint:masterfrom
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a store config uses an unrecognised protocol, settings.py now queries the adapter registry before raising an error. Registered plugin adapters are validated and default keys are applied; unknown protocols surface a clear message directing users to install a plugin package. Includes new test class TestGetStoreSpecPluginDelegation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The three else-branches in StorageBackend (_create_filesystem, _full_path, get_url) now query the adapter registry before falling back to the built-in behaviour. Registered plugin adapters are called for filesystem construction, path composition, and URL generation; unknown protocols still raise DataJointError. Includes new test class TestStorageBackendPluginDelegation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move imports to top of file and apply ruff-format to satisfy CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
59c47de to
cd93c19
Compare
Review — two simpler alternatives to considerGood work, Kushal. The need is real: third-party packages should be able to register new storage backends without monkey-patching. Before we commit to the ABC approach, I want to explore whether we can achieve the same result with less machinery — or with machinery that also improves the existing code. Background: what StorageBackend actually does on top of fsspecLooking at `storage.py`, there are five protocol-specific concerns:
~80% of I/O already delegates to fsspec directly. The protocol-specific code is concentrated in items 1-4, spread across ~26 `if self.protocol == "..."` branches. fsspec already has its own protocol discovery — `fsspec.filesystem("s3")` auto-discovers S3FileSystem. So the question: what does DataJoint need to add on top of fsspec's own plugin system? Alternative A — Generic fsspec passthroughThe lightest option: make `StorageBackend` accept any fsspec-supported protocol by default. No new ABC, no entry points, no registry. For unknown protocols:
~30 lines of changes. No new module. Works for any fsspec filesystem immediately. Downside: no config validation or custom path/URL logic for unknown protocols. Alternative B — Unified adapter model for ALL protocols (preferred)Route all backends through adapters — including the existing four. The 26 hardcoded `if self.protocol == "..."` branches become adapter implementations: ```python ``` Built-in adapters (File, S3, GCS, Azure) registered at import time. Third-party adapters discovered via entry points. `StorageBackend` becomes a thin dispatcher — one code path for all protocols. This eliminates the 26 hardcoded protocol branches, makes third-party backends first-class citizens alongside built-ins, and gives us the right abstraction. The user-facing API (`dj.config.stores`) doesn't change at all — this is purely internal refactoring. We test against the existing integration test suite and release within the 2.2.x series when confident. AssessmentYour current PR has the right instincts — ABC, entry points, lazy discovery. But it creates a two-tier system: hardcoded paths for built-in protocols, adapter path for everything else. That means maintaining both code paths going forward. I'd prefer Alternative B: if we're introducing the adapter abstraction, route everything through it. The existing integration tests validate that the built-in adapters behave correctly. Since the user API doesn't change, this is safe to land in 2.2.x with thorough testing. @kushalbakshi — would you be up for reworking this PR to route the four existing protocols through adapters as well? The `StorageAdapter` ABC and entry-point discovery from your PR are the right foundation; the additional work is extracting the existing protocol-specific logic from `StorageBackend` into adapter subclasses. |
Summary
Adds a
StorageAdapterentry-point plugin system that allows third-party packages to register new storage protocols without modifying DataJoint internals.StorageAdapterABC insrc/datajoint/storage_adapter.pywith 4 extension points:create_filesystem,validate_spec,full_path,get_urldatajoint.storagegroup (mirrors the existing codec plugin pattern incodecs.py)settings.py(get_store_spec) andstorage.py(_create_filesystem,_full_path,get_url)file,s3,gcs,azure)Motivation
DataJoint's external storage layer currently supports four hardcoded protocols. Extending it with custom fsspec-backed storage requires monkey-patching private methods — fragile, non-composable, and breaks on upstream refactors. This PR adds a clean plugin system so third-party packages can register storage protocols via standard Python entry points.
How third-party packages use it
No import needed — DataJoint auto-discovers the adapter when it encounters the protocol in
dj.config.stores.Test plan
🤖 Generated with Claude Code