Add persistent sqlite3-backed API metadata cache #1815
bendichter wants to merge 2 commits into master from
Conversation
Introduce opt-in caching via `DandiAPIClient(cache=True)` that persists metadata responses to a local sqlite3 database. Cached entries are validated against `modified` timestamps, so stale data is never served and no extra API calls are needed for the check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## master #1815 +/- ##
==========================================
+ Coverage 75.11% 75.14% +0.02%
==========================================
Files 84 86 +2
Lines 11925 12050 +125
==========================================
+ Hits 8958 9055 +97
- Misses 2967 2995 +28
``dandi.apicache``
==================

This module provides a persistent, sqlite3-backed cache for metadata returned
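As a rough illustration of what a persistent, sqlite3-backed, timestamp-validated metadata cache can look like — the class name, schema, and method signatures below are illustrative sketches, not the PR's actual `dandi.apicache` implementation:

```python
import json
import sqlite3


class MiniAPICache:
    """Minimal sketch of a sqlite3-backed metadata cache (hypothetical
    stand-in for the PR's APIMetadataCache; all names are illustrative)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # Keyed by (api_url, entity_type, entity_id), as the PR summary says.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            " api_url TEXT, entity_type TEXT, entity_id TEXT,"
            " modified TEXT, payload TEXT,"
            " PRIMARY KEY (api_url, entity_type, entity_id))"
        )

    def get(self, api_url, entity_type, entity_id, modified):
        # Serve the cached payload only if the stored `modified` timestamp
        # matches the one the API currently reports (staleness check).
        row = self.db.execute(
            "SELECT modified, payload FROM metadata"
            " WHERE api_url=? AND entity_type=? AND entity_id=?",
            (api_url, entity_type, entity_id),
        ).fetchone()
        if row is not None and row[0] == modified:
            return json.loads(row[1])
        return None  # miss or stale: caller should fetch from the API

    def set(self, api_url, entity_type, entity_id, modified, payload):
        self.db.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?, ?)",
            (api_url, entity_type, entity_id, modified, json.dumps(payload)),
        )
        self.db.commit()
```

Storing the payload as JSON text keeps the schema trivial; the composite primary key makes `INSERT OR REPLACE` an atomic update of a single entity's entry.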
does it work ok on large ones like 000026? I just wonder whether sqlite3 could quickly become inefficient for some uses — e.g. how large would the cache become after listing all dandisets, and would it still perform ok? or should there perhaps be a per-dandiset cache instead?
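One way to sanity-check the size part of this concern is to measure how large a sqlite file grows after caching metadata-sized payloads for many entities. This is an illustrative benchmark, not from the PR — the schema and the ~1 KB payload size are made-up assumptions:

```python
import json
import os
import sqlite3
import tempfile

# Illustrative size check: insert roughly metadata-sized payloads for 1000
# entities and see how large the sqlite file ends up on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.db")
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE metadata (api_url TEXT, entity_type TEXT,"
        " entity_id TEXT, modified TEXT, payload TEXT,"
        " PRIMARY KEY (api_url, entity_type, entity_id))"
    )
    # Fake ~1 KB metadata payload per entity (made-up size assumption).
    payload = json.dumps({"name": "x" * 100, "description": "y" * 1000})
    db.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
        (("u", "dandiset", f"{i:06d}", "t", payload) for i in range(1000)),
    )
    db.commit()
    db.close()
    size_kb = os.path.getsize(path) // 1024
    print(f"~{size_kb} KiB for 1000 cached entries")
```

For payloads of this size the file stays in the low-megabyte range, and lookups go through the primary-key index, so listing-scale workloads are mostly a question of how large individual metadata blobs get.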
except HTTP404Error:
    raise NotFoundError(f"No such asset: {self}")
cache = self.client.cache
modified = self.modified.isoformat()
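The hunk above computes `modified = self.modified.isoformat()` before consulting the cache. A sketch of the read path this enables — the API is called only on a miss or a timestamp mismatch. The helper name, `fetch` callback, and schema here are hypothetical, not the PR's code:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metadata (api_url TEXT, entity_type TEXT, entity_id TEXT,"
    " modified TEXT, payload TEXT,"
    " PRIMARY KEY (api_url, entity_type, entity_id))"
)


def get_raw_metadata_cached(db, fetch, api_url, entity_type, entity_id, modified):
    """Hypothetical cached-read path: `fetch` (the real API call) runs only
    on a cache miss or when the stored timestamp no longer matches."""
    key = (api_url, entity_type, entity_id)
    row = db.execute(
        "SELECT modified, payload FROM metadata"
        " WHERE api_url=? AND entity_type=? AND entity_id=?",
        key,
    ).fetchone()
    if row is not None and row[0] == modified:
        return json.loads(row[1])  # cache hit: no API call needed
    payload = fetch()  # miss or stale entry: fall back to the API
    db.execute(
        "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?, ?)",
        key + (modified, json.dumps(payload)),
    )
    db.commit()
    return payload
```

Because the listing endpoints already return `modified`, the staleness check costs nothing beyond the string comparison — which is the "no extra API calls" property the PR description claims.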
frankly I am not sure yet whether we can rely on `modified` per asset — see e.g. a random first hit:
- zarr might change without having "modified" adjusted for it dandi-archive#1432
- but one can also dig deeper into the still-open issue: Dandiset "modified" timestamp not updated when entries deleted from Zarr dandi-archive#1871

etc. :-/
and we had touched on those prior in
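One possible mitigation for the concern above is to refuse to serve cached entries whose `modified` timestamp is missing or known to be untrustworthy, so those entities always refetch. This is a sketch under the assumption that such entities can be detected; the function name and schema are illustrative, not the PR's code:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metadata (api_url TEXT, entity_type TEXT, entity_id TEXT,"
    " modified TEXT, payload TEXT,"
    " PRIMARY KEY (api_url, entity_type, entity_id))"
)


def cached_lookup(db, key, modified):
    """Return the cached payload only when `modified` is present and matches.

    Entities without a reliable timestamp (e.g. zarr assets, per
    dandi-archive#1432 / #1871) always miss and fall back to the API.
    Illustrative names, not the PR's implementation.
    """
    if not modified:
        return None  # no timestamp to validate against -> always refetch
    row = db.execute(
        "SELECT modified, payload FROM metadata"
        " WHERE api_url=? AND entity_type=? AND entity_id=?",
        key,
    ).fetchone()
    if row is not None and row[0] == modified:
        return json.loads(row[1])
    return None
```

This keeps the cache safe-by-default for the problematic entity types at the cost of never accelerating them.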
Summary

- `APIMetadataCache` class in `dandi/apicache.py` — a persistent sqlite3-backed cache for API metadata responses, keyed by `(api_url, entity_type, entity_id)` and validated against `modified` timestamps
- `cache: bool = False` parameter to `DandiAPIClient.__init__()` to opt in to caching
- Caching wired into `RemoteDandiset.get_raw_metadata()` and `BaseRemoteAsset.get_raw_metadata()` — no extra API calls needed for staleness checks
- `DANDI_CACHE` env var ("ignore" disables, "clear" wipes)
- Docs: a new `apicache.rst`, a caching section in `dandiapi.rst`, and an updated `DEVELOPMENT.md`

Test plan
- `python -m pytest dandi/tests/test_apicache.py -v` — 8 unit tests covering miss, hit, staleness, update, clear, different entity types, `DANDI_CACHE=ignore`, and `DANDI_CACHE=clear`
- `pre-commit run --files dandi/apicache.py dandi/dandiapi.py dandi/tests/test_apicache.py` — all checks pass
- `python -c "from dandi.dandiapi import DandiAPIClient; c = DandiAPIClient('https://api.dandiarchive.org/api', cache=True); ds = c.get_dandiset('000027'); print(ds.get_raw_metadata()['name'])"` — run twice; the second run should be faster

🤖 Generated with Claude Code