OceanDataStore is an open-source Python library for creating, publishing, discovering, and accessing cloud-native ocean datasets.
OceanDataStore enables ocean modelling and observational communities to work with Analysis-Ready, Cloud-Optimised (ARCO) datasets stored in object storage, including a:
-
Command Line Interface (CLI) to convert traditional ocean datasets into scalable cloud formats such as Zarr stores and Icechunk repositories.
-
Intuitive OceanDataCatalog API to discover and access datasets using Spatio-Temporal Asset Catalog (STAC) metadata.
Traditional ocean datasets are often distributed as collections of thousands of NetCDF files stored on HPC systems or remote archives. Accessing these datasets can require substantial data transfers, complex file management, and bespoke workflows.
OceanDataStore adopts a cloud-native approach where datasets are stored in ARCO formats and described through a searchable STAC catalogue.
This enables users to:
- Access only the variables, time periods, and spatial domains required for analysis.
- Open datasets directly as
xarray.Datasetobjects without downloading complete archives. - Work seamlessly with the scientific Python ecosystem, including xarray, dask, and zarr.
- Build scalable, reproducible workflows for ocean science.
🌊 Discover and access ocean datasets with OceanDataCatalog
- Search STAC catalogs for available ocean model and observational datasets.
- Explore dataset metadata and available variables.
- Open cloud-hosted datasets directly as lazy
xarray.Datasetobjects.
☁️ Create and publish ARCO ocean datasets
- Convert collections of NetCDF files into cloud-native Zarr datasets.
- Write directly to S3-compatible object storage.
- Use Dask for parallel processing of large simulations and observational products.
🔄 Support reproducible ocean data workflows
- Integrate ocean model output and observations through a common access interface.
- Develop scalable model validation workflows.
- Facilitate FAIR data practices for ocean science.
We recommend downloading and installing OceanDataStore into a new virtual environment via GitHub.
After activating a new virtual environment, pip install OceanDataStore from GitHub:
pip install oceandatastore
Alternatively, users can install OceanDataStore (including the latest commits) via GitHub:
pip install git+https://github.com/NOC-MSM/OceanDataStore.git
1. Create and Publish an ARCO Dataset
OceanDataStore provides command-line tools for converting collections of NetCDF files into Zarr datasets stored in S3-compatible object storage.
For example, a large ocean model simulation can be converted into a cloud-native dataset using:
ods send_to_zarr \
-f /path/to/files*.nc \
-c credentials.json \
-b my_bucket \
-p my_ocean_model \
-cs '{"x": 2160, "y": 1803}' \
-dc dask_config.json \
-zv 3More complete publishing workflows and examples are available in the examples directory and documentation.
2. Discover and Access Ocean Datasets
OceanDataCatalog provides a Python interface for searching, exploring, and opening datasets described by STAC metadata.
from oceandatastore import OceanDataCatalog
# Connect to the NOC STAC catalog:
catalog = OceanDataCatalog(catalog_name="noc-stac")
# Search the catalog:
catalog.search(
collection="noc-npd-era5"
)
# Open a dataset directly as an xarray.Dataset:
ds = catalog.open_dataset(
id="noc-npd-era5/npd-eorca1-era5v1/r1i1c1f1/gn/T1m",
variable_names=["tos_con"],
start_datetime="2004-01",
end_datetime="2008-12",
)Since datasets are opened lazily using xarray and dask, analyses can scale from a laptop to HPC and cloud environments.
OceanDataStore supports a broad range of ocean science workflows, including:
Ocean Model Validation
-
Compare ocean simulations against observational products using a common data access pattern.
-
Build reproducible evaluation workflows across multiple models and experiments.
Cloud-Native Model Archives
- Publish large-scale ocean simulations as FAIR, discoverable datasets without sharing raw file archives.
Ocean Observations
- Access observational products alongside model output through a single catalog interface.
Documentation, examples, and API references are available here
OceanDataStore is under active development and we welcome feedback and contributions from the ocean modelling, observational, and wider marine data communities.
The ongoing development of OceanDataStore is funded by the following projects:
Ollie Tooth (oliver.tooth@noc.ac.uk)