Design URI structure for data.isamples.org #81
Description
Context
data.isamples.org currently serves parquet files flat at the root (e.g., data.isamples.org/isamples_202601_wide.parquet). Ben Norton suggests adding path segments that convey resource type, following OGC-style patterns:
data.isamples.org/parquet/isamples_202601_wide.parquet # data files
data.isamples.org/record/<uuid> # individual sample records
data.isamples.org/term/<term-slug> # vocabulary terms
"This allows you to better manage resources, provides additional context and informs the user what type of resource a pid is expected to return. This pattern is also part of several specifications (i.e. OGC)."
— Ben Norton
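To make the proposal concrete, here is a minimal sketch of how a path could be mapped to a resource type. It assumes only the three prefixes listed above; `ResourceKind`, `ClassifiedPath`, and `classifyPath` are hypothetical names for illustration, not part of the existing Worker.

```typescript
// Hypothetical types and function names, for illustration only.
type ResourceKind = "parquet" | "record" | "term";

interface ClassifiedPath {
  kind: ResourceKind;
  id: string; // file name, UUID, or term slug, depending on kind
}

// Map a URL pathname such as "/record/<uuid>" to its resource type.
// Returns null for anything that does not match one of the three
// proposed prefixes followed by a single path segment.
function classifyPath(pathname: string): ClassifiedPath | null {
  const match = /^\/(parquet|record|term)\/([^/]+)$/.exec(pathname);
  if (match === null) return null;
  const kind = match[1];
  const id = match[2];
  if (kind === undefined || id === undefined) return null;
  return { kind: kind as ResourceKind, id };
}
```

With this shape, the prefix alone tells a client (and the router) what kind of resource to expect, which is the benefit described in the quote above.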
Relevant specifications
- OGC API - Records: /collections/{id}/items/{recordId}
- OGC API - Features: similar hierarchical pattern
- W3C/TDWG patterns for biodiversity term URIs
- Cool URIs for the Semantic Web
Questions to discuss
- Scope vs. timeline: The grant ends July 2026. Which path segments are realistic to implement?
  - /parquet/ for data files: trivial (Worker routing change + redirects from old flat paths)
  - /term/ for vocabulary: moderate (could redirect to existing vocab pages on the site)
  - /record/<uuid> for individual samples: heavy (requires a query service, not just static files)
- Backwards compatibility: PR #79 (Use data.isamples.org for all parquet file URLs) just migrated all references to flat URLs. If we restructure, we'd want redirects from the old paths.
- Content negotiation: Should /record/<uuid> return JSON-LD vs HTML based on Accept header? That's the full linked-data pattern but adds complexity.
- Versioning: Current files are date-stamped (202601). Should the URI structure make versioning explicit (e.g., /parquet/v202601/wide.parquet)?
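Two of these questions can be sketched as small pure functions. The first maps an old flat path to the proposed /parquet/ path, so a Worker could answer with a permanent redirect. The second picks a representation for /record/<uuid> from the Accept header. Both function names are hypothetical, and the negotiation is deliberately simplified: it ignores q-values and returns the first recognised media type, falling back to HTML.

```typescript
// Hypothetical helper: map an old flat path to the proposed /parquet/ path.
// Returns null when the path is not a flat parquet file name (no redirect).
function legacyRedirectTarget(pathname: string): string | null {
  const match = /^\/([^/]+\.parquet)$/.exec(pathname);
  if (match === null) return null;
  const fileName = match[1];
  if (fileName === undefined) return null;
  return `/parquet/${fileName}`;
}

type RecordFormat = "json-ld" | "html";

// Hypothetical helper: choose a representation for /record/<uuid>.
// Simplification (an assumption, not full HTTP negotiation): q-values are
// ignored, the first recognised media type wins, and HTML is the default.
function pickRecordFormat(accept: string | null): RecordFormat {
  if (accept === null) return "html";
  const mediaTypes = accept
    .split(",")
    .map((part) => part.replace(/;.*$/, "").trim().toLowerCase());
  for (const mediaType of mediaTypes) {
    if (mediaType === "application/ld+json") return "json-ld";
    if (mediaType === "text/html") return "html";
  }
  return "html";
}
```

Keeping both as pure string functions means they can be unit-tested outside the Worker runtime. A fuller implementation would need to honour q-values and decide whether an unsupported Accept header should produce a 406 response.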
Current state
- Cloudflare Worker proxies R2 bucket with range requests + CORS
- 9 parquet files, ~1.48 GB total (see #79, Use data.isamples.org for all parquet file URLs, for full index)
- All tutorials and Explorer reference flat URLs as of #79