
Refactor: Use consistent URL representation for all storage paths (including file://) #1326

@dimitri-yatsenko


Summary

Following review feedback on PR #1311, we should refactor the storage layer to use a consistent URL representation for all data sources, including local files.

Current Behavior

  • Remote paths use URLs: s3://bucket/path, gs://bucket/path
  • Local paths use raw filesystem paths: /path/to/file
  • The is_remote_url() function distinguishes between the two
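For illustration, the current split can be sketched as follows. This is a minimal sketch, not the actual storage-layer code: the contents of REMOTE_PROTOCOLS and the body of is_remote_url() are assumptions, and describe() is a hypothetical caller standing in for the local-vs-remote branches the proposal wants to remove.

```python
# Hypothetical sketch of the current dual representation; the real
# REMOTE_PROTOCOLS and is_remote_url() may differ in detail.
REMOTE_PROTOCOLS = ("s3://", "gs://")


def is_remote_url(path: str) -> bool:
    """Return True if `path` starts with a known remote URL scheme."""
    # str.startswith accepts a tuple of prefixes.
    return path.lower().startswith(REMOTE_PROTOCOLS)


def describe(path: str) -> str:
    # Every caller has to branch on the path kind: this is the
    # special-casing a uniform URL representation would eliminate.
    if is_remote_url(path):
        return f"remote: {path}"
    return f"local: {path}"
```

Note that under this scheme a file:// URL is classified as neither remote nor a plain path, which is one source of the inconsistent handling mentioned below.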

Proposed Change

  1. Accept both formats from users: /path/to/file and file:///path/to/file
  2. Normalize to URLs internally: Convert all paths to URL format (file:// for local)
  3. Store URLs consistently in the database
  4. Leverage fsspec uniformity: fsspec already treats all backends (including local) uniformly via URLs
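A minimal sketch of the normalization step (items 1 and 2), using only the standard library. normalize_to_url() and URL_PROTOCOLS are hypothetical names, and the protocol set is an assumption; the real list would cover every backend the storage layer supports. Local paths are made absolute and converted with pathlib.Path.as_uri(), which percent-encodes characters such as spaces.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical: the storage layer would list every backend it supports here.
URL_PROTOCOLS = ("file", "s3", "gs")


def normalize_to_url(path: str) -> str:
    """Convert a user-supplied path or URL to its canonical URL form."""
    scheme = urlparse(path).scheme.lower()
    if len(scheme) <= 1:
        # No scheme (POSIX path) or a Windows drive letter such as "C:":
        # treat as a local path, make it absolute, and build a file:// URL.
        return Path(os.path.abspath(path)).as_uri()
    if scheme in URL_PROTOCOLS:
        # Already a URL: return unchanged so the helper is idempotent.
        return path
    raise ValueError(f"unsupported protocol: {scheme!r}")
```

Returning existing URLs unchanged means the helper can be applied at every entry point without double-encoding. A single-letter scheme is treated as a drive letter because urlparse() would otherwise report C:\data as having the scheme "c".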

Benefits

  • Coherent internal representation
  • Simpler codebase: no special-casing for local vs. remote paths
  • Better alignment with fsspec's design philosophy
  • Avoids potential bugs from inconsistent handling

Implementation Notes

  • Add file:// to REMOTE_PROTOCOLS (or rename to URL_PROTOCOLS)
  • Create helper to normalize user paths to URLs
  • Update StorageBackend to work with URLs consistently
  • Ensure backward compatibility for existing stored paths
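For backward compatibility, legacy rows that hold raw paths could be upgraded when they are read, and file:// URLs turned back into OS paths wherever a plain path is still required. This is a sketch under stated assumptions: upgrade_stored_path() and local_path_from_url() are hypothetical helpers, only local hosts (empty or localhost) are accepted, and url2pathname() is relied on to decode the percent-encoding that Path.as_uri() adds.

```python
import os
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def upgrade_stored_path(stored: str) -> str:
    """Return a URL for a stored value; legacy raw paths become file:// URLs."""
    if "://" in stored:
        # Already a URL (s3://, gs://, file://, ...): keep as stored.
        return stored
    # Legacy raw filesystem path written before the refactor.
    return Path(os.path.abspath(stored)).as_uri()


def local_path_from_url(url: str) -> str:
    """Convert a file:// URL back to an OS path for APIs that need one."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {url!r}")
    if parsed.netloc not in ("", "localhost"):
        raise ValueError(f"remote file host not supported: {parsed.netloc!r}")
    # url2pathname() decodes percent-escapes such as %20.
    return url2pathname(parsed.path)
```

Converting at read time leaves existing tables untouched; a one-off migration could instead rewrite old rows with upgrade_stored_path().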

References

Metadata

Labels: enhancement
