Enhance URI scheme validation for Windows paths #3161
AlgoDeveloper400 wants to merge 2 commits into apache:main
@AlgoDeveloper400 can you fix the linter? I think it is reasonable to add this, but we might break it easily since there are no tests to enforce the behavior.

Okay, I will fix the linter.

@Fokko All the tests are successful and the linter issue has been resolved.
        super().__init__(properties=properties)

    @staticmethod
    def parse_location(location: str, properties: Properties = EMPTY_DICT) -> tuple[str, str, str]:
We could still add a test against parse_location here to ensure that drive letters are treated as local paths and not as URI schemes.
How about something like this, with the Windows paths as parameters:
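from pyiceberg.io.pyarrow import PyArrowFileIO

def test_parse_location_windows_drive_letter() -> None:
    # A drive-letter path should fall back to the local "file" scheme.
    loc = "C:\\Users\\test\\file.avro"
    scheme, netloc, _ = PyArrowFileIO.parse_location(loc)
    assert scheme == "file"
    assert netloc == ""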
            # len == 1 and alpha catches Windows drive letters like C:\ D:\
            default_scheme = properties.get("DEFAULT_SCHEME", "file")
            default_netloc = properties.get("DEFAULT_NETLOC", "")
            return default_scheme, default_netloc, os.path.abspath(location)
It looks like a few other places also use urlparse and will be affected, such as fsspec and schema resolution.
iceberg-python/pyiceberg/io/__init__.py
Lines 337 to 346 in 3a993e8
Maybe we could create a helper in the __init__ module that all modules could use.
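A shared helper along those lines might look like the minimal sketch below. The names is_windows_drive_letter and split_location are hypothetical and not part of PyIceberg, and the branch for real URI schemes is simplified compared to what the existing FileIO implementations do.

```python
import os
from urllib.parse import urlparse


def is_windows_drive_letter(scheme: str) -> bool:
    """Hypothetical helper: True when a parsed 'scheme' is really a drive letter.

    urlparse reports the 'C' in 'C:\\Users\\...' as a one-character scheme,
    so a single alphabetic character is treated as a drive letter here.
    """
    return len(scheme) == 1 and scheme.isalpha()


def split_location(
    location: str, default_scheme: str = "file", default_netloc: str = ""
) -> tuple[str, str, str]:
    """Hypothetical helper returning (scheme, netloc, path) for a location."""
    uri = urlparse(location)
    # No scheme, or a drive letter mistaken for one: treat as a local path.
    if not uri.scheme or is_windows_drive_letter(uri.scheme):
        return default_scheme, default_netloc, os.path.abspath(location)
    # Simplified handling for real URI schemes (assumption for this sketch).
    return uri.scheme, uri.netloc, uri.path
```

With a helper of this shape, each module that calls urlparse directly could delegate the drive-letter decision to one place instead of repeating the condition.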
fix: handle Windows drive letters in parse_location
Rationale for this change
When a Windows user passes a local file path like C:\Users\file.avro to PyArrowFileIO, Python's urlparse incorrectly treats the Windows drive letter C as a URL scheme (like s3 or http).
This caused PyIceberg to crash with:
The Fix
Before ❌ (Original Code):
After ✅ (Fixed Code):
The only change:
The added condition checks whether the scheme is a single alphabetic character (e.g. C, D, E) and, if so, treats it as a Windows drive letter instead of a URL scheme.
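To illustrate the condition described above, here is a small standalone sketch using only the standard library. The function name classify is an assumption made for this illustration and does not appear in the PR. It shows what urlparse reports for a drive-letter path and how the single-letter check separates it from a real scheme.

```python
from urllib.parse import urlparse


def classify(location: str) -> str:
    """Illustrative only: label a location as 'local' or 'remote'."""
    uri = urlparse(location)
    # Same shape as the condition described in the PR: an empty scheme or a
    # single alphabetic character is treated as a local path.
    if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
        return "local"
    return "remote"


# urlparse lowercases the scheme, so the drive letter C is reported as 'c'.
print(urlparse(r"C:\Users\file.avro").scheme)  # → c

for location in (r"C:\Users\file.avro", "/tmp/file.avro", "s3://bucket/file.avro"):
    print(location, "->", classify(location))
```

Real schemes such as s3 or http are longer than one character, which is why the length check is enough to tell them apart from drive letters.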
Example
Impact
This fix affects all local file operations on Windows including:
Are these changes tested?
Yes - existing tests now pass on Windows.
tests/test_avro_sanitization.py
tests/io/test_pyarrow.py
Are there any user-facing changes?
Yes - fixes local file access on Windows for all PyIceberg users.