Thread sidecar_origin through the HTTP/fsspec sidecar georef path#3027
Open
brendancol wants to merge 3 commits into
Open
Thread sidecar_origin through the HTTP/fsspec sidecar georef path#3027brendancol wants to merge 3 commits into
brendancol wants to merge 3 commits into
Conversation
) Pass a sidecar_origin mapping into extract_geo_info_with_overview_inheritance on the HTTP/fsspec sidecar path in _cog_http.py so it matches the eager local and metadata-only readers. Add remote test coverage proving a sidecar that carries its own GeoKeys wins over the base file's georef on the HTTP eager, HTTP chunked, fsspec chunked, and direct _parse_cog_http_meta paths.
brendancol
commented
Jun 7, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Thread sidecar_origin through the HTTP/fsspec sidecar georef path
Blockers (must fix before merge)
- None.
Suggestions (should fix, not blocking)
-
xrspatial/geotiff/_cog_http.py:255-258-- the new comment claims that "without it a big-endian.ovrwith its own georef ... is read with the wrong byte source and returns corrupt georeferencing." That overstates the effect on today's code.extract_geo_infoforwardsdata/byte_orderonly to_parse_geokeys, which reads already-materialized values offifd.entries. IFD tag values (including out-of-line georef arrays) are read eagerly inparse_ifdagainst each file's own byte order, and the sidecar IFDs are parsed with the sidecar header indiscover_remote_sidecar. Sosidecar_origindoes not change the result on the current eager-parse path. The change is a defensive consistency alignment with the local reader, not a live correctness fix. Reword the comment to match (the PR body already states this honestly).
Nits (optional improvements)
- The four new tests share an identical HTTP-server setup/teardown block. A small fixture could remove the duplication, but the existing mixed-endian tests in this file follow the same inline pattern, so matching them is defensible.
What looks good
- The fix mirrors the
georef_originconstruction in_reader.py:295-299exactly (id -> (bytes, byte_order)), guarded onsidecar is not None, so it is a faithful parallel of the local path. - Tests cover the HTTP eager, HTTP chunked, fsspec chunked, and direct
_parse_cog_http_metapaths, asserting both transform and CRS. Temp files use issue-scoped, pid+uuid memory keys so parallel runs do not collide. - No public API change; README/docs/benchmark updates are not needed.
Checklist
- Algorithm matches reference -- N/A, no algorithm change
- All implemented backends produce consistent results -- HTTP/fsspec eager and chunked paths tested
- NaN handling is correct -- N/A
- Edge cases are covered by tests -- sidecar-with-geokeys vs base covered
- Dask chunk boundaries handled correctly -- chunked paths tested
- No premature materialization or unnecessary copies -- none introduced
- Benchmark exists or is not needed -- not needed
- README feature matrix updated -- not applicable
- Docstrings present and accurate -- comment accuracy flagged above
The previous comment claimed the HTTP path returned corrupt georeferencing without the mapping. On the current eager-parse IFD model that is not accurate: extract_geo_info only consumes data / byte_order via _parse_geokeys, which reads values already materialized on ifd.entries during parse_ifd. Reword to describe the change as a consistency alignment with the local reader that also future-proofs against lazy tag reads.
brendancol
commented
Jun 7, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up): Thread sidecar_origin through the HTTP/fsspec sidecar georef path
Re-reviewed after commit 4fbd09a.
Blockers
- None.
Suggestions
- The earlier suggestion on
_cog_http.py(the comment overstating that the change fixes corrupt georeferencing) is resolved. The reworded comment now states plainly that the mapping does not change the result on the current eager-parse IFD model and is threaded for symmetry with the local reader and to stay correct if tag reads ever become lazy. Accurate.
Nits
- The four new tests still share an inline HTTP-server setup/teardown. Left as-is to match the surrounding mixed-endian tests in the same file, which use the same pattern. Not worth a divergent fixture.
What looks good
- Production change is symmetric with
_reader.py, comment now matches the actual mechanism, and the tests pin the observable "sidecar GeoKeys win" contract on the HTTP eager, HTTP chunked, fsspec chunked, and direct_parse_cog_http_metapaths.
No further changes requested.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3023
What this does
sidecar_originmapping (id(ifd)->(sidecar bytes, sidecar byte order)) in_cog_http.pyand pass it intoextract_geo_info_with_overview_inheritance, so the HTTP/fsspec sidecar path matches the eager local and metadata-only readers in_reader.py._parse_cog_http_metapaths.Note on the mechanism
While verifying the original finding I traced where
data/byte_order(and thereforesidecar_origin) actually get consumed.extract_geo_infoforwards them only to_parse_geokeys, which reads already-materialized values offifd.entries. IFD tag values, including out-of-line georef arrays, are read eagerly inparse_ifdusing each file's own byte order. The sidecar IFDs are parsed with the sidecar's byte order indiscover_remote_sidecar/load_sidecar, so their GeoKeys are already correct beforeextract_geo_inforuns.In other words, with the current eager-parse IFD model the HTTP path did not actually return wrong georeferencing, because the values were parsed against the sidecar bytes upstream. This change is a defensive consistency alignment with the local path rather than a correctness fix on today's code, and it removes the stale "intentionally not threaded" comment so a future change that makes
data/byte_orderload-bearing (e.g. lazy tag reads) stays correct on both surfaces. The added tests assert the user-visible contract regardless of the internal mechanism.Backend coverage
GeoTIFF reader change. Exercised on the HTTP eager, HTTP chunked (dask), and fsspec chunked (dask) paths.
Test plan
pytest xrspatial/geotiff/tests/integration/test_sidecar.py(93 passed)pytest xrspatial/geotiff/tests -k "http or sidecar or cog or geotags or geo_info"(824 passed, 7 skipped)flake8clean on the changed files