Skip to content

Thread sidecar_origin through the HTTP/fsspec sidecar georef path#3027

Open
brendancol wants to merge 3 commits into
mainfrom
issue-3023
Open

Thread sidecar_origin through the HTTP/fsspec sidecar georef path#3027
brendancol wants to merge 3 commits into
mainfrom
issue-3023

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #3023

What this does

  • Build a sidecar_origin mapping (id(ifd) -> (sidecar bytes, sidecar byte order)) in _cog_http.py and pass it into extract_geo_info_with_overview_inheritance, so the HTTP/fsspec sidecar path matches the eager local and metadata-only readers in _reader.py.
  • Add remote test coverage that pins the observable contract: when the selected overview IFD lives in a sidecar carrying its own GeoKeys, the sidecar's transform and CRS win over the base file's. Tests cover the HTTP eager, HTTP chunked, fsspec chunked, and direct _parse_cog_http_meta paths.

Note on the mechanism

While verifying the original finding I traced where data/byte_order (and therefore sidecar_origin) actually get consumed. extract_geo_info forwards them only to _parse_geokeys, which reads already-materialized values off ifd.entries. IFD tag values, including out-of-line georef arrays, are read eagerly in parse_ifd using each file's own byte order. The sidecar IFDs are parsed with the sidecar's byte order in discover_remote_sidecar/load_sidecar, so their GeoKeys are already correct before extract_geo_info runs.

In other words, with the current eager-parse IFD model the HTTP path did not actually return wrong georeferencing, because the values were parsed against the sidecar bytes upstream. This change is a defensive consistency alignment with the local path rather than a correctness fix on today's code, and it removes the stale "intentionally not threaded" comment so a future change that makes data/byte_order load-bearing (e.g. lazy tag reads) stays correct on both surfaces. The added tests assert the user-visible contract regardless of the internal mechanism.

Backend coverage

GeoTIFF reader change. Exercised on the HTTP eager, HTTP chunked (dask), and fsspec chunked (dask) paths.

Test plan

  • pytest xrspatial/geotiff/tests/integration/test_sidecar.py (93 passed)
  • pytest xrspatial/geotiff/tests -k "http or sidecar or cog or geotags or geo_info" (824 passed, 7 skipped)
  • flake8 clean on the changed files

)

Pass a sidecar_origin mapping into
extract_geo_info_with_overview_inheritance on the HTTP/fsspec sidecar
path in _cog_http.py so it matches the eager local and metadata-only
readers. Add remote test coverage proving a sidecar that carries its
own GeoKeys wins over the base file's georef on the HTTP eager, HTTP
chunked, fsspec chunked, and direct _parse_cog_http_meta paths.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 7, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Thread sidecar_origin through the HTTP/fsspec sidecar georef path

Blockers (must fix before merge)

  • None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_cog_http.py:255-258 -- the new comment claims that "without it a big-endian .ovr with its own georef ... is read with the wrong byte source and returns corrupt georeferencing." That overstates the effect on today's code. extract_geo_info forwards data/byte_order only to _parse_geokeys, which reads already-materialized values off ifd.entries. IFD tag values (including out-of-line georef arrays) are read eagerly in parse_ifd against each file's own byte order, and the sidecar IFDs are parsed with the sidecar header in discover_remote_sidecar. So sidecar_origin does not change the result on the current eager-parse path. The change is a defensive consistency alignment with the local reader, not a live correctness fix. Reword the comment to match (the PR body already states this honestly).

Nits (optional improvements)

  • The four new tests share an identical HTTP-server setup/teardown block. A small fixture could remove the duplication, but the existing mixed-endian tests in this file follow the same inline pattern, so matching them is defensible.

What looks good

  • The fix mirrors the georef_origin construction in _reader.py:295-299 exactly (id -> (bytes, byte_order)), guarded on sidecar is not None, so it is a faithful parallel of the local path.
  • Tests cover the HTTP eager, HTTP chunked, fsspec chunked, and direct _parse_cog_http_meta paths, asserting both transform and CRS. Temp files use issue-scoped, pid+uuid memory keys so parallel runs do not collide.
  • No public API change; README/docs/benchmark updates are not needed.

Checklist

  • Algorithm matches reference -- N/A, no algorithm change
  • All implemented backends produce consistent results -- HTTP/fsspec eager and chunked paths tested
  • NaN handling is correct -- N/A
  • Edge cases are covered by tests -- sidecar-with-geokeys vs base covered
  • Dask chunk boundaries handled correctly -- chunked paths tested
  • No premature materialization or unnecessary copies -- none introduced
  • Benchmark exists or is not needed -- not needed
  • README feature matrix updated -- not applicable
  • Docstrings present and accurate -- comment accuracy flagged above

The previous comment claimed the HTTP path returned corrupt
georeferencing without the mapping. On the current eager-parse IFD
model that is not accurate: extract_geo_info only consumes data /
byte_order via _parse_geokeys, which reads values already materialized
on ifd.entries during parse_ifd. Reword to describe the change as a
consistency alignment with the local reader that also future-proofs
against lazy tag reads.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (follow-up): Thread sidecar_origin through the HTTP/fsspec sidecar georef path

Re-reviewed after commit 4fbd09a.

Blockers

  • None.

Suggestions

  • The earlier suggestion on _cog_http.py (the comment overstating that the change fixes corrupt georeferencing) is resolved. The reworded comment now states plainly that the mapping does not change the result on the current eager-parse IFD model and is threaded for symmetry with the local reader and to stay correct if tag reads ever become lazy. Accurate.

Nits

  • The four new tests still share an inline HTTP-server setup/teardown. Left as-is to match the surrounding mixed-endian tests in the same file, which use the same pattern. Not worth a divergent fixture.

What looks good

  • Production change is symmetric with _reader.py, comment now matches the actual mechanism, and the tests pin the observable "sidecar GeoKeys win" contract on the HTTP eager, HTTP chunked, fsspec chunked, and direct _parse_cog_http_meta paths.

No further changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remote .tif.ovr sidecars with their own GeoKeys can return wrong georeferencing

1 participant