Fix PBF block offsets on Windows#894
Open
Symmetricity wants to merge 1 commit into
Open
Conversation
PBF block offsets were stored as long int after casting from tellg(). On Windows, long remains 32-bit even in 64-bit builds, so offsets past 2GB can be truncated before later seekg() calls reread the block. Store the offset as std::streamoff instead, matching the stream offset type used by tellg()/seekg(). This keeps the existing block metadata flow but avoids narrowing large PBF positions. This matches the failure pattern reported in systemed#776, where Windows builds crash when reading extracts just over the signed 32-bit boundary. Checked with make tilemaker -j2, make test_pbf_reader -j2, cmake --build build --target tilemaker -j2, and a MinGW Windows build smoke-tested with wine --help.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is AI generated.
Summary
This fixes a Windows-specific large-PBF crash by storing PBF block offsets as
std::streamoffinstead oflong int.The current code records each PBF data block offset like this:
blocks[blocks.size()] = { (long int)infile->tellg(), bh.datasize, true, true, true, 0, 1 };On Windows,
longremains 32-bit in 64-bit builds. Once an.osm.pbfis largeenough that a block payload offset exceeds the signed 32-bit range, that cast
can truncate the offset. Later reads then seek to the wrong byte position and
parse unrelated data as a PBF blob/primitive block.
This changes the stored offset type to
std::streamoffand caststellg()tothat type at the point the offset is recorded.
Background
This matches #776, which reports Windows crashes around the 2GB boundary:
us-midwestat2,129,162,240bytes worked.britain-and-irelandat2,185,531,392bytes crashed.Issue: #776
Microsoft documents
longas 4 bytes for both x86 and x64 MSVC targets:https://learn.microsoft.com/en-us/cpp/cpp/data-type-ranges
That makes this failure plausible on 64-bit Windows even though the same source
code can work on LP64 Unix-like systems where
longis commonly 64-bit.Implementation
The patch is intentionally small:
<ios>ininclude/pbf_processor.h.BlockMetadata::offsetfromlong inttostd::streamoff.static_cast<std::streamoff>(infile->tellg()).No PBF parsing logic, scheduling logic, or block ordering behavior is changed.
Expected Behavior
Before this change, a 64-bit Windows build can truncate offsets once block
payload positions pass the signed 32-bit boundary. Depending on the truncated
position, the later read can fail as a silent crash, a protozero parse
exception, or a generic tilemaker failure.
After this change, tilemaker keeps the stream offset width instead of narrowing
to Windows
long, so laterseekg()calls can return to the intended blockpositions in large files.
Possible Regressions
The intended behavior change is limited to platforms where
long intis toosmall for large file offsets.
On platforms where
long intwas already large enough, this should preserve thesame values while using a more appropriate stream offset type.
On 32-bit builds, this does not make the rest of the process capable of handling
arbitrarily large PBFs. It avoids the explicit
long intnarrowing inBlockMetadata, but the practical file-size limit still depends on the C++library, OS large-file support, address space, and tilemaker's memory/storage
requirements.
Related Issues And PRs
std::streamoffchange.
Testing
Code checks:
Earlier local verification on this branch:
The branch was also built as a MinGW Windows binary during local investigation
for Windows large-PBF testing.