GH-49299: [C++][Parquet] Integer overflow in Parquet dict decoding #49300
Open
pitrou wants to merge 1 commit into apache:main from
Member
Author
@github-actions crossbow submit -g cpp
Revision: df01c56
Submitted crossbow builds: ursacomputing/crossbow @ actions-cb1ec6569d
pitrou
commented
Feb 16, 2026
- PARQUET_THROW_NOT_OK(dictionary_->Resize(dictionary_length_ * sizeof(T),
-                                          /*shrink_to_fit=*/false));
+ PARQUET_THROW_NOT_OK(
+     dictionary_->Resize(static_cast<int64_t>(dictionary_length_) * sizeof(T),
Member
Author
Note that sizeof(T) is already a size_t, so this one would only make a difference on 32-bit systems.
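To illustrate the point about sizeof(T), here is a minimal sketch (not code from this PR) showing that multiplying an int32_t by a sizeof expression already happens in the size_t domain. The alias name ProductType is hypothetical and exists only for this example; it assumes C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical alias for illustration: the type of `int32_t * sizeof(T)`.
template <typename T>
using ProductType = decltype(int32_t{0} * sizeof(T));

// The usual arithmetic conversions convert the int32_t operand to size_t,
// so the multiplication is carried out in size_t.
static_assert(std::is_same_v<ProductType<double>, std::size_t>,
              "int32_t * sizeof(T) is computed as size_t");

// True on typical 64-bit platforms, false on typical 32-bit ones, where the
// product can still wrap around without an explicit cast to int64_t.
constexpr bool kProductIs64Bit = sizeof(ProductType<double>) == 8;
```

On a platform where size_t is 32 bits wide, kProductIs64Bit is false, which is the case the explicit static_cast<int64_t> guards against.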
emkornfield
reviewed
Feb 17, 2026
- PARQUET_THROW_NOT_OK(dictionary_->Resize(dictionary_length_ * sizeof(T),
-                                          /*shrink_to_fit=*/false));
+ PARQUET_THROW_NOT_OK(
+     dictionary_->Resize(static_cast<int64_t>(dictionary_length_) * sizeof(T),
Contributor
Should we just change dictionary_length_ to int64_t? (Note that we seem to explicitly cast it to int32_t right above.)
Member
Author
I thought about that, but it would introduce some downcasts to int/int32_t in other places. Upcasts are safer, so I think it's better to keep it an int32_t.
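As a rough illustration of why upcasts are the safer direction (a sketch, not code from this PR): widening an int32_t to int64_t always preserves the value, while narrowing needs a range check to be correct. The helper names Widen and CheckedNarrow are hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: widening is value-preserving for every int32_t input.
constexpr int64_t Widen(int32_t v) { return v; }

// Hypothetical helper: narrowing must reject values outside the int32_t range.
constexpr std::optional<int32_t> CheckedNarrow(int64_t v) {
  if (v < std::numeric_limits<int32_t>::min() ||
      v > std::numeric_limits<int32_t>::max()) {
    return std::nullopt;  // value does not fit in 32 bits
  }
  return static_cast<int32_t>(v);
}
```

Every call site that goes from int64_t back to int/int32_t would need a check of this kind, whereas the upcast needs none.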
emkornfield
reviewed
Feb 17, 2026
Contributor
emkornfield
left a comment
A couple of questions and maybe a simplification.
df01c56 to 49e8a40
Rationale for this change
Computing the byte size of a buffer of decoded dictionary values in Parquet could lead to integer overflow on a 32-bit multiplication. This does not seem easily exploitable due to another size check in the PLAIN decoder (we only support PLAIN-encoded dictionary values).
What changes are included in this PR?
Do byte size computations in the 64-bit signed integer domain to avoid any overflow issues.
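The failure mode and the fix can be sketched as follows. This is an illustration under stated assumptions, not the actual Parquet decoder code: ByteSize32 and ByteSize64 are hypothetical helpers, and the 32-bit version uses unsigned arithmetic so that the wraparound is well-defined for demonstration purposes (signed 32-bit overflow would be undefined behavior).

```cpp
#include <cstdint>

// Hypothetical helper: byte size computed with a 32-bit multiplication.
// Unsigned operands are used only to make the wraparound observable.
constexpr uint32_t ByteSize32(int32_t num_values, int32_t value_size) {
  return static_cast<uint32_t>(num_values) * static_cast<uint32_t>(value_size);
}

// Hypothetical helper: widen one operand to int64_t first, so the whole
// multiplication happens in the 64-bit signed domain.
constexpr int64_t ByteSize64(int32_t num_values, int32_t value_size) {
  return static_cast<int64_t>(num_values) * value_size;
}

// 2^30 values of 8 bytes each need 2^33 bytes: the 32-bit product wraps to 0,
// while the 64-bit product is exact.
static_assert(ByteSize32(1 << 30, 8) == 0, "32-bit product wraps around");
static_assert(ByteSize64(1 << 30, 8) == (int64_t{1} << 33),
              "64-bit product is exact");
```

Casting a single operand is enough, since the other operand is then converted to int64_t before the multiplication takes place.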
Are these changes tested?
No.
Are there any user-facing changes?
No.