Skip to content

GH-49299: [C++][Parquet] Integer overflow in Parquet dict decoding#49300

Open
pitrou wants to merge 1 commit intoapache:mainfrom
pitrou:pq-int-overflow
Open

GH-49299: [C++][Parquet] Integer overflow in Parquet dict decoding#49300
pitrou wants to merge 1 commit intoapache:mainfrom
pitrou:pq-int-overflow

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Feb 16, 2026

Rationale for this change

Computing the byte size of a buffer of decoded dictionary values in Parquet could lead to integer overflow on a 32-bit multiplication. This does not seem easily exploitable due to another size check in the PLAIN decoder (we only support PLAIN-encoded dictionary values).

What changes are included in this PR?

Do byte size computations in the 64-bit signed integer domain to avoid any overflow issues.

Are these changes tested?

No.

Are there any user-facing changes?

No.

@pitrou pitrou added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Feb 16, 2026
@pitrou
Copy link
Member Author

pitrou commented Feb 16, 2026

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: df01c56

Submitted crossbow builds: ursacomputing/crossbow @ actions-cb1ec6569d

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

PARQUET_THROW_NOT_OK(dictionary_->Resize(dictionary_length_ * sizeof(T),
/*shrink_to_fit=*/false));
PARQUET_THROW_NOT_OK(
dictionary_->Resize(static_cast<int64_t>(dictionary_length_) * sizeof(T),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that sizeof(T) is already a size_t, so this one would only make a different on 32-bit systems.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 16, 2026
@pitrou pitrou marked this pull request as ready for review February 16, 2026 17:26
@pitrou pitrou requested a review from wgtmac as a code owner February 16, 2026 17:26
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Feb 17, 2026
PARQUET_THROW_NOT_OK(dictionary_->Resize(dictionary_length_ * sizeof(T),
/*shrink_to_fit=*/false));
PARQUET_THROW_NOT_OK(
dictionary_->Resize(static_cast<int64_t>(dictionary_length_) * sizeof(T),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just change dictionary_length_ to int64_t (not we seem to explicitly cast it to int32_t right above).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but it would introduce some downcasts to int/int32_t in other places. Upcasts are safer, so I think it's better to keep it a int32_t.

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions and maybe a simplification.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting change review Awaiting change review Component: C++ Component: Parquet Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants