GH-50077: [C++][IPC] Avoid int64 overflow in ReadSparseCSXIndex by jmestwa-coder · Pull Request #50038 · apache/arrow

jmestwa-coder · 2026-05-25T17:47:57Z

Rationale for this change

ReadSparseCSXIndex validates the SparseTensor indices/indptr buffer sizes against the claimed shape using int64 products (non_zero_length * byte_width and (shape[axis] + 1) * byte_width). Both inputs come unchecked from the flatbuffer via GetSparseTensorMetadata, so a value near INT64_MAX overflows the signed product (UBSan-confirmed), wraps to a small value, and the size guard passes. The index Tensor is then built over a buffer smaller than its shape, enabling an out-of-bounds read.
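To make the wraparound concrete, here is a minimal sketch of how an unchecked product can slip past a size guard. `UncheckedGuardPasses` is a hypothetical stand-in, not the actual code in reader.cc, and it performs the multiplication in `uint64_t` so that it models two's-complement wraparound without itself invoking signed-overflow undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an unchecked size guard (not the real reader.cc code).
// Returns true when the buffer is judged large enough for the claimed length.
bool UncheckedGuardPasses(int64_t non_zero_length, int64_t byte_width,
                          int64_t buffer_size) {
  assert(byte_width > 0);
  // Multiply as unsigned so the wraparound is well defined, then reinterpret
  // the low 64 bits as signed, which is what a wrapped int64 product holds.
  uint64_t wrapped = static_cast<uint64_t>(non_zero_length) *
                     static_cast<uint64_t>(byte_width);
  int64_t min_size = static_cast<int64_t>(wrapped);
  return buffer_size >= min_size;
}
```

With `non_zero_length = 2^61 + 1` and `byte_width = 8`, the true product is `2^64 + 8`, which wraps to `8`, so an 8-byte buffer passes the guard even though the claimed shape needs far more.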

What changes are included in this PR?

Compute the indices/indptr byte counts (and the shape[axis] + 1 term) with MultiplyWithOverflow/AddWithOverflow, the same checked helpers already used for this size math in tensor.cc, and return Status::Invalid when the computation overflows.
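The shape of the checked computation can be sketched as follows. `CheckedMul`, `CheckedAdd`, and `IndptrMinBytes` are hypothetical helpers written for illustration; they are not Arrow's `MultiplyWithOverflow`/`AddWithOverflow`, and the sketch assumes lengths and byte widths are non-negative, rejecting anything else.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: product of two non-negative int64 values, or nullopt
// if either input is negative or the product would exceed INT64_MAX.
std::optional<int64_t> CheckedMul(int64_t a, int64_t b) {
  if (a < 0 || b < 0) return std::nullopt;
  if (a != 0 && b > std::numeric_limits<int64_t>::max() / a) return std::nullopt;
  return a * b;
}

// Hypothetical helper: sum of two non-negative int64 values, or nullopt on
// negative input or overflow.
std::optional<int64_t> CheckedAdd(int64_t a, int64_t b) {
  if (a < 0 || b < 0) return std::nullopt;
  if (a > std::numeric_limits<int64_t>::max() - b) return std::nullopt;
  return a + b;
}

// Minimum indptr byte count, (dim + 1) * byte_width, with both steps checked.
std::optional<int64_t> IndptrMinBytes(int64_t dim, int64_t byte_width) {
  assert(byte_width > 0);
  std::optional<int64_t> length = CheckedAdd(dim, 1);
  if (!length) return std::nullopt;
  return CheckedMul(*length, byte_width);
}
```

A caller would treat an empty result as the overflow case and return an error (in this PR, `Status::Invalid`) instead of comparing the buffer size against a wrapped value.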

Are these changes tested?

Covered by the existing sparse tensor IPC round-trip tests; the change only adds an overflow guard on the existing validation path.

Are there any user-facing changes?

No.

GitHub Issue: [C++][IPC] Possible int64 overflow in ReadSparseCSXIndex buffer size validation #50077

kou · 2026-05-25T20:46:47Z

Could you open an issue for this non-MINOR change? See https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes for details.
Could you use our PR template instead of removing it entirely?

jmestwa-coder · 2026-06-02T07:25:47Z

Done. Opened #50077 and retitled the PR to reference it, and restored the PR template in the description.

github-actions · 2026-06-02T07:26:11Z

⚠️ GitHub issue #50077 has been automatically assigned in GitHub to PR creator.

Copilot

Pull request overview

This PR hardens the C++ IPC sparse tensor reader by preventing signed int64_t overflow when validating CSX (CSR/CSC) index buffer sizes in ReadSparseCSXIndex, closing an out-of-bounds read vector when parsing crafted IPC messages.

Changes:

Use internal::MultiplyWithOverflow to compute indices/indptr minimum byte sizes safely.
Use internal::AddWithOverflow to compute (shape[axis] + 1) safely for indptr length derivation.
Return Status::Invalid when overflow is detected during size validation.


@@ -2336,26 +2337,40 @@ Result<std::shared_ptr<SparseIndex>> ReadSparseCSXIndex(
                                     /*allow_short_read=*/false));

  std::vector<int64_t> indices_shape({non_zero_length});


avoid int64 overflow in sparse CSX index bounds checks

c2f3ab6

github-actions Bot added the "Component: C++" and "awaiting review" labels on May 25, 2026

jmestwa-coder changed the title ~~MINOR: [C++][IPC] Avoid int64 overflow in ReadSparseCSXIndex~~ GH-50077: [C++][IPC] Avoid int64 overflow in ReadSparseCSXIndex Jun 2, 2026

kou requested a review from Copilot June 3, 2026 00:17

Copilot started reviewing on behalf of kou June 3, 2026 00:17

Copilot AI reviewed Jun 3, 2026

Comment thread cpp/src/arrow/ipc/reader.cc

jmestwa-coder wants to merge 1 commit into apache:main from jmestwa-coder:ipc-sparse-csx-index-overflow

3 participants
