MINOR: [C++][CSV] avoid int32 overflow in block parser value counts #50074

Open
metsw24-max wants to merge 1 commit into apache:main from metsw24-max:csv-block-parser-int32-overflow
@metsw24-max
Contributor

Rationale for this change

The CSV block parser sizes its per-chunk value array as num_cols times the number of rows in the chunk, where num_cols is the column count inferred from the first line of the input. PresizedValueDescWriter computes 2 + num_rows * num_cols, and ParseSpecialized computes num_cols_ * (num_rows_ - start) * 10, both in int32_t. A CSV whose first line carries a few million fields can push these products past INT32_MAX. That is signed integer overflow, which is undefined behavior, and UBSan flags both expressions.
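
To give a sense of the magnitudes involved, here is a minimal sketch that evaluates the two expression shapes in 64-bit arithmetic and checks whether the result still fits in int32_t. The helper names (`ExceedsInt32`, `PresizedCount`, `SpecializedCount`) are hypothetical and are not Arrow code, and the row counts of 1000 and 100 are assumed purely for illustration.

```cpp
#include <cstdint>
#include <limits>

// Illustrative helper, not Arrow code: true if `value` cannot be stored in int32_t.
constexpr bool ExceedsInt32(int64_t value) {
  return value > std::numeric_limits<int32_t>::max() ||
         value < std::numeric_limits<int32_t>::min();
}

// Shape of the PresizedValueDescWriter expression, evaluated in 64 bits.
constexpr int64_t PresizedCount(int32_t num_rows, int32_t num_cols) {
  return 2 + static_cast<int64_t>(num_rows) * num_cols;
}

// Shape of the ParseSpecialized expression, evaluated in 64 bits.
// Assumes 0 <= start <= num_rows, so the subtraction itself cannot overflow.
constexpr int64_t SpecializedCount(int32_t num_cols, int32_t num_rows,
                                   int32_t start) {
  return static_cast<int64_t>(num_cols) * (num_rows - start) * 10;
}

// Assumed sizes: a first line with 3,000,000 fields.
static_assert(PresizedCount(1000, 3000000) == 3000000002LL,
              "2 + 1000 * 3,000,000");
static_assert(ExceedsInt32(PresizedCount(1000, 3000000)),
              "exceeds INT32_MAX (2,147,483,647)");
static_assert(ExceedsInt32(SpecializedCount(3000000, 100, 0)),
              "3,000,000 * 100 * 10 also exceeds INT32_MAX");

// An ordinary 100-column file stays far below the limit.
static_assert(!ExceedsInt32(PresizedCount(1000, 100)), "fits in int32_t");
```

With these assumed sizes both products land near 3 billion, well above INT32_MAX, while the 100-column case stays at 100,002.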

What changes are included in this PR?

Widen both multiplications to int64_t, matching their int64_t destinations.
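
The general widening pattern can be sketched as follows. `MulWide` is a hypothetical helper used only for illustration; the actual diff may place its casts differently. The point it shows is that one operand has to be converted before the multiplication, so that the multiplication itself is carried out in int64_t.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not taken from the Arrow source.
constexpr int64_t MulWide(int32_t a, int32_t b) {
  // Casting one operand first makes the multiplication 64-bit.
  // static_cast<int64_t>(a * b) would be too late: on common platforms the
  // product a * b is evaluated in 32-bit int arithmetic before the cast.
  return static_cast<int64_t>(a) * b;
}

// The usual arithmetic conversions convert the remaining int32_t operand,
// so the whole expression has type int64_t.
static_assert(
    std::is_same<decltype(static_cast<int64_t>(int32_t{1}) * int32_t{1}),
                 int64_t>::value,
    "mixed int64_t * int32_t yields int64_t");

static_assert(MulWide(2, 3) == 6, "small values are unchanged");
static_assert(MulWide(2147483647, 2) == 4294967294LL,
              "a product above INT32_MAX is representable");
```

Assigning an int32_t product to an int64_t variable does not help on its own, because the conversion happens only after the narrow multiplication has been evaluated.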

Are these changes tested?

Existing CSV parser tests pass. The overflow was confirmed with a standalone UBSan build of the two expressions, which runs clean after widening.

Are there any user-facing changes?

No.
