Skip to content

GH-49716: [C++] FixedShapeTensorType::Deserialize should strictly validate serialized metadata#49718

Merged
raulcd merged 3 commits intoapache:mainfrom
rok:gh-49716-fst-deserialize
Apr 14, 2026
Merged

GH-49716: [C++] FixedShapeTensorType::Deserialize should strictly validate serialized metadata#49718
raulcd merged 3 commits intoapache:mainfrom
rok:gh-49716-fst-deserialize

Conversation

@rok
Copy link
Copy Markdown
Member

@rok rok commented Apr 13, 2026

Rationale for this change

FixedShapeTensorType::Deserialize should validate input from unknown sources.

What changes are included in this PR?

Adds stricter deserialization valideation.

Are these changes tested?

Yes. New tests are added.

Are there any user-facing changes?

Stricter validation should not be noticed if metadata is correct as per spec of fixed_shape_tensor.

@rok
Copy link
Copy Markdown
Member Author

rok commented Apr 13, 2026

@pitrou

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Apr 13, 2026

I suppose VariableShapeTensorType must be similarily improved, right?

@pitrou pitrou added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Apr 13, 2026
@pitrou
Copy link
Copy Markdown
Member

pitrou commented Apr 13, 2026

@raulcd I think this should be part of 24.0.0.

@rok
Copy link
Copy Markdown
Member Author

rok commented Apr 13, 2026

I suppose VariableShapeTensorType must be similarily improved, right?

shape is not stored in metadata of VariableShapeTensorType but rather in a field of the struct array (which is storage of variable shape tensor) and could be created invalid. But that wouldn't be checked at deserialization time typically. Is this a check we should do at deserialization time?

We do have uniform_shape which we currently parse and partly validate, but don't really validate. The idea was to potentially use it for optimizations later on/

I added a validation step for permutation.

@rok
Copy link
Copy Markdown
Member Author

rok commented Apr 13, 2026

@pitrou I pushed requested changes, please do another pass.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Apr 13, 2026
for (auto& x : document["shape"].GetArray()) {
for (const auto& x : document["shape"].GetArray()) {
if (!x.IsInt64()) {
return Status::Invalid("shape must contain integers");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we show the actual type for easy to debug on failure?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Now we emit value and type of the value on deserialization failure.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 14, 2026
@rok rok force-pushed the gh-49716-fst-deserialize branch from 3395fec to b0dcdb7 Compare April 14, 2026 09:29
@rok
Copy link
Copy Markdown
Member Author

rok commented Apr 14, 2026

ping @pitrou

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 14, 2026
Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @rok !

@pitrou pitrou added CI: Extra: C++ Run extra C++ CI CI: Extra: R Run extra R CI labels Apr 14, 2026
@pitrou pitrou added this to the 24.0.0 milestone Apr 14, 2026
@raulcd raulcd added CI: Extra: R Run extra R CI and removed CI: Extra: R Run extra R CI labels Apr 14, 2026
@raulcd
Copy link
Copy Markdown
Member

raulcd commented Apr 14, 2026

I've re-added the R Extra label because it "failed" to run the check-labels workflow. I think there might be a race condition if several labels are applied at the same time and sometimes workflows are cancelled.

@rok
Copy link
Copy Markdown
Member Author

rok commented Apr 14, 2026

I've restarted a failing R check, others seem to be passing ok.
Edit: Rhub GCC 13 with LTO failed.

@raulcd
Copy link
Copy Markdown
Member

raulcd commented Apr 14, 2026

That failure is unrelated. Has been failing for some weeks and was tracked in: #49737

@rok
Copy link
Copy Markdown
Member Author

rok commented Apr 14, 2026

@raulcd shall I rebase and merge if green? Or just merge now?

Copy link
Copy Markdown
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this has been approved by @pitrou and LGTM, I'll just merge and cherry-pick for 24.0.0. Thanks @rok

@raulcd raulcd merged commit 41035d4 into apache:main Apr 14, 2026
124 of 156 checks passed
@raulcd raulcd removed the awaiting changes Awaiting changes label Apr 14, 2026
@github-actions github-actions bot added the awaiting merge Awaiting merge label Apr 14, 2026
raulcd pushed a commit that referenced this pull request Apr 14, 2026
…idate serialized metadata (#49718)

### Rationale for this change

FixedShapeTensorType::Deserialize should validate input from unknown sources.

### What changes are included in this PR?

Adds stricter deserialization valideation.

### Are these changes tested?

Yes. New tests are added.

### Are there any user-facing changes?

Stricter validation should not be noticed if metadata is correct as per spec of fixed_shape_tensor.
* GitHub Issue: #49716

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 41035d4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 32 possible false positives for unstable benchmarks that are known to sometimes produce them.

@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 41035d4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 32 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting merge Awaiting merge CI: Extra: C++ Run extra C++ CI CI: Extra: R Run extra R CI Component: C++ Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants