feat(rust): add configurable size guardrails (max_string_bytes, max_collection_size, max_map_size)#3421
Open
Zakir032002 wants to merge 18 commits intoapache:mainfrom
Open
feat(rust): add configurable size guardrails (max_string_bytes, max_collection_size, max_map_size)#3421Zakir032002 wants to merge 18 commits intoapache:mainfrom
Zakir032002 wants to merge 18 commits intoapache:mainfrom
Conversation
…ollection_size, max_map_size) Adds three Fory builder methods that let callers cap the byte length of deserialized strings, element count of collections, and entry count of maps. When a limit is exceeded an informative Error is returned instead of a blind allocation, preventing OOM from crafted payloads. - config.rs: three Option<usize> fields - context.rs: check_string_bytes / check_collection_size / check_map_size helpers - fory.rs: builder methods for all three limits - buffer.rs: read_varuint36small() helper used by string check - serializer/string.rs: check before string allocation - serializer/collection.rs: check in generic Vec / collection read - serializer/primitive_list.rs: check for Vec<primitive> fast path - serializer/map.rs: check before HashMap / BTreeMap allocation - tests/tests/test_size_guardrails.rs: 6 integration tests - tests/tests/mod.rs: register new test module Fixes apache#3409
- Bug 1: move check_collection_size before polymorphic dispatch in read_collection_data so HashSet/LinkedList/BTreeSet with polymorphic elements no longer bypass the limit - Bug 2: adjust string byte budget for UTF-16 (len is code-units; multiply by 2 before checking) so an adversary cannot double the allocation limit by forcing UTF-16 encoding - Bug 3: remove duplicate check_bound in read_latin1_string that regressed every latin1/ASCII string on the hot deserialization path - Bug 4: move check_map_size before BTreeMap::new() for a consistent 'check before allocate' pattern matching the HashMap path
Author
|
@chaokunyang , let me know if you need any changes |
Collaborator
|
Please set default value to max value of int32. And also check length is less or equal than buffer remaining size |
…g error message in read_collection_data - read_vec_data was missing the buffer-remaining check entirely; a crafted len value would pass check_collection_size (default i32::MAX) and immediately hit Vec::with_capacity causing OOM before any read. Added a precise per-element byte check using size_of::<T>() since the static type is known at this call site. - read_collection_data had a coarse lower-bound check (elements vs bytes) but the error message falsely implied a byte-level guarantee. Updated the message to accurately describe what the check ensures.
…a before Vec::with_capacity Previously a crafted size_bytes value (e.g. 400_000_000) would pass check_collection_size (default i32::MAX) and then allocate a massive Vec before read_bytes had a chance to catch the truncation. The new check compares size_bytes against the bytes actually remaining in the buffer, which is precise because primitive_list encodes the payload length in bytes — so no approximation is needed.
…_bytes check_collection_size and check_map_size are both marked #[inline(always)] but check_string_bytes was not, despite being called on every single string deserialization — the hottest deserialization path in virtually every real-world payload. Consistent with the other two guard methods.
…nd primitive_list - test_vec_buffer_truncation_rejected: exercises the new buffer-remaining guard in read_vec_data (collection.rs). Crafts a valid Vec<i32> payload then truncates it so the declared element count cannot be satisfied. - test_primitive_list_buffer_truncation_rejected: exercises the new guard in primitive_list.rs fory_read_data. Serializes a Vec<i64>, truncates the payload, and confirms rejection even with default (i32::MAX) limits. - Existing test_buffer_truncation_rejected covers the string path and remains unchanged.
f2d745c to
4057e18
Compare
chaokunyang
reviewed
Feb 27, 2026
| @@ -315,6 +315,9 @@ pub struct ReadContext<'a> { | |||
| xlang: bool, | |||
| max_dyn_depth: u32, | |||
| check_struct_version: bool, | |||
| max_string_bytes: usize, | |||
Collaborator
There was a problem hiding this comment.
Let's simplify the config options, reserve:
- max_binary_size, set default value to 64MB
- max_collection_size, set default value to 1_000_000
chaokunyang
reviewed
Feb 27, 2026
| @@ -472,6 +478,39 @@ impl<'a> ReadContext<'a> { | |||
| self.meta_string_resolver.read_meta_string(&mut self.reader) | |||
| } | |||
|
|
|||
| #[inline(always)] | |||
| pub fn check_string_bytes(&self, byte_len: usize) -> Result<(), Error> { | |||
Collaborator
There was a problem hiding this comment.
For string, we don't have to check bytes length, bevcause we don't have preallocaiton for string
- Add max_binary_size and max_collection_size guardrails - Enforce collection size limits in Vec and Map deserializers - Add buffer-remaining cross-checks to prevent pre-allocation attacks - Simplify configuration by consolidating string and map limits - Fixes apache#3409
Author
|
Heyy @chaokunyang , Done with the changes as you requested |
chaokunyang
reviewed
Feb 28, 2026
| @@ -472,6 +476,28 @@ impl<'a> ReadContext<'a> { | |||
| self.meta_string_resolver.read_meta_string(&mut self.reader) | |||
| } | |||
|
|
|||
| #[inline(always)] | |||
| pub fn check_binary_size(&self, byte_len: usize) -> Result<(), Error> { | |||
Collaborator
There was a problem hiding this comment.
I don't think we need such methods? ANd it's even not used? why you add an unused method?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #3409
Adds three opt-in
Forybuilder methods that let callers enforce upper bounds on the size of data allocated during deserialization. Without these guards a crafted payload can contain an absurdly large length prefix, causingVec::with_capacity/ string allocation to exhaust heap memory before a single byte of real data is read.All three limits default to
None(no limit), so this is 100 % backwards-compatible.Testing