Skip to content

feat(rust): add configurable size guardrails (max_string_bytes, max_collection_size, max_map_size)#3421

Open
Zakir032002 wants to merge 18 commits intoapache:mainfrom
Zakir032002:issue-3409-size-guardrails
Open

feat(rust): add configurable size guardrails (max_string_bytes, max_collection_size, max_map_size)#3421
Zakir032002 wants to merge 18 commits intoapache:mainfrom
Zakir032002:issue-3409-size-guardrails

Conversation

@Zakir032002
Copy link

@Zakir032002 Zakir032002 commented Feb 25, 2026

Summary

Fixes #3409

Adds three opt-in Fory builder methods that let callers enforce upper bounds on the size of data allocated during deserialization. Without these guards a crafted payload can contain an absurdly large length prefix, causing Vec::with_capacity / string allocation to exhaust heap memory before a single byte of real data is read.

let fory = Fory::default()
    .max_string_bytes(1024 * 1024)    
    .max_collection_size(100_000)     
    .max_map_size(100_000);           

All three limits default to None (no limit), so this is 100 % backwards-compatible.

Testing

cargo fmt --check   ✓
cargo clippy --all-targets --all-features -- -D warnings   ✓
cargo test test_size_guardrails   → 6 passed, 0 failed
cargo test                        → all existing tests pass

…ollection_size, max_map_size)

Adds three Fory builder methods that let callers cap the byte length of
deserialized strings, element count of collections, and entry count of
maps. When a limit is exceeded an informative Error is returned instead
of a blind allocation, preventing OOM from crafted payloads.

- config.rs: three Option<usize> fields
- context.rs: check_string_bytes / check_collection_size / check_map_size helpers
- fory.rs: builder methods for all three limits
- buffer.rs: read_varuint36small() helper used by string check
- serializer/string.rs: check before string allocation
- serializer/collection.rs: check in generic Vec / collection read
- serializer/primitive_list.rs: check for Vec<primitive> fast path
- serializer/map.rs: check before HashMap / BTreeMap allocation
- tests/tests/test_size_guardrails.rs: 6 integration tests
- tests/tests/mod.rs: register new test module

Fixes apache#3409
- Bug 1: move check_collection_size before polymorphic dispatch in
  read_collection_data so HashSet/LinkedList/BTreeSet with polymorphic
  elements no longer bypass the limit
- Bug 2: adjust string byte budget for UTF-16 (len is code-units;
  multiply by 2 before checking) so an adversary cannot double the
  allocation limit by forcing UTF-16 encoding
- Bug 3: remove duplicate check_bound in read_latin1_string that
  regressed every latin1/ASCII string on the hot deserialization path
- Bug 4: move check_map_size before BTreeMap::new() for a consistent
  'check before allocate' pattern matching the HashMap path
@Zakir032002 Zakir032002 marked this pull request as ready for review February 25, 2026 19:35
@Zakir032002
Copy link
Author

@chaokunyang , let me know if you need any changes

@chaokunyang
Copy link
Collaborator

chaokunyang commented Feb 26, 2026

Please set default value to max value of int32. And also check length is less or equal than buffer remaining size

…g error message in read_collection_data

- read_vec_data was missing the buffer-remaining check entirely; a
  crafted len value would pass check_collection_size (default i32::MAX)
  and immediately hit Vec::with_capacity causing OOM before any read.
  Added a precise per-element byte check using size_of::<T>() since
  the static type is known at this call site.
- read_collection_data had a coarse lower-bound check (elements vs
  bytes) but the error message falsely implied a byte-level guarantee.
  Updated the message to accurately describe what the check ensures.
…a before Vec::with_capacity

Previously a crafted size_bytes value (e.g. 400_000_000) would pass
check_collection_size (default i32::MAX) and then allocate a massive
Vec before read_bytes had a chance to catch the truncation. The new
check compares size_bytes against the bytes actually remaining in the
buffer, which is precise because primitive_list encodes the payload
length in bytes — so no approximation is needed.
…_bytes

check_collection_size and check_map_size are both marked #[inline(always)]
but check_string_bytes was not, despite being called on every single string
deserialization — the hottest deserialization path in virtually every
real-world payload. Consistent with the other two guard methods.
…nd primitive_list

- test_vec_buffer_truncation_rejected: exercises the new buffer-remaining
  guard in read_vec_data (collection.rs). Crafts a valid Vec<i32> payload
  then truncates it so the declared element count cannot be satisfied.
- test_primitive_list_buffer_truncation_rejected: exercises the new guard
  in primitive_list.rs fory_read_data. Serializes a Vec<i64>, truncates
  the payload, and confirms rejection even with default (i32::MAX) limits.
- Existing test_buffer_truncation_rejected covers the string path and
  remains unchanged.
@Zakir032002 Zakir032002 force-pushed the issue-3409-size-guardrails branch from f2d745c to 4057e18 Compare February 27, 2026 06:41
@@ -315,6 +315,9 @@ pub struct ReadContext<'a> {
xlang: bool,
max_dyn_depth: u32,
check_struct_version: bool,
max_string_bytes: usize,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify the config options, reserve:

  • max_binary_size, set default value to 64MB
  • max_collection_size, set default value to 1_000_000

@@ -472,6 +478,39 @@ impl<'a> ReadContext<'a> {
self.meta_string_resolver.read_meta_string(&mut self.reader)
}

#[inline(always)]
pub fn check_string_bytes(&self, byte_len: usize) -> Result<(), Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For string, we don't have to check bytes length, bevcause we don't have preallocaiton for string

- Add max_binary_size and max_collection_size guardrails
- Enforce collection size limits in Vec and Map deserializers
- Add buffer-remaining cross-checks to prevent pre-allocation attacks
- Simplify configuration by consolidating string and map limits
- Fixes apache#3409
@Zakir032002
Copy link
Author

Heyy @chaokunyang , Done with the changes as you requested

@@ -472,6 +476,28 @@ impl<'a> ReadContext<'a> {
self.meta_string_resolver.read_meta_string(&mut self.reader)
}

#[inline(always)]
pub fn check_binary_size(&self, byte_len: usize) -> Result<(), Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need such methods? ANd it's even not used? why you add an unused method?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Rust] configurable size guardrails for untrusted payloads

2 participants