feat(python): add configurable size guardrails #3429

eyad-hazem-elmorsy wants to merge 3 commits into apache:main from
Conversation
This commit:
- provides configurable size guardrails for untrusted payloads (lists, sets, tuples, maps, and strings)
- tests size guardrails
python/pyfory/_fory.py (outdated)

    "policy",
    "max_collection_length",
    "max_map_length",
    "max_string_bytes_length",
Could we rename max_string_bytes_length to max_binary_size, and rename the other _length options to _size for consistency?
And we could use max_collection_size for the map size as well, to reduce config option overhead.
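The suggestion above — one shared cap for collections and maps — can be sketched in plain Python. This is an illustrative helper, not pyfory's actual code; only the option name max_collection_size and the -1 "unlimited" default come from the PR.

```python
def check_declared_size(declared: int, max_collection_size: int) -> None:
    """Guardrail shared by list/set/tuple AND map deserialization.

    A single option covers both cases, keeping the config surface small.
    max_collection_size < 0 means "unlimited", matching the -1 default.
    """
    if max_collection_size >= 0 and declared > max_collection_size:
        raise ValueError(
            f"declared size {declared} exceeds limit {max_collection_size}"
        )

# The same cap guards a payload-declared list length and a map entry count,
# and it fires before any allocation based on the untrusted value.
check_declared_size(100, 1000)   # within the limit, passes silently
```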
python/pyfory/_fory.py (outdated)

    cls = serializer.type_ if hasattr(serializer, "type_") else None
    if cls is str:
        return buffer.read_string(self.max_string_bytes_length)
We don't need to validate the string size, because the validation is only there to prevent malicious data from faking a huge size and causing a big allocation that incurs OOM. For strings, the underlying buffer doesn't hold that much data, so the read will just fail; there is no OOM issue.
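The point made here can be illustrated with a plain-Python length-prefixed reader (a hypothetical helper, not pyfory code): when decoding copies from an existing buffer instead of preallocating, a forged length simply runs out of data and raises, with no large allocation.

```python
import struct

def read_string(buf: bytes, offset: int = 0) -> str:
    # Length-prefixed UTF-8 string: 4-byte little-endian length, then payload.
    (length,) = struct.unpack_from("<I", buf, offset)
    payload = buf[offset + 4 : offset + 4 + length]
    # A malicious length larger than the remaining buffer cannot trigger a
    # huge allocation here: the slice is bounded by the real buffer size,
    # so truncation is detected instead of memory being exhausted.
    if len(payload) != length:
        raise ValueError("truncated buffer: declared length exceeds available data")
    return payload.decode("utf-8")

good = struct.pack("<I", 5) + b"hello"
print(read_string(good))  # hello

evil = struct.pack("<I", 2**31 - 1) + b"hi"  # claims ~2 GiB, ships 2 bytes
# read_string(evil) raises ValueError; no 2 GiB allocation ever happens
```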
python/pyfory/_fory.py (outdated)

    policy: DeserializationPolicy = None,
    field_nullable: bool = False,
    meta_compressor=None,
    max_collection_length: int = -1,
python/pyfory/_fory.py (outdated)

    meta_compressor=None,
    max_collection_length: int = -1,
    max_map_length: int = -1,
    max_string_bytes_length: int = -1,
Set the default to 64 * 1024 * 1024.
- Remove enforcing max_string_length.
- Use max_collection_size for size limits, including maps (dicts).
- Modify tests.
You can take a look @chaokunyang
python/pyfory/buffer.pyx (outdated)

    self.c_buffer.write_bytes(&data[0], length)

    - cpdef inline bytes read_bytes_and_size(self):
    + cpdef inline bytes read_bytes_and_size(self, int32_t max_binary_size=-1):
This method doesn't do a preallocation, so we don't need to check the size.
Since we want to enforce max_binary_size for binary byte-length reads (as mentioned in the issue), should I move the size check into the read_bytes() method, before this line?

    cdef bytes py_bytes = PyBytes_FromStringAndSize(NULL, length)
So, we don't want to enforce a limit on binary byte-length reads as such; we only want to prevent a malicious length from introducing OOM. If there is no preallocation, there should be no such check: when there is not enough data to read from the buffer, the buffer read API will raise an exception.
Please remove the max_binary_size parameter to keep the API surface minimal.
You could even add comments to clarify that those options are used for preallocation checks only.
Looks good, but I want to ask about the line I mentioned before in the buffer's read_bytes function:

    cdef bytes py_bytes = PyBytes_FromStringAndSize(NULL, length)

I searched and found that this call may result in an OOM crash or a MemoryError before any checks run. So we could check bounds or ensure readability first. What do you think?
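The concern is that PyBytes_FromStringAndSize(NULL, length) preallocates length bytes before any data is copied, so a forged length allocates first and fails later. The suggested fix — verify readability before allocating — can be sketched in plain Python (hypothetical helper mirroring the Cython read path, not pyfory's actual code):

```python
def read_bytes(buf: bytes, offset: int, length: int) -> bytes:
    # Verify the buffer actually holds `length` readable bytes BEFORE
    # allocating the output object, so a forged length fails fast instead
    # of triggering a multi-gigabyte preallocation.
    remaining = len(buf) - offset
    if length < 0 or length > remaining:
        raise ValueError(f"cannot read {length} bytes, only {remaining} remaining")
    # Allocation happens only after the bounds check has passed.
    return bytes(buf[offset : offset + length])

data = b"abcdef"
print(read_bytes(data, 2, 3))  # b'cde'
# read_bytes(data, 2, 2**31 - 1) raises ValueError without allocating
```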
This commit:
Why?
There are currently no configurable limits for payload-driven lengths. Untrusted string/map/list lengths can trigger large allocations and memory pressure.
What does this PR do?
Related issues