validate protobuf message index bounds on deserialize#2271
Conversation
|
🎉 All Contributor License Agreements have been signed. Ready to merge. |
|
/sem-approve |
There was a problem hiding this comment.
Pull request overview
This pull request hardens Protobuf deserialization against crafted Confluent wire-framed payloads by validating message-index bounds before indexing into FileDescriptorProto / DescriptorProto, preventing negative-index wraparound and replacing bare IndexError with a consistent SerializationError.
Changes:
- Add explicit bounds checks for top-level and nested message indexes in
_get_message_desc_proto(sync + async), raisingSerializationErroron invalid indexes. - Add unit tests (sync + async) that construct a minimal
FileDescriptorProtoand verify both in-range resolution and out-of-range failures.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
tests/schema_registry/_sync/test_proto.py |
Adds sync tests covering valid and invalid message-index arrays, asserting SerializationError for out-of-range indexes. |
tests/schema_registry/_async/test_proto.py |
Adds async tests mirroring sync coverage for message-index bounds validation. |
src/confluent_kafka/schema_registry/_sync/protobuf.py |
Validates top-level and nested message index bounds in _get_message_desc_proto and raises SerializationError when invalid. |
src/confluent_kafka/schema_registry/_async/protobuf.py |
Same bounds validation and SerializationError behavior for the async implementation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
The build-test-release failure looks unrelated to this change. The diff is just the bounds check in Pushed an empty commit to re-trigger the build. The new push resets the external CI gate, so it'll need another /sem-approve to run. |
What
The protobuf deserializer picks the writer message type by walking the message-index array from the Confluent wire framing into the parsed
FileDescriptorProto.SchemaId._read_index_arraycaps the array length but not the individual index values, and those values are zigzag varints, so a crafted message can carry a negative or out-of-range index._get_message_desc_protothen indexesdesc.message_type/desc.nested_typewith the value directly.Before: a negative index (for example uvarint
0x01, which zigzag-decodes to-1) wraps around and resolves to a different message type in the same file, so the payload is decoded against the wrong descriptor. An out-of-range positive index raises a bareIndexError.After: an index outside
0 <= index < len(...)raisesSerializationErrorwith the index and the available count, matching the bound check the confluent-kafka-go reference (toMessageDesc) already applies. Valid indexes behave the same as today.Tradeoff: a message that previously decoded against an unintended message type now fails fast instead of returning a mis-typed object. That is the point of the change, but it does convert a silent wrong-result into a surfaced error.
Checklist
References
JIRA:
Test & Review
Added
test_message_index_in_rangeandtest_message_index_out_of_rangetotests/schema_registry/_async/test_proto.py(and the generated_synccounterpart). The out-of-range cases[-1],[2],[0, -1],[0, 5]reproduce the issue on master and pass after the change._syncfiles were regenerated withtools/unasync.py.Open questions / Follow-ups
None.