AVRO-4242: [Java] Fix NPE in DataFileStream and DataFileReader when schema metadata is missing #3726
Open
iemejia wants to merge 1 commit into apache:main from
…chema metadata is missing

Malformed Avro container files without the 'avro.schema' metadata entry caused a NullPointerException in both DataFileStream and DataFileReader12 when the null value was passed directly to Schema.Parser.parse(). Replace inline parsing with null-safe helper methods that throw a descriptive IOException instead.
Force-pushed from 3adb3a0 to 893d488.
Pull request overview
Fixes crashes when reading malformed Avro container files that omit schema metadata by introducing null-safe schema parsing that fails with a descriptive IOException instead of an NPE.
Changes:
- Replace inline schema parsing in `DataFileStream` with a null-safe helper that throws descriptive `IOException`s for missing/invalid schema metadata.
- Reuse the same helper in `DataFileReader12` to avoid NPEs when the 1.2-format `schema` metadata is absent.
- Add a regression test that builds a malformed container header (missing `avro.schema`) and asserts `DataFileStream` and `DataFileReader` fail with a helpful message.
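For readers who want to see the shape of the change, here is a minimal, dependency-free sketch of a null-safe metadata parsing helper. It is not the code from this PR: the class name `MetadataParsing`, the method `parseFromMetadata`, and the exception messages are hypothetical, and a `Function<String, T>` stands in for `Schema.Parser` so the example runs without Avro on the classpath.

```java
import java.io.IOException;
import java.util.function.Function;

// Hypothetical illustration only; the PR's real helper lives in DataFileStream
// and takes a Schema.Parser rather than a generic Function.
final class MetadataParsing {

  // Parses a metadata value, turning "missing" and "invalid" into IOExceptions
  // instead of letting a NullPointerException or parser exception escape.
  public static <T> T parseFromMetadata(String value, String key, Function<String, T> parser)
      throws IOException {
    if (value == null) {
      // Assumed message wording; the PR only promises a "descriptive" message.
      throw new IOException("Missing required metadata entry: " + key);
    }
    try {
      return parser.apply(value);
    } catch (RuntimeException e) {
      throw new IOException("Invalid metadata entry: " + key, e);
    }
  }

  public static void main(String[] args) {
    try {
      // Simulates a container file whose header has no 'avro.schema' entry.
      MetadataParsing.<String>parseFromMetadata(null, "avro.schema", s -> s);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Reporting the failure as an `IOException` keeps it within the exception type that callers reading a container file already have to handle.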
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lang/java/avro/src/main/java/org/apache/avro/file/DataFileStream.java | Adds null-safe schema parsing from metadata and uses it during header initialization. |
| lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader12.java | Uses the new helper to avoid NPE when schema metadata is missing in 1.2 format. |
| lang/java/avro/src/test/java/org/apache/avro/TestDataFileReader.java | Adds a regression test building a malformed header missing schema metadata. |
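To make "a malformed header missing schema metadata" concrete, the sketch below hand-builds such a header using only the JDK. It follows the object container file layout in the Avro specification (4-byte magic, a metadata map of string keys to byte values, a 16-byte sync marker). It is not the test from this PR, which uses Avro's own encoder; the class `MalformedHeader`, its methods, and the all-zero sync marker are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; the PR's regression test uses Avro's encoder instead.
final class MalformedHeader {

  // Avro encodes a long as a zigzag value written as a base-128 varint.
  public static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63);
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }

  // Avro strings and bytes are both a length (as a long) followed by the raw bytes.
  public static void writeBytes(ByteArrayOutputStream out, byte[] b) {
    writeLong(out, b.length);
    out.write(b, 0, b.length);
  }

  // Builds a container header whose metadata map has 'avro.codec' but no 'avro.schema'.
  public static byte[] headerWithoutSchema() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(new byte[] {'O', 'b', 'j', 1}, 0, 4);      // magic bytes
    writeLong(out, 1);                                   // map block with one entry
    writeBytes(out, "avro.codec".getBytes(StandardCharsets.UTF_8));
    writeBytes(out, "null".getBytes(StandardCharsets.UTF_8));
    writeLong(out, 0);                                   // end of map
    out.write(new byte[16], 0, 16);                      // placeholder sync marker (assumed zeros)
    return out.toByteArray();
  }
}
```

Wrapping these bytes in a `ByteArrayInputStream` and passing them to `DataFileStream` is the kind of input that, per the PR description, previously ended in a `NullPointerException` and should now fail with a descriptive `IOException`.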
Comment on lines +271 to +273
    encoder.writeString(DataFileConstants.CODEC);
    encoder.writeBytes("null".getBytes());
    encoder.writeMapEnd();
    }

    private Schema parseSchema() throws IOException {
      return DataFileStream.parseSchemaFromMetadata(getMetaString(SCHEMA), SCHEMA, new Schema.Parser());
R: @RyanSkraba