AVRO-4242: [Java] Fix NPE in DataFileStream and DataFileReader when schema metadata is missing #3726

Open
iemejia wants to merge 1 commit into apache:main from
iemejia:AVRO-4242-mlaformed-avro-container-no-schema-npe
iemejia (Member) commented Apr 6, 2026

Malformed Avro container files that lack the 'avro.schema' metadata entry caused a NullPointerException in both DataFileStream and DataFileReader12, because the missing (null) schema string was passed directly to Schema.Parser.parse(). This change replaces the inline parsing with null-safe helper methods that throw a descriptive IOException instead.
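For illustration, here is a minimal sketch of the kind of null-safe helper described above. It is not the code from this PR: the class name SchemaMetadata is hypothetical, and the Function<String, T> parameter is an assumed stand-in for Schema.Parser::parse so the sketch runs without the Avro library. The PR calls its helper as parseSchemaFromMetadata(getMetaString(SCHEMA), SCHEMA, new Schema.Parser()); whether it also wraps parser failures the way this sketch does is an assumption (Avro's SchemaParseException is unchecked).

```java
import java.io.IOException;
import java.util.function.Function;

/**
 * Hypothetical sketch of a null-safe "parse schema from header metadata" helper.
 * The Function parameter stands in for Schema.Parser::parse (assumption).
 */
final class SchemaMetadata {
  private SchemaMetadata() {}

  static <T> T parseSchemaFromMetadata(String schemaJson, String key, Function<String, T> parser)
      throws IOException {
    // A missing metadata entry arrives as null: fail fast with a message naming
    // the key instead of letting the parser dereference null.
    if (schemaJson == null) {
      throw new IOException("Invalid Avro container: missing required metadata entry '" + key + "'");
    }
    try {
      return parser.apply(schemaJson);
    } catch (RuntimeException e) {
      // Parse failures are unchecked; wrap them so callers see an IOException
      // carrying the key and the original cause.
      throw new IOException("Invalid schema in metadata entry '" + key + "': " + e.getMessage(), e);
    }
  }
}
```

Failing with an IOException keeps the error on the checked path that callers of DataFileStream already handle for corrupt input, rather than surfacing an unchecked NPE with no hint about which header entry was missing.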

R: @RyanSkraba

github-actions bot added the Java (Pull Requests for Java binding) label Apr 6, 2026
iemejia changed the title from "AVRO-4242: [Java] Fix NPE in DataFileStream and DataFileReader when s…" to "AVRO-4242: [Java] Fix NPE in DataFileStream and DataFileReader when schema metadata is missing" Apr 8, 2026
iemejia requested a review from RyanSkraba April 8, 2026 13:44
Commit: AVRO-4242: [Java] Fix NPE in DataFileStream and DataFileReader when schema metadata is missing

Malformed Avro container files without the 'avro.schema' metadata entry
caused a NullPointerException in both DataFileStream and DataFileReader12
when the null value was passed directly to Schema.Parser.parse(). Replace
inline parsing with null-safe helper methods that throw a descriptive
IOException instead.
iemejia force-pushed the AVRO-4242-mlaformed-avro-container-no-schema-npe branch from 3adb3a0 to 893d488 April 8, 2026 16:29
iemejia requested a review from Copilot April 10, 2026 19:10
Copilot AI left a comment

Pull request overview

Fixes crashes when reading malformed Avro container files that omit schema metadata by introducing null-safe schema parsing that fails with a descriptive IOException instead of an NPE.

Changes:

  • Replace inline schema parsing in DataFileStream with a null-safe helper that throws descriptive IOExceptions for missing/invalid schema metadata.
  • Reuse the same helper in DataFileReader12 to avoid NPEs when the 1.2-format schema metadata is absent.
  • Add a regression test that builds a malformed container header (missing avro.schema) and asserts DataFileStream and DataFileReader fail with a helpful message.
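The regression test writes a container header whose metadata map carries only the codec entry. The sketch below reproduces that idea without the Avro library by encoding the header by hand, following the object container layout in the Avro specification: the magic bytes 'O', 'b', 'j', 1; a map<bytes> of metadata written with zig-zag varint longs and terminated by a zero count; then a 16-byte sync marker. The class and method names are hypothetical, and readSchemaJson is a simplified stand-in for the header parsing DataFileStream performs, not Avro's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: build and parse a container header missing avro.schema. */
final class MalformedHeaderSketch {
  // Object container magic: 'O', 'b', 'j' followed by format version byte 1.
  static final byte[] MAGIC = {'O', 'b', 'j', 1};

  private MalformedHeaderSketch() {}

  // Avro "long": zig-zag, then little-endian base-128 varint.
  static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63);
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }

  // Avro "bytes"/"string": length as a long, then the raw bytes.
  static void writeBytes(ByteArrayOutputStream out, byte[] b) {
    writeLong(out, b.length);
    out.write(b, 0, b.length);
  }

  /** Header whose metadata map holds only avro.codec (no avro.schema). */
  static byte[] headerWithoutSchema() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(MAGIC, 0, MAGIC.length);
    writeLong(out, 1); // one entry in this map block
    writeBytes(out, "avro.codec".getBytes(StandardCharsets.UTF_8));
    writeBytes(out, "null".getBytes(StandardCharsets.UTF_8));
    writeLong(out, 0); // zero count terminates the map
    out.write(new byte[16], 0, 16); // sync marker (all zeros here)
    return out.toByteArray();
  }

  static long readLong(InputStream in) throws IOException {
    long z = 0;
    int shift = 0;
    int b;
    do {
      b = in.read();
      if (b < 0) {
        throw new EOFException("Truncated varint");
      }
      z |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (z >>> 1) ^ -(z & 1); // undo zig-zag
  }

  static byte[] readBytes(InputStream in) throws IOException {
    long len = readLong(in);
    if (len < 0 || len > Integer.MAX_VALUE) {
      throw new IOException("Invalid length: " + len);
    }
    byte[] b = in.readNBytes((int) len);
    if (b.length != len) {
      throw new EOFException("Truncated bytes");
    }
    return b;
  }

  /** Reads the metadata map and rejects a header that lacks avro.schema. */
  static String readSchemaJson(byte[] header) throws IOException {
    InputStream in = new ByteArrayInputStream(header);
    if (!Arrays.equals(in.readNBytes(MAGIC.length), MAGIC)) {
      throw new IOException("Not an Avro data file");
    }
    Map<String, byte[]> meta = new HashMap<>();
    long count = readLong(in);
    while (count != 0) {
      if (count < 0) { // negative count: use its absolute value, then skip the block byte size
        count = -count;
        readLong(in);
      }
      for (long i = 0; i < count; i++) {
        String key = new String(readBytes(in), StandardCharsets.UTF_8);
        meta.put(key, readBytes(in));
      }
      count = readLong(in);
    }
    byte[] schema = meta.get("avro.schema");
    if (schema == null) {
      throw new IOException("Missing required metadata entry 'avro.schema'");
    }
    return new String(schema, StandardCharsets.UTF_8);
  }
}
```

Writing the header bytes directly, instead of going through DataFileWriter, is what makes it possible to omit 'avro.schema' at all, since a regular writer always records the schema in the header.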

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lang/java/avro/src/main/java/org/apache/avro/file/DataFileStream.java | Adds null-safe schema parsing from metadata and uses it during header initialization. |
| lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader12.java | Uses the new helper to avoid NPE when schema metadata is missing in 1.2 format. |
| lang/java/avro/src/test/java/org/apache/avro/TestDataFileReader.java | Adds a regression test building a malformed header missing schema metadata. |

Comment on lines +271 to +273
```java
    encoder.writeString(DataFileConstants.CODEC);
    encoder.writeBytes("null".getBytes());
    encoder.writeMapEnd();
  }
```

```java
  private Schema parseSchema() throws IOException {
    return DataFileStream.parseSchemaFromMetadata(getMetaString(SCHEMA), SCHEMA, new Schema.Parser());
  }
```

Labels

Java Pull Requests for Java binding
