
GH-3307: Replace deprecated BIT_PACKED with RLE in DevNullValuesWriter #3446

Open
LuciferYang wants to merge 1 commit into apache:master from LuciferYang:GH-3307

@LuciferYang
Contributor

Rationale for this change

Fixes #3307

When repetition/definition levels are empty (i.e., maxLevel == 0 for required, non-repeated fields), DevNullValuesWriter is used as a no-op writer that produces zero bytes. However, its getEncoding() method returns the deprecated BIT_PACKED encoding, which gets written into the DataPageHeader metadata (repetition_level_encoding / definition_level_encoding) and the column chunk encoding list.
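To make the empty-levels case concrete, here is a minimal, self-contained sketch. The class `EmptyLevelsSketch`, its nested `Encoding` enum, and the helper names are hypothetical stand-ins for illustration, not the parquet-java API. It assumes the bit width of a level is the number of bits needed to represent `maxLevel`, so a `maxLevel` of 0 needs zero bits and zero bytes, while the page header still carries whatever encoding the no-op writer reports.

```java
// Hypothetical model for illustration only; not the parquet-java classes.
final class EmptyLevelsSketch {
    enum Encoding { RLE, BIT_PACKED }

    /** Bits needed to store any level in [0, maxLevel]; 0 when maxLevel == 0. */
    static int bitWidth(int maxLevel) {
        return 32 - Integer.numberOfLeadingZeros(maxLevel);
    }

    /** Required, non-repeated fields have maxLevel == 0, so no level data is stored. */
    static boolean usesNoOpWriter(int maxLevel) {
        return maxLevel == 0;
    }

    /** The header records the reported encoding even when the level data is empty. */
    static String headerEntry(Encoding reported, int levelByteLength) {
        return reported + " (" + levelByteLength + " bytes)";
    }

    public static void main(String[] args) {
        System.out.println(bitWidth(0));        // zero bits per level
        System.out.println(usesNoOpWriter(0));  // the no-op writer is selected
        // Before this change the header entry for empty levels named BIT_PACKED.
        System.out.println(headerEntry(Encoding.BIT_PACKED, 0));
    }
}
```

The point of the sketch is that the encoding name in the header is decoupled from the bytes written: the value is purely descriptive when the level data is empty.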

Since parquet-java already uses RLE as the encoding for levels, the metadata should reflect RLE rather than the deprecated BIT_PACKED.

What changes are included in this PR?

Changed DevNullValuesWriter.getEncoding() to return Encoding.RLE instead of Encoding.BIT_PACKED.
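As a rough illustration of the effect, the sketch below models a no-op level writer after the change. `NoOpLevelWriterSketch`, `NoOpWriter`, and `chunkEncodings` are hypothetical names invented for this example, and the real `DevNullValuesWriter` has more methods than shown. The only behavior taken from this PR is that the reported encoding is now RLE while the writer still produces zero bytes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model for illustration only; not the parquet-java classes.
final class NoOpLevelWriterSketch {
    enum Encoding { PLAIN, RLE, BIT_PACKED }

    /** Discards every level and produces no bytes. */
    static final class NoOpWriter {
        void writeInteger(int value) { /* intentionally discarded */ }

        long getBufferedSize() { return 0L; }

        byte[] getBytes() { return new byte[0]; }

        // After this PR the reported encoding is RLE; it used to be BIT_PACKED.
        Encoding getEncoding() { return Encoding.RLE; }
    }

    /** Collects the encodings a column chunk would list for one page. */
    static Set<Encoding> chunkEncodings(Encoding valuesEncoding,
                                        NoOpWriter repetition,
                                        NoOpWriter definition) {
        Set<Encoding> encodings = EnumSet.of(valuesEncoding);
        encodings.add(repetition.getEncoding());
        encodings.add(definition.getEncoding());
        return encodings;
    }

    public static void main(String[] args) {
        NoOpWriter levels = new NoOpWriter();
        levels.writeInteger(0);
        System.out.println(levels.getBytes().length);
        System.out.println(chunkEncodings(Encoding.PLAIN, levels, levels));
    }
}
```

With the writer reporting RLE, the deprecated BIT_PACKED value no longer appears in the modeled encoding list for columns whose levels are empty.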

Are these changes tested?

Yes. All existing tests in parquet-column and parquet-hadoop pass without modification.

Are there any user-facing changes?

Newly written Parquet files will report RLE instead of BIT_PACKED in page header metadata for empty repetition/definition levels. This has no impact on file compatibility: readers do not decode level data when the byte length is zero, regardless of the encoding value in the header.
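The compatibility argument can be sketched as follows. This is a simplified, hypothetical reader (`EmptyLevelReadSketch.readLevels` is not a parquet-java method) that assumes, as stated above, that a reader never decodes level data whose byte length is zero. Under that assumption the header encoding is not consulted, so RLE and BIT_PACKED give identical results.

```java
// Hypothetical model for illustration only; not the parquet-java reader.
final class EmptyLevelReadSketch {
    enum Encoding { RLE, BIT_PACKED }

    /**
     * Returns one level per value. When no level bytes were written, every
     * level is 0 and the header encoding is never used.
     */
    static int[] readLevels(Encoding headerEncoding, byte[] levelBytes, int valueCount) {
        if (levelBytes.length == 0) {
            return new int[valueCount]; // Java zero-initializes the array
        }
        // Real decoding is out of scope for this sketch.
        throw new UnsupportedOperationException(
            "decoding " + headerEncoding + " level data is not modeled here");
    }

    public static void main(String[] args) {
        int[] oldHeader = readLevels(Encoding.BIT_PACKED, new byte[0], 3);
        int[] newHeader = readLevels(Encoding.RLE, new byte[0], 3);
        System.out.println(java.util.Arrays.equals(oldHeader, newHeader));
    }
}
```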

Development

Successfully merging this pull request may close these issues.

BIT_PACKING is written by default when Definition or Repetition levels are empty
