GH-36889: [C++][Python] Fix duplicate CSV header when first batch is empty #48718
+77
−0
Rationale for this change
Fixes #36889
When writing CSV from a table whose first batch is empty, the header is written twice.
What changes are included in this PR?
The bug happens because:
1. The header is written to data_buffer_ and flushed during CSVWriterImpl initialization.
2. TranslateMinimalBatch returns early for empty batches without modifying data_buffer_.
3. The WriteTable/WriteRecordBatch loop then writes data_buffer_, which still contains the stale header.
The fix clears the buffer (resize to 0) when encountering an empty batch in TranslateMinimalBatch, so the subsequent write outputs nothing.
Are these changes tested?
Yes. Added C++ tests in writer_test.cc and Python tests in test_csv.py.
Are there any user-facing changes?
No API changes. This is a bug fix that prevents duplicate headers when writing CSV from tables with empty batches.
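For reviewers who want to see the stale-buffer mechanism in isolation, here is a minimal pure-Python sketch of the behavior described under "What changes are included in this PR?". It is a toy model, not Arrow's code: ToyCSVWriter, render, and the clear_on_empty flag are hypothetical names introduced only for illustration, and the model ignores quoting and other CSV options.

```python
import io


class ToyCSVWriter:
    """Toy model of a writer that reuses one buffer for header and rows.

    Hypothetical illustration only; this is not Arrow's CSVWriterImpl.
    """

    def __init__(self, sink, column_names, clear_on_empty):
        self.sink = sink
        self.clear_on_empty = clear_on_empty
        # The header is staged in the shared buffer and flushed right away.
        self.data_buffer = ",".join(column_names) + "\n"
        self._flush()

    def _flush(self):
        # Write whatever the shared buffer currently holds.
        self.sink.write(self.data_buffer)

    def _translate_batch(self, rows):
        if not rows:
            # Early return for an empty batch. Without the clear, the buffer
            # still holds the header from initialization.
            if self.clear_on_empty:
                self.data_buffer = ""
            return
        self.data_buffer = "".join(
            ",".join(str(value) for value in row) + "\n" for row in rows
        )

    def write_batches(self, batches):
        # Each batch is translated into the buffer, then the buffer is written.
        for rows in batches:
            self._translate_batch(rows)
            self._flush()


def render(batches, clear_on_empty):
    """Run the toy writer over the batches and return the produced text."""
    sink = io.StringIO()
    ToyCSVWriter(sink, ["a", "b"], clear_on_empty).write_batches(batches)
    return sink.getvalue()


batches = [[], [(1, 2), (3, 4)]]  # the first batch is empty
buggy = render(batches, clear_on_empty=False)
fixed = render(batches, clear_on_empty=True)
print(repr(buggy))
print(repr(fixed))
```

With clear_on_empty=False the toy writer emits the header line twice, and with clear_on_empty=True it emits it once, which mirrors the effect of resizing the buffer to 0 for an empty batch.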