Skip to content

Fix DELETED manifest entry snapshot_id in OverwriteFiles#3237

Open
lawofcycles wants to merge 1 commit intoapache:mainfrom
lawofcycles:fix/deleted-entry-snapshot-id
Open

Fix DELETED manifest entry snapshot_id in OverwriteFiles#3237
lawofcycles wants to merge 1 commit intoapache:mainfrom
lawofcycles:fix/deleted-entry-snapshot-id

Conversation

@lawofcycles
Copy link
Copy Markdown

@lawofcycles lawofcycles commented Apr 13, 2026

Rationale for this change

When _OverwriteFiles._deleted_entries() creates DELETED manifest entries, it now sets snapshot_id to the current (deleting) snapshot's ID instead of retaining the original INSERT snapshot's ID.

Closes #3236

According to the Iceberg spec (Manifest Entry Fields), snapshot_id for a DELETED entry (status=2) should be the snapshot ID in which the file was deleted. However, _OverwriteFiles._deleted_entries() was copying the original entry's snapshot_id (from the INSERT snapshot) into the new DELETED entry.

This caused downstream consumers that filter manifest entries by snapshot_id (e.g. Iceberg Java's IncrementalChangelogScan) to silently miss DELETED files, breaking CDC pipelines.

Are these changes tested?

Added test_manifest_entry_snapshot_id_after_partial_deletes in tests/integration/test_deletes.py.

Are there any user-facing changes?

N/A

When _OverwriteFiles._deleted_entries() creates DELETED manifest entries,
it now sets snapshot_id to the current (deleting) snapshot's ID instead of
retaining the original INSERT snapshot's ID.

This aligns with the Iceberg spec which states that for DELETED entries,
snapshot_id should be the snapshot in which the file was deleted.

The existing _DeleteFiles._compute_deletes() already handled this correctly.

Signed-off-by: Sotaro Hikita <bering1814@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DELETED manifest entries retain original snapshot_id instead of the deleting snapshot's ID

1 participant