Skip to content

Iceberg: correctly retry commit on conflict#35428

Open
DAlperin wants to merge 1 commit intoMaterializeInc:mainfrom
DAlperin:dov/iceberg-retry-commit
Open

Iceberg: correctly retry commit on conflict#35428
DAlperin wants to merge 1 commit intoMaterializeInc:mainfrom
DAlperin:dov/iceberg-retry-commit

Conversation

@DAlperin
Copy link
Member

@DAlperin DAlperin commented Mar 11, 2026

Fixes #11229. Paired with #35397 for tests.

@DAlperin DAlperin requested a review from a team as a code owner March 11, 2026 13:55
@cursor
Copy link

cursor bot commented Mar 11, 2026

PR Summary

Medium Risk
Changes Iceberg sink commit/retry behavior and fencing checks, which can affect exactly-once progress and data-loss prevention if incorrect. Scope is localized but touches core commit path and conflict handling.

Overview
Fixes Iceberg sink commit retries to correctly rebuild and commit transactions against the latest table metadata when conflicts occur.

The commit path now retries with carried-forward table state (retry_async_with_state), reloads the table on both commit conflicts and failed apply, and adds an additional fencing check that aborts if a newer mz-sink-version is detected while retrying.

Written by Cursor Bugbot for commit 940e235. This will update automatically on new commits. Configure here.

@github-actions
Copy link

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@DAlperin DAlperin requested a review from def- March 11, 2026 13:55
@def-
Copy link
Contributor

def- commented Mar 11, 2026

Thanks, rebased the test on your PR: https://buildkite.com/materialize/nightly/builds/15550
Edit: Green!

@DAlperin DAlperin force-pushed the dov/iceberg-retry-commit branch from eb61797 to 98e2b74 Compare March 11, 2026 15:23
@DAlperin DAlperin requested a review from a team March 12, 2026 18:11
@DAlperin
Copy link
Member Author

bugbot run

e
)));
}
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated error message in apply failure path

Low Severity

The action.apply(tx) result is first wrapped with .context("Failed to apply data file addition to iceberg table transaction"), then in the Err branch, the resulting error e is formatted into a new anyhow! with the exact same prefix message. Since Display on the anyhow::Error shows the outermost context, the final error message will read "Failed to apply data file addition to iceberg table transaction: Failed to apply data file addition to iceberg table transaction: <original error>". Either the .context() call or the custom anyhow! message is redundant.

Fix in Cursor Fix in Web

let frontier_json = serde_json::to_string(&frontier.elements())
.context("Failed to serialize frontier to JSON")?;
#[allow(clippy::disallowed_types)]
let snapshot_properties: std::collections::HashMap<String, String> = vec![
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a wrapper type in ore!

@DAlperin DAlperin force-pushed the dov/iceberg-retry-commit branch from 98e2b74 to 7f21f11 Compare March 13, 2026 18:18
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

@DAlperin DAlperin force-pushed the dov/iceberg-retry-commit branch from 7f21f11 to 940e235 Compare March 13, 2026 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants