
BigQuery metastore: implement commit_table with commit status verification#3099

Open
jx2lee wants to merge 3 commits into apache:main from jx2lee:jx2lee/commit-table-in-bigquery

Conversation

@jx2lee
Contributor

@jx2lee jx2lee commented Feb 25, 2026

Closes #2893

Rationale for this change

BigQueryMetastoreCatalog.commit_table was not implemented.

This PR implements commit_table for the BigQuery metastore catalog with:

  • create/update commit branching based on table existence
  • commit status verification on ambiguous failures
  • success detection when the committed metadata location is found in current metadata history (metadata_log)
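
The flow in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeMetastore`, its `history` dict (a stand-in for `metadata_log`), and the `fail_after_write` switch are hypothetical names used to simulate an ambiguous failure in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMetastore:
    """Hypothetical in-memory stand-in for the BigQuery metastore."""

    # table name -> current metadata location
    tables: dict[str, str] = field(default_factory=dict)
    # table name -> earlier metadata locations (stand-in for metadata_log)
    history: dict[str, list[str]] = field(default_factory=dict)
    # When True, the write is applied but the call still raises.
    fail_after_write: bool = False

    def write(self, name: str, new_location: str) -> None:
        if name in self.tables:
            self.history.setdefault(name, []).append(self.tables[name])
        self.tables[name] = new_location
        if self.fail_after_write:
            raise ConnectionError("response lost after the write was applied")


def check_commit_status(store: FakeMetastore, name: str, new_location: str) -> str:
    """Return "SUCCESS" if new_location is the current or an earlier location."""
    current = store.tables.get(name)
    if current is None:
        return "FAILURE"
    if current == new_location or new_location in store.history.get(name, []):
        return "SUCCESS"
    return "FAILURE"


def commit_table(store: FakeMetastore, name: str, new_location: str) -> str:
    """Create or update the table, verifying the outcome when the write raises."""
    # Branch on table existence before attempting the write.
    action = "update" if name in store.tables else "create"
    try:
        store.write(name, new_location)
    except ConnectionError:
        # Ambiguous failure: the write may have landed, so re-read the state.
        if check_commit_status(store, name, new_location) != "SUCCESS":
            raise
    return action
```

The point of the status check is that an error from the metastore call does not by itself prove the commit was lost, so the error is re-raised only when the new metadata location cannot be found.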

Are these changes tested?

Yes.

uv run pytest -q tests/catalog/test_bigquery_metastore.py -q

Are there any user-facing changes?

Probably not.

@kevinjqliu
Contributor

@rambleraptor could you take a look at this?

Member

@geruh geruh left a comment

Thanks for raising this @jx2lee, I did a quick pass!

    )
    current_metadata_location = parameters.get(METADATA_LOCATION_PROP)
    if current_metadata_location == new_metadata_location:
        return "SUCCESS"
Member

nit: can we extract the status into an enum?
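
For illustration, a minimal sketch of what an enum-based status could look like. `CommitStatus` and the `FAILURE` and `UNKNOWN` members are assumed names; only the `"SUCCESS"` string appears in the diff above.

```python
from enum import Enum
from typing import Optional


class CommitStatus(Enum):
    """Hypothetical enum for the result of re-checking a commit."""

    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    UNKNOWN = "UNKNOWN"


def check_commit_status(
    current_metadata_location: Optional[str], new_metadata_location: str
) -> CommitStatus:
    """Compare the metastore's current location with the one just committed."""
    if current_metadata_location is None:
        # Nothing could be read back, so the outcome cannot be determined.
        return CommitStatus.UNKNOWN
    if current_metadata_location == new_metadata_location:
        return CommitStatus.SUCCESS
    return CommitStatus.FAILURE
```

Callers can then compare with `is CommitStatus.SUCCESS` instead of matching on a string literal, which lets a type checker catch a misspelled status.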

raise commit_error

if current_table:
    self._delete_old_metadata(updated_staged_table.io, current_table.metadata, updated_staged_table.metadata)
Member

_delete_old_metadata is already called by the Table class once commit_table returns.

https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1507

    metadata_path=updated_staged_table.metadata_location,
)

commit_error: Exception | None = None
Member

Nit (non-blocking): this is more of an implementation detail, but the Java implementation prevents orphaned metadata on commit failure. This doesn't affect table state; it's just a nice-to-have.

https://github.com/apache/iceberg/blob/96a59408b271881a596f74697c05adb2dbc44094/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryTableOperations.java#L116-L118

I'll let @rambleraptor chime in here.

current_metadata = FromInputFile.table_metadata(io.new_input(current_metadata_location))

previous_metadata_locations = {log.metadata_file for log in current_metadata.metadata_log}
previous_metadata_location = parameters.get(PREVIOUS_METADATA_LOCATION_PROP)
Member

Do we need this check?

@rambleraptor
Contributor

Sorry, I haven't taken a look at this yet! I'll take a look on Monday.

Development

Successfully merging this pull request may close these issues.

BigQuery catalog commit_table is not implemented

4 participants