Skip to content

[ams][test] Make RestCatalogService nested tests order-independent#4201

Open
lintingbin wants to merge 1 commit into
apache:masterfrom
lintingbin:fix-rest-catalog-test-isolation
Open

[ams][test] Make RestCatalogService nested tests order-independent#4201
lintingbin wants to merge 1 commit into
apache:masterfrom
lintingbin:fix-rest-catalog-test-isolation

Conversation

@lintingbin
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

amoro-ams/src/test/java/org/apache/amoro/server/RestCatalogServiceTestBase and its subclasses (TestInternalMixedCatalogService, TestInternalIcebergCatalogService) share a single static AmsEnvironment and a hard-coded database = \"test_ns\" / table = \"test_iceberg_tbl\" across every @Nested class. Whenever cleanup leaks (the existing @AfterEach swallows dropDatabase exceptions) or background catalog scans race the tests, sibling nested classes inherit dirty catalog state and assertions that hard-code the catalog being empty start failing.

This PR makes the tests order-independent without restructuring the suite:

  1. Unique database name per test instanceRestCatalogServiceTestBase.database is now \"test_ns_\" + COUNTER.incrementAndGet(). Each test method gets a fresh test instance (JUnit 5 default Lifecycle.PER_METHOD), so each test gets a fresh database name. Pollution between tests can no longer cause AlreadyExists on createDatabase or pre-populated catalog snapshots.
  2. Delta-based catalog/namespace assertions in TestInternalMixedCatalogService.TestDatabaseOperation.test and TestInternalIcebergCatalogService.NamespaceTests.testNamespaceOperations. The previous code asserted catalog.listDatabases().isEmpty() which only holds in a single-test-class-per-JVM execution.

Why are the changes needed?

These tests have been flaky on master for a while — for example, the Core/hadoop3 CI run on 2026-04-23 failed inside TestInternalIcebergCatalogService\$CatalogPropertiesTest's suite, and PR #4200's CI just hit the same family of failures, all surfacing the same root cause:

TestInternalMixedCatalogService\$TestDatabaseOperation.test:168
    expected: <true> but was: <false>          // listDatabases().isEmpty() failed because a previous test leaked test_ns

TestInternalMixedCatalogService\$TestTableOperation.before:186
    AlreadyExists Database: test_ns already exists  // previous @AfterEach swallowed dropDatabase failure

TestInternalMixedCatalogService\$TestTableCommit.testTableCommit:295
    NoSuchTableException: ...test_iceberg_tbl not exists  // sibling test's @AfterEach.dropTable raced into our just-created table

The shared test_ns / test_iceberg_tbl names are the load-bearing reason these races collide. With unique per-instance database names, every test gets its own namespace, so concurrent or leaked cleanup from a sibling test no longer touches it.

This is complementary to #4185, which fixed a Version-N-already-exists race in InternalMixedIcebergHandler.newTableOperations triggered by the same shared-state pattern.

How was this patch tested?

Ran the affected suites locally on Java 11:

./mvnw test -pl amoro-ams -am -Pspark-3.5,support-all-formats,openapi-sdk \\
  -Dtest='TestInternalMixedCatalogService,TestInternalIcebergCatalogService'
# Tests run: 16, Failures: 0, Errors: 0, Skipped: 0

./mvnw spotless:check checkstyle:check -pl amoro-ams -am is clean.

Does this PR introduce any user-facing change?

No — test-only change.

The integration tests under
amoro-ams/src/test/java/org/apache/amoro/server/TestInternal*CatalogService
share a static AmsEnvironment instance and a hard-coded "test_ns"
database / "test_iceberg_tbl" table name across every @nested class.
Whenever cleanup leaks (the existing @AfterEach swallows dropDatabase
exceptions) or background scans race the tests, sibling @nested
classes inherit dirty catalog state and assertions that hard-code the
catalog being empty start failing.

Concrete fixes:

* Each test instance now picks a fresh database name via a static
  AtomicLong counter ("test_ns_<n>"), so unrelated tests can never
  collide on the same namespace even if a previous run leaked state.
* TestInternalMixedCatalogService.TestDatabaseOperation.test now
  asserts the create / drop delta against the initial database and
  namespace counts instead of asserting the catalog is empty.
* TestInternalIcebergCatalogService.NamespaceTests.testNamespaceOperations
  has the same delta-based fix.

This addresses the same pre-existing flakiness seen on master CI
(see Core/hadoop3 run on 2026-04-23 failing in
TestInternalIcebergCatalogService$CatalogPropertiesTest's suite) and
the failures reproduced on PR apache#4200 CI in
TestInternalMixedCatalogService$TestDatabaseOperation /
TestTableOperation / TestTableCommit.
@github-actions github-actions Bot added the module:ams-server Ams server module label May 1, 2026
@lintingbin
Copy link
Copy Markdown
Contributor Author

Hi @zhoujinsong, would you mind taking a look when you have time? Thanks!

This is test-only and is meant to fix the flaky TestInternalMixedCatalogService / TestInternalIcebergCatalogService failures that have been intermittently breaking CI on master and on #4200. Complementary to your #4185 — same test suite, but a different (test-isolation) flavor of the same shared-state pattern.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@amoro.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant