Skip to content

Replace Docker-based Kafka with embedded KRaft broker for integration tests#17790

Merged
xiangfu0 merged 1 commit intoapache:masterfrom
xiangfu0:flaky-kafka-startable
Mar 8, 2026
Merged

Replace Docker-based Kafka with embedded KRaft broker for integration tests#17790
xiangfu0 merged 1 commit intoapache:masterfrom
xiangfu0:flaky-kafka-startable

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Mar 1, 2026

Summary

  • Replace KafkaServerStartable (Docker CLI-based) with EmbeddedKafkaCluster using Kafka's KafkaClusterTestKit in KRaft mode for integration tests
  • Eliminates Docker dependency for tests, improving startup speed (~5s vs ~30-60s) and removing the primary source of CI flakiness on GitHub Actions (image pulls, network creation races, port mapping issues, container timeouts)
  • Simplify BaseClusterIntegrationTest Kafka lifecycle — no retry loop, no Docker container management, no port scanning

Changes

  • New: EmbeddedKafkaCluster — in-process KRaft cluster wrapper implementing StreamDataServerStartable, supporting multi-broker clusters and Kafka transactions
  • Modified: BaseClusterIntegrationTest — rewrote Kafka start/stop to use embedded cluster; fixed replication factor calculation for multi-broker clusters; removed Docker-specific dead code (~170 lines)
  • Modified: KafkaPartitionLevelConsumerTest — use EmbeddedKafkaCluster instead of Docker
  • Modified: ExactlyOnceKafkaRealtimeClusterIntegrationTest — rewritten to use single KafkaProducer for both abort and commit transactions, avoiding transaction marker race conditions; added post-commit verification with System.err diagnostics
  • Modified: PartialUpsertTableRebalanceIntegrationTest — added retry logic for embedded Kafka topic metadata propagation delays
  • Modified: EmbeddedKafkaCluster — added essential broker configs for transactions (transaction.state.log.num.partitions=1, log.flush.interval.messages=1, etc.); use MetadataVersion.latestProduction()
  • Modified: pom.xml (root, pinot-kafka-3.0, pinot-integration-test-base, pinot-integration-tests) — added kafka test-jar dependencies for embedded broker support
  • Unchanged: KafkaServerStartable (production Docker-based class for QuickStart) — not modified

Test plan

  • KafkaPartitionLevelConsumerTest — 9 tests pass (single broker)
  • LLCRealtimeClusterIntegrationTest — passes (2 brokers, non-transactional)
  • ExactlyOnceKafkaRealtimeClusterIntegrationTest — passes (3 brokers, Kafka transactions with read_committed)
  • PartialUpsertTableRebalanceIntegrationTest — passes with embedded Kafka
  • Full integration test suite via CI

🤖 Generated with Claude Code

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.18%. Comparing base (4434eda) to head (a240775).
⚠️ Report is 18 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17790      +/-   ##
============================================
- Coverage     63.23%   63.18%   -0.05%     
  Complexity     1456     1456              
============================================
  Files          3187     3188       +1     
  Lines        191620   191736     +116     
  Branches      29315    29347      +32     
============================================
- Hits         121172   121156      -16     
- Misses        60968    61092     +124     
- Partials       9480     9488       +8     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (?)
java-11 63.16% <ø> (-0.05%) ⬇️
java-21 63.16% <ø> (-0.05%) ⬇️
temurin 63.18% <ø> (-0.05%) ⬇️
unittests 63.18% <ø> (-0.05%) ⬇️
unittests1 55.57% <ø> (-0.05%) ⬇️
unittests2 34.11% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 force-pushed the flaky-kafka-startable branch 6 times, most recently from ba33bd3 to 1fcba20 Compare March 2, 2026 04:46
@xiangfu0 xiangfu0 requested a review from Copilot March 2, 2026 05:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modernizes Pinot’s Kafka-backed integration testing by replacing Docker-managed Kafka startup with an in-process Kafka KRaft cluster (Kafka KafkaClusterTestKit), aiming to reduce CI flakiness and improve test startup time.

Changes:

  • Introduces EmbeddedKafkaCluster (KRaft, in-process) implementing StreamDataServerStartable for integration tests.
  • Refactors Kafka lifecycle in BaseClusterIntegrationTest and updates Kafka-based tests to use the embedded broker.
  • Updates Maven + GitHub Actions integration test partitioning to run ExactlyOnceKafkaRealtimeClusterIntegrationTest in its own test set with required Kafka test dependencies.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pom.xml Adds dependency management entries for Kafka test artifacts, Pinot Kafka test-jar, and JUnit Jupiter API needed by Kafka testkit.
pinot-plugins/pinot-stream-ingestion/pinot-kafka-3.0/src/test/java/org/apache/pinot/plugin/stream/kafka30/server/EmbeddedKafkaCluster.java New embedded Kafka (KRaft) cluster wrapper for test usage.
pinot-plugins/pinot-stream-ingestion/pinot-kafka-3.0/src/test/java/org/apache/pinot/plugin/stream/kafka30/KafkaPartitionLevelConsumerTest.java Switches test Kafka setup from Docker-based starter to embedded cluster.
pinot-plugins/pinot-stream-ingestion/pinot-kafka-3.0/pom.xml Adds Kafka testkit + JUnit Jupiter API dependencies required to compile/run embedded broker tests.
pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTest.java Replaces Docker Kafka startup with embedded cluster and adjusts topic readiness/replication handling.
pinot-integration-test-base/pom.xml Adds test dependencies on pinot-kafka-3.0 test-jar and Kafka test artifacts.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/PartialUpsertTableRebalanceIntegrationTest.java Updates cleanup strategy (drop/recreate table + recreate topic) and adjusts lease extender executor handling after stopping servers.
pinot-integration-tests/pom.xml Adds a new integration-tests-set-3 profile for ExactlyOnce test and pulls in embedded Kafka dependencies.
.github/workflows/scripts/pr-tests/.pinot_tests_integration.sh Adds support for running integration test set 3.
.github/workflows/pinot_tests.yml Expands integration test matrix to include test set 3.

@xiangfu0 xiangfu0 force-pushed the flaky-kafka-startable branch from 655816f to d3f883b Compare March 2, 2026 21:44
@Jackie-Jiang Jackie-Jiang added enhancement Improvement to existing functionality kafka Related to Kafka stream connector testing Related to tests or test infrastructure labels Mar 3, 2026
@xiangfu0 xiangfu0 force-pushed the flaky-kafka-startable branch 4 times, most recently from c09cdc7 to 7ac2a59 Compare March 4, 2026 21:06
@xiangfu0 xiangfu0 requested a review from Copilot March 4, 2026 21:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

… tests

- Replace KafkaServerStartable (Docker CLI-based) with EmbeddedKafkaCluster
  using Kafka's KafkaClusterTestKit in KRaft mode for integration tests
- Eliminate Docker dependency, improving startup speed and removing CI flakiness
- Add essential embedded Kafka broker configs for transactions
- Rewrite ExactlyOnce test to use single KafkaProducer for both abort and
  commit transactions, avoiding transaction marker race conditions
- Fix PartialUpsertTableRebalanceIntegrationTest with retry logic for
  embedded Kafka topic metadata propagation delays

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xiangfu0 xiangfu0 force-pushed the flaky-kafka-startable branch from 7ac2a59 to a240775 Compare March 4, 2026 23:41
@xiangfu0 xiangfu0 merged commit b280619 into apache:master Mar 8, 2026
34 checks passed
@xiangfu0 xiangfu0 deleted the flaky-kafka-startable branch March 8, 2026 07:22
@xiangfu0 xiangfu0 added the docker Related to Docker images or containerization label Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docker Related to Docker images or containerization enhancement Improvement to existing functionality kafka Related to Kafka stream connector testing Related to tests or test infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants