ref: Remove Kafka/Arroyo consumer code #596
Launchpad has fully migrated to TaskWorker mode for processing artifacts. Remove all Kafka/Arroyo consumer infrastructure since it is no longer in use.

Deleted:
- kafka.py (consumer, strategies, config)
- service.py (Kafka+HTTP orchestrator)
- arroyo_metrics.py (Arroyo metrics backend)
- ensure_kafka_topics.py and test_kafka.py scripts
- Kafka integration and E2E tests

Updated:
- CLI: removed `serve` command, worker is the only mode
- Dockerfile: default CMD changed from serve to worker
- CI: removed Kafka from test job, removed e2e job
- devservices: removed Kafka dependency
- Relocated ServiceConfig to artifact_processor.py
- Removed sentry-kafka-schemas and kafka-python deps

Note: sentry-arroyo and confluent-kafka remain as deps because worker/app.py still uses KafkaProducer (to be cleaned up separately).

Co-Authored-By: Claude <noreply@anthropic.com>
The LaunchpadServer was only used by the Kafka/Arroyo service orchestrator. With TaskWorker mode, no HTTP server is needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-e LAUNCHPAD_PORT=2218 \
launchpad-test --help

e2e:
Why are the e2e tests being removed?
The e2e testing was only for the Arroyo setup. We can add it back for TaskWorker, but if anything I'll do it in a later PR.
OK, added the e2e testing code back.
Was previously pulled in transitively via sentry-kafka-schemas, which was removed as part of the Kafka consumer cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Docker healthcheck for taskworker in devservices config using the /tmp/health file that TaskWorker maintains. Move json import out of while loop body in e2e tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
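A file-based healthcheck like the one described above can be sketched in Python as follows. This is a minimal sketch, not the actual devservices configuration: it assumes the worker refreshes the modification time of its health file while it is alive, and both the `is_healthy` helper and the maximum-age threshold are hypothetical names chosen for illustration.

```python
import os
import time
from typing import Optional


def is_healthy(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the health file exists and was modified recently.

    Assumption: the worker touches `path` periodically while it is alive.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated as unhealthy.
        return False
    current = time.time() if now is None else now
    return (current - mtime) <= max_age_seconds
```

A container healthcheck command could call something like `is_healthy("/tmp/health", 60)` and exit non-zero when it returns False; the 60-second threshold is an assumed value, not one taken from this PR.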
Replace Kafka consumer e2e setup with fully self-contained TaskBroker-based infrastructure:
- Kafka (KRaft mode, no Zookeeper dependency)
- TaskBroker (ghcr.io/getsentry/taskbroker:nightly)
- Launchpad worker connecting via gRPC
- Test runner dispatches tasks via process_artifact.delay()

No devservices dependency needed; all infra is in docker-compose.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 790136b.
- Remove external port mapping for taskbroker (only needed internally)
- Use JSON array format for TASKBROKER_GRPC_SHARED_SECRET (Vec<String>)
- Use JSON array format for TASKWORKER_SHARED_SECRET (parsed by orjson)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
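To illustrate why the secret has to be a JSON array rather than a bare string, here is a small sketch of parsing such a value. It uses the standard-library `json` module as a stand-in for orjson, and the `parse_shared_secrets` helper is hypothetical; the real parsing happens inside the worker and TaskBroker, and may differ in detail.

```python
import json
from typing import List


def parse_shared_secrets(raw: str) -> List[str]:
    """Parse an environment value that must be a JSON array of strings."""
    value = json.loads(raw)  # raises ValueError on non-JSON input
    if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
        raise ValueError("expected a JSON array of strings")
    return value
```

Under these assumptions, a value such as `["my-secret"]` parses to a one-element list, while a bare `my-secret` is rejected because it is not valid JSON.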
iOS analysis takes ~3 minutes, which was right at the 180s timeout boundary. Increase test timeout to 360s and CI timeout to 30 minutes. Bump LAUNCHPAD_WORKER_MAX_CHILD_TASK_COUNT to 10 so the child process doesn't restart between tasks unnecessarily. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
iOS analysis (LIEF + insights) takes ~6 min and AAB (bundletool) takes several minutes. Mark these as @pytest.mark.slow so local `make test-e2e` runs only fast tests (APK + error handling, ~23s). CI runs all tests including slow ones with 600s timeouts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructure e2e tests to dispatch all artifact tasks upfront in setup_class, then verify results individually. With worker concurrency=3, iOS/APK/AAB process simultaneously so total time is bounded by the slowest artifact (~6 min) rather than the sum. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
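The dispatch-upfront pattern can be sketched roughly as below. This is an illustration only: `fake_process` and the sleep durations are hypothetical stand-ins for the real `process_artifact.delay()` dispatch, and a thread pool of three workers stands in for worker concurrency=3.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


def fake_process(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one artifact analysis task."""
    time.sleep(seconds)
    return f"{name}:done"


class TestArtifactResults:
    # Simulated per-artifact durations; the real ones are minutes, not seconds.
    durations = {"ios": 0.3, "apk": 0.1, "aab": 0.2}
    futures: Dict[str, Future] = {}

    @classmethod
    def setup_class(cls) -> None:
        # Dispatch everything before any test runs, so tasks overlap.
        cls.pool = ThreadPoolExecutor(max_workers=3)
        cls.futures = {
            name: cls.pool.submit(fake_process, name, seconds)
            for name, seconds in cls.durations.items()
        }

    @classmethod
    def teardown_class(cls) -> None:
        cls.pool.shutdown()

    def test_ios(self) -> None:
        assert self.futures["ios"].result(timeout=5) == "ios:done"

    def test_apk(self) -> None:
        assert self.futures["apk"].result(timeout=5) == "apk:done"

    def test_aab(self) -> None:
        assert self.futures["aab"].result(timeout=5) == "aab:done"
```

Because every task is already running when the first test blocks on its result, the wall-clock time approaches the slowest task rather than the sum of all three.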
Kafka's default heap (1GB+) causes CPU contention on CI runners with limited cores, slowing down the CPU-bound artifact analysis. Cap at 256MB since e2e only processes a handful of messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The iOS/AAB analysis takes 6+ minutes on CI due to CPU contention with Kafka JVM. Run only APK + error handling tests (same as local). iOS/AAB analysis is already covered by integration tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The default namespace processing_deadline_duration was 10 seconds, causing TaskBroker to mark tasks as failed before analysis could complete. Match the 12-minute deadline set on the sentry dispatch side. This was the root cause of iOS/AAB e2e test timeouts. Also re-enable all e2e tests on CI to validate the fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single test class dispatches all tasks in parallel via setup_class, then verifies each artifact. No optional/slow separation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove stale env vars from .envrc (KAFKA_GROUP_ID, KAFKA_TOPICS, LAUNCHPAD_HOST, LAUNCHPAD_PORT, LAUNCHPAD_CREATE_KAFKA_TOPIC)
- Remove stale EXPOSE 2218 from Dockerfile (HTTP server is gone)
- Add failure exits to CI e2e health-check loops

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
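The point of a failure exit in a health-check loop is that the loop must report a timeout instead of silently falling through. The CI loops in this PR are not shown here, so the following is only a Python sketch of the idea; `wait_until` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_seconds: float,
    interval_seconds: float = 1.0,
) -> bool:
    """Poll `check` until it returns True or the timeout elapses.

    Returns True on success and False on timeout, so the caller can
    turn a timeout into a non-zero exit status.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)
```

A caller would typically follow this with `sys.exit(1)` when the function returns False, so that a service that never becomes healthy fails the job rather than letting later steps run against it.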
modes:
  default: [kafka]
  containerized: [kafka, launchpad]
  default: []
We don't need any dependencies now?
The worker connects to TaskBroker via gRPC. Kafka and TaskBroker are managed by sentry's devservices, not launchpad's, so yeah, we shouldn't need it anymore from what I can tell.

Launchpad has fully migrated to TaskWorker mode for artifact processing. All traffic has been moved off the Kafka/Arroyo consumers, so this PR removes all that dead code.
Deleted (6 files, ~1,000 lines):
- `src/launchpad/kafka.py`: Arroyo consumer, processing strategies, Kafka config
- `src/launchpad/service.py`: Kafka+HTTP service orchestrator
- `src/launchpad/utils/arroyo_metrics.py`: Arroyo-to-Datadog metrics bridge
- `scripts/ensure_kafka_topics.py`: topic creation utility
- `scripts/test_kafka.py`: test message producer
- `tests/integration/test_kafka_service.py`: Kafka integration tests

Updated:
- CLI: removed the `serve` command; `worker` is now the only service command
- Dockerfile: default CMD changed from `serve` to `worker`
- CI: removed Kafka from the test job and removed the `e2e` job entirely
- devservices: removed the Kafka dependency
- Removed the `serve`, `test-kafka-*`, and `test-service-integration` targets
- `artifact_processor.py`: now holds `ServiceConfig` and `get_service_config` (previously in `service.py`)
- Dependencies: removed `sentry-kafka-schemas` and `kafka-python`

Not changed:
`worker/app.py` still uses `KafkaProducer` from arroyo (for TaskBroker), so `sentry-arroyo` and `confluent-kafka` remain as dependencies for now. That can be cleaned up in a follow-up.