ref: Remove Kafka/Arroyo consumer code #596

Merged
NicoHinderling merged 15 commits into main from delete-kafka-arroyo-code
Apr 9, 2026

@NicoHinderling
Contributor

Launchpad has fully migrated to TaskWorker mode for artifact processing. All traffic has been moved off the Kafka/Arroyo consumers, so this PR removes all that dead code.

Deleted (6 files, ~1,000 lines):

  • src/launchpad/kafka.py — Arroyo consumer, processing strategies, Kafka config
  • src/launchpad/service.py — Kafka+HTTP service orchestrator
  • src/launchpad/utils/arroyo_metrics.py — Arroyo-to-Datadog metrics bridge
  • scripts/ensure_kafka_topics.py — topic creation utility
  • scripts/test_kafka.py — test message producer
  • tests/integration/test_kafka_service.py — Kafka integration tests

Updated:

  • CLI: Removed serve command; worker is now the only service command
  • Dockerfile: Default CMD changed from serve to worker
  • CI: Removed Kafka from test job (no longer needs devservices), removed e2e job entirely
  • devservices: Removed Kafka dependency, updated containerized service to use worker
  • Makefile: Removed serve, test-kafka-*, and test-service-integration targets
  • artifact_processor.py: Relocated ServiceConfig and get_service_config here (previously in service.py)
  • Dependencies: Removed sentry-kafka-schemas and kafka-python

Not changed:

  • worker/app.py still uses KafkaProducer from arroyo (for TaskBroker), so sentry-arroyo and confluent-kafka remain as dependencies for now. That can be cleaned up in a follow-up.
  • E2E test infrastructure files are kept but the Kafka-based test methods are removed. E2E tests need to be rewritten for TaskWorker-based triggering.

Launchpad has fully migrated to TaskWorker mode for processing
artifacts. Remove all Kafka/Arroyo consumer infrastructure since
it is no longer in use.

Deleted:
- kafka.py (consumer, strategies, config)
- service.py (Kafka+HTTP orchestrator)
- arroyo_metrics.py (Arroyo metrics backend)
- ensure_kafka_topics.py and test_kafka.py scripts
- Kafka integration and E2E tests

Updated:
- CLI: removed `serve` command, worker is the only mode
- Dockerfile: default CMD changed from serve to worker
- CI: removed Kafka from test job, removed e2e job
- devservices: removed Kafka dependency
- Relocated ServiceConfig to artifact_processor.py
- Removed sentry-kafka-schemas and kafka-python deps

Note: sentry-arroyo and confluent-kafka remain as deps because
worker/app.py still uses KafkaProducer (to be cleaned up separately).

Co-Authored-By: Claude <noreply@anthropic.com>
@sentry
Contributor

sentry bot commented Apr 7, 2026

Sentry Build Distribution

| App Name | App ID | Version | Configuration | Install Page |
|---|---|---|---|---|
| Hacker News | com.emergetools.hackernews | 1.0.2 (13) | Release | Install Build |

Configure launchpad-test-android build distribution settings

The LaunchpadServer was only used by the Kafka/Arroyo service
orchestrator. With TaskWorker mode, no HTTP server is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@NicoHinderling NicoHinderling marked this pull request as ready for review April 9, 2026 17:35
-e LAUNCHPAD_PORT=2218 \
launchpad-test --help

e2e:
Member

Why are the e2e tests being removed?

Contributor Author

The e2e testing was only for the Arroyo setup. We can add it back for TaskWorker, but if anything I'll do it in a later PR.

Contributor Author

OK, added the e2e testing code back.

Was previously pulled in transitively via sentry-kafka-schemas,
which was removed as part of the Kafka consumer cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NicoHinderling and others added 2 commits April 9, 2026 11:03
Add Docker healthcheck for taskworker in devservices config using
the /tmp/health file that TaskWorker maintains. Move json import
out of while loop body in e2e tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
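
A minimal sketch of what a file-based health probe like the one described above could look like. The commit only says that TaskWorker maintains `/tmp/health`; treating "healthy" as "the file was modified recently" is an assumption here, and the function name and the 60-second threshold are hypothetical.

```python
import time
from pathlib import Path


def health_file_is_fresh(path, max_age_seconds=60.0, now=None):
    """Return True if `path` exists and was modified within `max_age_seconds`.

    Assumption: the worker refreshes the file while it is alive, so a stale
    or missing file is treated as unhealthy.
    """
    try:
        mtime = Path(path).stat().st_mtime
    except FileNotFoundError:
        # No file at all: the worker never started or already cleaned up.
        return False
    current = time.time() if now is None else now
    return (current - mtime) <= max_age_seconds
```

A container healthcheck could call something like `health_file_is_fresh("/tmp/health")` and exit non-zero when it returns `False`.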
Replace Kafka consumer e2e setup with fully self-contained
TaskBroker-based infrastructure:

- Kafka (KRaft mode, no Zookeeper dependency)
- TaskBroker (ghcr.io/getsentry/taskbroker:nightly)
- Launchpad worker connecting via gRPC
- Test runner dispatches tasks via process_artifact.delay()

No devservices dependency needed — all infra is in docker-compose.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 790136b. Configure here.

NicoHinderling and others added 3 commits April 9, 2026 11:49
- Remove external port mapping for taskbroker (only needed internally)
- Use JSON array format for TASKBROKER_GRPC_SHARED_SECRET (Vec<String>)
- Use JSON array format for TASKWORKER_SHARED_SECRET (parsed by orjson)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
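
To illustrate why the JSON array format matters for the shared secrets, here is a rough sketch of parsing such a value. It uses the standard-library `json` module rather than `orjson` to stay self-contained; the function name and the validation rules are hypothetical and are not taken from the taskworker code.

```python
import json


def parse_shared_secrets(raw):
    """Parse a JSON array of secret strings, e.g. '["secret-a", "secret-b"]'.

    A bare, unquoted string is not valid JSON and fails to parse, which is
    why the value has to be written as a JSON array.
    """
    value = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
        raise ValueError("expected a JSON array of strings")
    return value
```

With this shape, `'["test-secret"]'` parses to a one-element list, while a plain `test-secret` is rejected.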
iOS analysis takes ~3 minutes, which was right at the 180s timeout
boundary. Increase test timeout to 360s and CI timeout to 30 minutes.
Bump LAUNCHPAD_WORKER_MAX_CHILD_TASK_COUNT to 10 so the child process
doesn't restart between tasks unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
iOS analysis (LIEF + insights) takes ~6 min and AAB (bundletool)
takes several minutes. Mark these as @pytest.mark.slow so local
`make test-e2e` runs only fast tests (APK + error handling, ~23s).

CI runs all tests including slow ones with 600s timeouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NicoHinderling and others added 6 commits April 9, 2026 13:03
Restructure e2e tests to dispatch all artifact tasks upfront in
setup_class, then verify results individually. With worker
concurrency=3, iOS/APK/AAB process simultaneously so total time
is bounded by the slowest artifact (~6 min) rather than the sum.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
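
The timing argument above can be sketched with a thread pool: when all tasks are submitted upfront and concurrency covers them, wall time tracks the slowest task rather than the sum. The durations below are made-up stand-ins scaled down to fractions of a second, and `fake_process` is a hypothetical placeholder, not the real `process_artifact` task.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, scaled-down stand-ins for per-artifact analysis time (seconds).
FAKE_DURATIONS = {"ios": 0.4, "apk": 0.1, "aab": 0.2}


def fake_process(kind):
    """Pretend to analyze an artifact by sleeping for its assumed duration."""
    time.sleep(FAKE_DURATIONS[kind])
    return kind


def run_all(concurrency=3):
    """Submit every task upfront, then collect results one by one."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {kind: pool.submit(fake_process, kind) for kind in FAKE_DURATIONS}
        results = {kind: future.result() for kind, future in futures.items()}
    return results, time.monotonic() - start
```

With `concurrency=3` the elapsed time should land near the longest duration; with `concurrency=1` it approaches the sum instead.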
Kafka's default heap (1GB+) causes CPU contention on CI runners
with limited cores, slowing down the CPU-bound artifact analysis.
Cap at 256MB since e2e only processes a handful of messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The iOS/AAB analysis takes 6+ minutes on CI due to CPU contention
with Kafka JVM. Run only APK + error handling tests (same as local).
iOS/AAB analysis is already covered by integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The default namespace processing_deadline_duration was 10 seconds,
causing TaskBroker to mark tasks as failed before analysis could
complete. Match the 12-minute deadline set on the sentry dispatch
side. This was the root cause of iOS/AAB e2e test timeouts.

Also re-enable all e2e tests on CI to validate the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
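
As a small worked check of the numbers above: with a 10-second deadline, an analysis that runs for minutes is always past the deadline, while a 12-minute deadline leaves room for it. The comparison below is only an illustration of that arithmetic; the constant and function names are hypothetical, and how TaskBroker actually enforces the deadline is not shown in this PR.

```python
from datetime import timedelta

# Values quoted in the commit message above.
DEFAULT_DEADLINE = timedelta(seconds=10)
ANALYSIS_DEADLINE = timedelta(minutes=12)


def exceeds_deadline(task_duration, deadline):
    """Return True if a task running for `task_duration` would miss `deadline`."""
    return task_duration > deadline
```

For an iOS analysis of roughly six minutes, `exceeds_deadline(timedelta(minutes=6), DEFAULT_DEADLINE)` is true, while the same call with `ANALYSIS_DEADLINE` is false.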
Single test class dispatches all tasks in parallel via setup_class,
then verifies each artifact. No optional/slow separation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove stale env vars from .envrc (KAFKA_GROUP_ID, KAFKA_TOPICS,
  LAUNCHPAD_HOST, LAUNCHPAD_PORT, LAUNCHPAD_CREATE_KAFKA_TOPIC)
- Remove stale EXPOSE 2218 from Dockerfile (HTTP server is gone)
- Add failure exits to CI e2e health-check loops

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
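
The "failure exits" change can be pictured as a polling loop that stops with a non-zero exit when the service never becomes healthy, instead of silently falling through. The CI loops themselves are presumably shell; this is a Python rendering of the same idea, with a hypothetical function name and made-up retry counts.

```python
import time


def wait_until_healthy(check, attempts=30, delay_seconds=2.0, sleep=time.sleep):
    """Poll `check()` until it returns True; exit non-zero if it never does.

    Returns the 1-based attempt number on which the check passed.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    # Without this, a loop that runs out of attempts would look like success.
    raise SystemExit(f"service not healthy after {attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.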
modes:
default: [kafka]
containerized: [kafka, launchpad]
default: []
Member

We don't need any dependencies now?

Contributor Author

The worker connects to TaskBroker via gRPC. Kafka and TaskBroker are managed by sentry's devservices, not launchpad's, so as far as I can tell we shouldn't need any dependencies here anymore.

@NicoHinderling NicoHinderling merged commit 5b12174 into main Apr 9, 2026
22 checks passed
@NicoHinderling NicoHinderling deleted the delete-kafka-arroyo-code branch April 9, 2026 21:44
2 participants