Skip to content

Add consumer-group rebalance, empty-group, and metadata signals to kafka_consumer#23915

Draft
piochelepiotr wants to merge 4 commits into
masterfrom
pwolski/kafka-consumer-group-signals
Draft

Add consumer-group rebalance, empty-group, and metadata signals to kafka_consumer#23915
piochelepiotr wants to merge 4 commits into
masterfrom
pwolski/kafka-consumer-group-signals

Conversation

@piochelepiotr
Copy link
Copy Markdown
Contributor

What does this PR do?

Enriches the kafka_consumer cluster-monitoring consumer-group collection (enable_cluster_monitoring) with signals that cannot be derived by grouping existing tagged metrics in the Datadog UI:

  • kafka.consumer_group.rebalancing (gauge, 1/0): detected two ways — group state (PreparingRebalance/CompletingRebalance) for classic groups, and assignment != target_assignment for KIP-848 consumer-protocol groups.
  • kafka.consumer_group.empty (gauge, 1/0): 1 when the group is in the EMPTY state (committed offsets but no active members — an orphaned/abandoned group).

Dimensional group metadata is added as tags on existing gauges (no new metrics), per review preference:

  • partition_assignor, consumer_group_type, is_simple_consumer_group on kafka.consumer_group.members
  • group_instance_id (static membership) on kafka.consumer_group.member.partitions

All data is sourced from fields describe_consumer_groups already returns in the pinned confluent-kafka==2.13.2 client — no new API calls.

This is PR 1 of a sequenced plan to close the "consumer groups are second-class" gap identified in the DSM-Kafka vendor evaluation. Lag rollups were intentionally excluded: kafka.consumer_lag is already tagged by consumer_group, so per-group lag is a UI group-by, not a new metric.

Motivation

The DSM-Kafka product has no first-class consumer-group view; group health (rebalancing, orphaned groups, assignment strategy) was not observable. These signals are not derivable from existing metrics and require their own series to be alertable.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

🤖 Generated with Claude Code

Enrich the cluster-monitoring consumer-group collection with signals that
cannot be derived from existing tagged metrics:

- kafka.consumer_group.rebalancing (1/0): detected via group state
  (PreparingRebalance/CompletingRebalance) for classic groups and via
  assignment != target_assignment for KIP-848 consumer-protocol groups.
- kafka.consumer_group.empty (1/0): 1 when the group is in the EMPTY state
  (committed offsets but no active members).

Dimensional metadata is added as tags on existing gauges rather than new
metrics: partition_assignor, consumer_group_type, and is_simple_consumer_group
on consumer_group.members, and group_instance_id (static membership) on
consumer_group.member.partitions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@piochelepiotr piochelepiotr added qa/required QA is required for this PR and will generate a QA card and removed qa/skip-qa Automatically skip this PR for the next QA labels Jun 2, 2026
@datadog-prod-us1-6
Copy link
Copy Markdown

datadog-prod-us1-6 Bot commented Jun 2, 2026

Tests  Code Coverage

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 88.50% (+1.04%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 625cfcd | Docs | Datadog PR Page | Give us feedback!

piochelepiotr and others added 2 commits June 2, 2026 10:31
- Extract _build_group_meta_tags helper from the collection loop.
- Use `is not None` guards for partition_assignor and group_instance_id so
  empty-string values are not silently dropped.
- Emit consumer_group.rebalancing and consumer_group.empty with the same
  group_meta_tags as consumer_group.members so the sibling gauges share a
  tag set and can be correlated in dashboards.
- Reduce test mock duplication: _collect_groups now reuses
  seed_mock_kafka_client and a shared _stub_consumer_groups helper.
- Add tests for the dimensional-tag omission path and the
  no-target-assignment rebalance-skip branch.
- Name the new tag keys in the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Revert partition_assignor guard to `if assignor:` so KIP-848 and
  EMPTY-state groups (which report an empty assignor) don't emit a
  blank-value partition_assignor: tag. Parametrize the absent-tags test
  to cover both None and "".
- Type-hint state_name on _is_group_rebalancing.
- Add comments documenting the member-level vs group-level tag-set choice
  and the EMPTY-state basis for consumer_group.empty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented Jun 2, 2026

Validation Report

All 21 validations passed.

Show details
Validation Description Status
agent-reqs Verify check versions match the Agent requirements file
ci Validate CI configuration and code coverage settings
codeowners Validate every integration has a CODEOWNERS entry
config Validate default configuration files against spec.yaml
dep Verify dependency pins are consistent and Agent-compatible
http Validate integrations use the HTTP wrapper correctly
imports Validate check imports do not use deprecated modules
integration-style Validate check code style conventions
jmx-metrics Validate JMX metrics definition files and config
labeler Validate PR labeler config matches integration directories
legacy-signature Validate no integration uses the legacy Agent check signature
license-headers Validate Python files have proper license headers
licenses Validate third-party license attribution list
metadata Validate metadata.csv metric definitions
models Validate configuration data models match spec.yaml
openmetrics Validate OpenMetrics integrations disable the metric limit
package Validate Python package metadata and naming
qa-label Validate the pull request declares whether it needs QA for the next Agent release
readmes Validate README files have required sections
saved-views Validate saved view JSON file structure and fields
version Validate version consistency between package and changelog

View full run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant