Add consumer-group rebalance, empty-group, and metadata signals to kafka_consumer #23915
Draft
piochelepiotr wants to merge 4 commits into
Enrich the cluster-monitoring consumer-group collection with signals that cannot be derived from existing tagged metrics:
- kafka.consumer_group.rebalancing (1/0): detected via group state (PreparingRebalance/CompletingRebalance) for classic groups and via assignment != target_assignment for KIP-848 consumer-protocol groups.
- kafka.consumer_group.empty (1/0): 1 when the group is in the EMPTY state (committed offsets but no active members).

Dimensional metadata is added as tags on existing gauges rather than new metrics: partition_assignor, consumer_group_type, and is_simple_consumer_group on consumer_group.members, and group_instance_id (static membership) on consumer_group.member.partitions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
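The two detection paths and the empty-group check described in this commit can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in the PR: `Member`, `is_group_rebalancing`, `is_group_empty`, and the state-name normalization are hypothetical names and choices introduced here. The only details taken from the description are the state names, the `assignment != target_assignment` comparison, and the skip when no target assignment is reported.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Classic-protocol states that indicate a rebalance in progress.
# Assumption: state names are compared case-insensitively with underscores
# removed, since the description spells them both as "PreparingRebalance"
# and as "EMPTY".
REBALANCING_STATES = {"preparingrebalance", "completingrebalance"}

TopicPartition = Tuple[str, int]  # (topic, partition), hypothetical stand-in


@dataclass
class Member:
    """Hypothetical stand-in for one member of a described consumer group."""
    assignment: List[TopicPartition]
    target_assignment: Optional[List[TopicPartition]] = None


def _normalize(state_name: str) -> str:
    return state_name.replace("_", "").lower()


def is_group_rebalancing(state_name: str, members: List[Member]) -> bool:
    # Path 1: classic groups expose the rebalance through the group state.
    if _normalize(state_name) in REBALANCING_STATES:
        return True
    # Path 2: KIP-848 groups are treated as rebalancing while any member's
    # current assignment differs from its target assignment.
    for member in members:
        if member.target_assignment is None:
            continue  # no target reported: skip this member
        if set(member.assignment) != set(member.target_assignment):
            return True
    return False


def is_group_empty(state_name: str) -> bool:
    # EMPTY state: committed offsets exist but no members are active.
    return _normalize(state_name) == "empty"
```

Each function returns a boolean that a caller could submit as a 1/0 gauge value, for example `int(is_group_rebalancing(state, members))`.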
All tests passed. Commit SHA: 625cfcd
- Extract _build_group_meta_tags helper from the collection loop.
- Use `is not None` guards for partition_assignor and group_instance_id so empty-string values are not silently dropped.
- Emit consumer_group.rebalancing and consumer_group.empty with the same group_meta_tags as consumer_group.members so the sibling gauges share a tag set and can be correlated in dashboards.
- Reduce test mock duplication: _collect_groups now reuses seed_mock_kafka_client and a shared _stub_consumer_groups helper.
- Add tests for the dimensional-tag omission path and the no-target-assignment rebalance-skip branch.
- Name the new tag keys in the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Revert partition_assignor guard to `if assignor:` so KIP-848 and EMPTY-state groups (which report an empty assignor) don't emit a blank-value partition_assignor: tag. Parametrize the absent-tags test to cover both None and "".
- Type-hint state_name on _is_group_rebalancing.
- Add comments documenting the member-level vs group-level tag-set choice and the EMPTY-state basis for consumer_group.empty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
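The effect of the truthiness guard can be illustrated with a small sketch. The function below is a hypothetical stand-in for the `_build_group_meta_tags` helper named in the commits; its signature and the `true`/`false` tag-value formatting are assumptions made for illustration. Only the tag keys and the `if assignor:` guard come from the commit messages.

```python
from typing import List, Optional


def build_group_meta_tags(
    assignor: Optional[str],
    group_type: Optional[str],
    is_simple: Optional[bool],
) -> List[str]:
    """Build group-level dimensional tags, omitting any that have no value."""
    tags: List[str] = []
    # Truthiness guard: both None and "" are skipped, so groups that report
    # an empty assignor never produce a blank "partition_assignor:" tag.
    if assignor:
        tags.append(f"partition_assignor:{assignor}")
    if group_type:
        tags.append(f"consumer_group_type:{group_type}")
    # A boolean needs an explicit None check, because False is a real value.
    # Assumption: booleans are rendered as lowercase "true"/"false".
    if is_simple is not None:
        tags.append(f"is_simple_consumer_group:{str(is_simple).lower()}")
    return tags
```

With this shape, `build_group_meta_tags("", "consumer", False)` yields only the `consumer_group_type` and `is_simple_consumer_group` tags, which is the absent-tag behavior the parametrized test is described as covering for both `None` and `""`.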
Validation Report: All 21 validations passed.
What does this PR do?
Enriches the `kafka_consumer` cluster-monitoring consumer-group collection (`enable_cluster_monitoring`) with signals that cannot be derived by grouping existing tagged metrics in the Datadog UI:
- `kafka.consumer_group.rebalancing` (gauge, 1/0): detected two ways: group state (`PreparingRebalance`/`CompletingRebalance`) for classic groups, and `assignment != target_assignment` for KIP-848 consumer-protocol groups.
- `kafka.consumer_group.empty` (gauge, 1/0): `1` when the group is in the `EMPTY` state (committed offsets but no active members, i.e. an orphaned/abandoned group).

Dimensional group metadata is added as tags on existing gauges (no new metrics), per review preference:
- `partition_assignor`, `consumer_group_type`, `is_simple_consumer_group` on `kafka.consumer_group.members`
- `group_instance_id` (static membership) on `kafka.consumer_group.member.partitions`

All data is sourced from fields `describe_consumer_groups` already returns in the pinned `confluent-kafka==2.13.2` client; no new API calls are made.

This is PR 1 of a sequenced plan to close the "consumer groups are second-class" gap identified in the DSM-Kafka vendor evaluation. Lag rollups were intentionally excluded: `kafka.consumer_lag` is already tagged by `consumer_group`, so per-group lag is a UI group-by, not a new metric.

Motivation
The DSM-Kafka product has no first-class consumer-group view; group health (rebalancing, orphaned groups, assignment strategy) was not observable. These signals are not derivable from existing metrics and require their own series to be alertable.
Review checklist (to be filled by reviewers)
- Add the `qa/required` label if this PR needs QA validation, or `qa/skip-qa` if it does not. Exactly one of the two is required.
- Add the `backport/<branch-name>` label to the PR and it will automatically open a backport PR once this one is merged.

🤖 Generated with Claude Code