Skip to content

chore: add goleak to all test packages and fix goroutine leaks#5229

Open
dongjiang1989 wants to merge 2 commits into
prometheus:mainfrom
dongjiang1989:memory-leak
Open

chore: add goleak to all test packages and fix goroutine leaks#5229
dongjiang1989 wants to merge 2 commits into
prometheus:mainfrom
dongjiang1989:memory-leak

Conversation

@dongjiang1989
Copy link
Copy Markdown
Contributor

@dongjiang1989 dongjiang1989 commented May 7, 2026

Which user-facing changes does this PR introduce?

Summary

This PR integrates go.uber.org/goleak across all test packages to detect goroutine leaks, and fixes all identified
leaks.

Changes

  • Added goleak.VerifyTestMain(m) to 49 existing test files (one per package)
  • Added go.uber.org/goleak@v1.3.0 as a direct dependency
add goleak to all test packages and fix goroutine leaks

Summary by CodeRabbit

  • Tests
    • Added goroutine leak detection across the test suite to catch resource leaks and improve test reliability.
    • Enhanced test cleanup and teardown in many tests to ensure resources are consistently released and tests run more robustly.

Signed-off-by: dongjiang <dongjiang1989@126.com>
@dongjiang1989 dongjiang1989 requested a review from a team as a code owner May 7, 2026 12:35
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1d830392-46f0-4bfd-9bca-d77194b4ae74

📥 Commits

Reviewing files that changed from the base of the PR and between 717034b and 004695d.

📒 Files selected for processing (46)
  • alert/alert_test.go
  • api/v2/api_test.go
  • cli/check_config_test.go
  • cli/config/config_test.go
  • cmd/alertmanager/main_test.go
  • config/common/inhibitrule_test.go
  • config/receiver/receiver_test.go
  • dispatch/dispatch_bench_test.go
  • eventrecorder/eventrecorder_test.go
  • featurecontrol/featurecontrol_test.go
  • httpserver/httpserver_test.go
  • inhibit/inhibit_bench_test.go
  • limit/bucket_test.go
  • matcher/compat/parse_test.go
  • matcher/compliance/compliance_test.go
  • matcher/parse/bench_test.go
  • nflog/nflog_test.go
  • nflog/nflogpb/set_test.go
  • notify/discord/discord_test.go
  • notify/email/email_test.go
  • notify/incidentio/incidentio_test.go
  • notify/jira/jira_test.go
  • notify/mattermost/mattermost_test.go
  • notify/msteams/msteams_test.go
  • notify/msteamsv2/msteamsv2_test.go
  • notify/mute_test.go
  • notify/opsgenie/opsgenie_test.go
  • notify/pagerduty/pagerduty_test.go
  • notify/pushover/pushover_test.go
  • notify/rocketchat/rocketchat_test.go
  • notify/slack/slack_test.go
  • notify/sns/sns_test.go
  • notify/telegram/telegram_test.go
  • notify/victorops/victorops_test.go
  • notify/webex/webex_test.go
  • notify/webhook/config_test.go
  • notify/wechat/wechat_test.go
  • pkg/labels/matcher_test.go
  • silence/cache_test.go
  • store/store_test.go
  • template/template_test.go
  • test/with_api_v2/acceptance/api_test.go
  • timeinterval/timeinterval_test.go
  • tracing/tracing_test.go
  • types/types_test.go
  • ui/web_test.go
✅ Files skipped from review due to trivial changes (1)
  • types/types_test.go
🚧 Files skipped from review as they are similar to previous changes (27)
  • notify/victorops/victorops_test.go
  • limit/bucket_test.go
  • test/with_api_v2/acceptance/api_test.go
  • notify/opsgenie/opsgenie_test.go
  • notify/webex/webex_test.go
  • notify/webhook/config_test.go
  • matcher/compat/parse_test.go
  • httpserver/httpserver_test.go
  • notify/telegram/telegram_test.go
  • notify/slack/slack_test.go
  • timeinterval/timeinterval_test.go
  • pkg/labels/matcher_test.go
  • cmd/alertmanager/main_test.go
  • notify/pagerduty/pagerduty_test.go
  • notify/discord/discord_test.go
  • inhibit/inhibit_bench_test.go
  • notify/mattermost/mattermost_test.go
  • matcher/parse/bench_test.go
  • ui/web_test.go
  • notify/msteams/msteams_test.go
  • api/v2/api_test.go
  • cli/check_config_test.go
  • template/template_test.go
  • tracing/tracing_test.go
  • notify/sns/sns_test.go
  • notify/mute_test.go
  • config/common/inhibitrule_test.go

📝 Walkthrough

Walkthrough

Adds go.uber.org/goleak v1.3.0 and introduces package-level TestMain wrappers calling goleak.VerifyTestMain(m) across many test packages; also applies targeted test cleanup and wiring changes (explicit stop channels, deferred Close/t.Cleanup) in several tests.

Changes

Goroutine Leak Detection + Test Cleanup

Layer / File(s) Summary
Dependency Addition
go.mod
Added go.uber.org/goleak v1.3.0 to project requirements.
Test Harness Setup
alert/alert_test.go, api/v2/api_test.go, cli/check_config_test.go, cli/config/config_test.go, cmd/alertmanager/main_test.go, config/common/inhibitrule_test.go, config/config_fuzz_test.go, config/receiver/receiver_test.go, dispatch/dispatch_bench_test.go, eventrecorder/eventrecorder_test.go, featurecontrol/featurecontrol_test.go, httpserver/httpserver_test.go, inhibit/inhibit_bench_test.go, limit/bucket_test.go, matcher/compat/parse_test.go, matcher/compliance/compliance_test.go, matcher/parse/bench_test.go, nflog/nflog_test.go, nflog/nflogpb/set_test.go, notify/discord/discord_test.go, notify/email/email_test.go, notify/incidentio/incidentio_test.go, notify/jira/jira_test.go, notify/mattermost/mattermost_test.go, notify/msteams/msteams_test.go, notify/msteamsv2/msteamsv2_test.go, notify/mute_test.go, notify/opsgenie/opsgenie_test.go, notify/pagerduty/pagerduty_test.go, notify/pushover/pushover_test.go, notify/rocketchat/rocketchat_test.go, notify/slack/slack_test.go, notify/sns/sns_test.go, notify/telegram/telegram_test.go, notify/victorops/victorops_test.go, notify/webex/webex_test.go, notify/webhook/config_test.go, notify/wechat/wechat_test.go, pkg/labels/matcher_test.go, silence/cache_test.go, store/store_test.go, template/template_test.go, test/with_api_v2/acceptance/api_test.go, timeinterval/timeinterval_test.go, types/types_test.go, ui/web_test.go
Each listed test package adds an import for go.uber.org/goleak and a package-level TestMain(m *testing.M) that calls goleak.VerifyTestMain(m) to enable goroutine-leak verification across the package’s tests.
Targeted Cleanup & Test Wiring
cluster/channel_test.go, cluster/cluster_test.go, dispatch/dispatch_test.go, provider/mem/mem_test.go, tracing/tracing_test.go, notify/discord/discord_test.go, notify/mattermost/mattermost_test.go
Explicit resource cleanup added: tests now pass/close explicit stop channels, call Leave(...) on peers, defer alerts.Close() or srv.Close(), register t.Cleanup(m.Stop), and modify helper signatures to accept injected stop channels where applicable.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • TheMeier
  • siavashs
  • sysadmind
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding goleak to all test packages and fixing goroutine leaks.
Description check ✅ Passed The PR description is mostly complete with a clear summary and list of changes, but lacks release notes content and does not follow the template structure fully.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
provider/mem/mem_test.go (1)

170-183: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Goroutine in TestDeadLock never terminates — goleak will fail on successful runs.

break at line 180 exits only the select statement, not the outer for loop. After sending on done, the goroutine re-enters the loop; since stopAfter is already drained, only tick.C is available and alerts.Subscribe("test") is called indefinitely every 10 ms. This goroutine remains running after the test completes, and VerifyTestMain checks for leaks after all tests have run on a successful run. This will cause the package's goleak check to fail even when all tests pass.

🐛 Proposed fix
 		case <-stopAfter:
 			done <- true
-			break
+			return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@provider/mem/mem_test.go` around lines 170 - 183, The goroutine in
TestDeadLock never exits because the plain break only leaves the select, not the
for loop; change the exit to stop the outer loop (e.g., use a labeled break or
simply send to done and return) so after sending on done the goroutine returns
and stops calling alerts.Subscribe("test"); update the anonymous func started in
the test (the one that creates tick and stopAfter) to ensure it returns after
sending to done and stops the ticker to avoid leaking the goroutine.
🧹 Nitpick comments (1)
notify/pagerduty/pagerduty_test.go (1)

590-636: ⚡ Quick win

Verify TestPagerDutyTimeout doesn't cause transient goleak failures.

The time.Sleep(tt.latency) in the test server handler runs on a goroutine managed by httptest.Server. After defer srv.Close() returns, the sleeping goroutine may still be alive briefly. If goleak's final check runs before the goroutine finishes sleeping (up to 100 ms), it could produce a spurious failure.

If this surfaces in CI, add a short goleak.IgnoreTopFunction option or introduce a context-aware sleep in the handler:

-time.Sleep(tt.latency)
+select {
+case <-r.Context().Done():
+case <-time.After(tt.latency):
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@notify/pagerduty/pagerduty_test.go` around lines 590 - 636, The test's
handler uses time.Sleep(tt.latency) which can leave a sleeping goroutine alive
after srv.Close(), causing transient goleak failures; update the handler in
TestPagerDutyTimeout to make the sleep context-aware by replacing
time.Sleep(tt.latency) with a select that waits on time.After(tt.latency) or
r.Context().Done() (e.g., select { case <-time.After(tt.latency): case
<-r.Context().Done(): }) so the handler goroutine exits promptly when the test
server is closed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cluster/cluster_test.go`:
- Line 117: The test currently calls p.Leave(...) as a trailing call which is
skipped if an earlier require.* fails, causing peers/goroutines to leak; replace
those tail-call teardowns by registering unconditional cleanup handlers with
t.Cleanup(func(){ p.Leave(0 * time.Second) }) (and similarly for other
peers/locations where p.Leave is used at the end) so Leave always runs even when
the test aborts; locate uses of p.Leave(...) in the failing test functions and
convert each to t.Cleanup with the same Leave invocation.

In `@provider/mem/mem_test.go`:
- Line 146: Remove the redundant explicit call to alerts.Close() inside the
"case <-done:" branch — the existing defer alerts.Close() at function start
already ensures Close() runs when the test function returns; locate the "case
<-done:" branch in the test and delete the alerts.Close() invocation (leave the
defer alerts.Close() and the done handling intact).

---

Outside diff comments:
In `@provider/mem/mem_test.go`:
- Around line 170-183: The goroutine in TestDeadLock never exits because the
plain break only leaves the select, not the for loop; change the exit to stop
the outer loop (e.g., use a labeled break or simply send to done and return) so
after sending on done the goroutine returns and stops calling
alerts.Subscribe("test"); update the anonymous func started in the test (the one
that creates tick and stopAfter) to ensure it returns after sending to done and
stops the ticker to avoid leaking the goroutine.

---

Nitpick comments:
In `@notify/pagerduty/pagerduty_test.go`:
- Around line 590-636: The test's handler uses time.Sleep(tt.latency) which can
leave a sleeping goroutine alive after srv.Close(), causing transient goleak
failures; update the handler in TestPagerDutyTimeout to make the sleep
context-aware by replacing time.Sleep(tt.latency) with a select that waits on
time.After(tt.latency) or r.Context().Done() (e.g., select { case
<-time.After(tt.latency): case <-r.Context().Done(): }) so the handler goroutine
exits promptly when the test server is closed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 776eb23b-7f47-4ee4-94d3-fc3eff279fd1

📥 Commits

Reviewing files that changed from the base of the PR and between c85e06d and 717034b.

📒 Files selected for processing (52)
  • alert/alert_test.go
  • api/v2/api_test.go
  • cli/check_config_test.go
  • cli/config/config_test.go
  • cluster/channel_test.go
  • cluster/cluster_test.go
  • cmd/alertmanager/main_test.go
  • config/common/inhibitrule_test.go
  • config/config_fuzz_test.go
  • config/receiver/receiver_test.go
  • dispatch/dispatch_bench_test.go
  • dispatch/dispatch_test.go
  • eventrecorder/eventrecorder_test.go
  • featurecontrol/featurecontrol_test.go
  • go.mod
  • httpserver/httpserver_test.go
  • inhibit/inhibit_bench_test.go
  • limit/bucket_test.go
  • matcher/compat/parse_test.go
  • matcher/compliance/compliance_test.go
  • matcher/parse/bench_test.go
  • nflog/nflog_test.go
  • nflog/nflogpb/set_test.go
  • notify/discord/discord_test.go
  • notify/email/email_test.go
  • notify/incidentio/incidentio_test.go
  • notify/jira/jira_test.go
  • notify/mattermost/mattermost_test.go
  • notify/msteams/msteams_test.go
  • notify/msteamsv2/msteamsv2_test.go
  • notify/mute_test.go
  • notify/opsgenie/opsgenie_test.go
  • notify/pagerduty/pagerduty_test.go
  • notify/pushover/pushover_test.go
  • notify/rocketchat/rocketchat_test.go
  • notify/slack/slack_test.go
  • notify/sns/sns_test.go
  • notify/telegram/telegram_test.go
  • notify/victorops/victorops_test.go
  • notify/webex/webex_test.go
  • notify/webhook/config_test.go
  • notify/wechat/wechat_test.go
  • pkg/labels/matcher_test.go
  • provider/mem/mem_test.go
  • silence/cache_test.go
  • store/store_test.go
  • template/template_test.go
  • test/with_api_v2/acceptance/api_test.go
  • timeinterval/timeinterval_test.go
  • tracing/tracing_test.go
  • types/types_test.go
  • ui/web_test.go

Comment thread cluster/cluster_test.go
require.Len(t, p.failedPeers, 1)
require.Equal(t, p2.Self().Address(), p.peers[p2.Self().Address()].Address())
require.Equal(t, p2.Name(), p.failedPeers[0].Name)
p.Leave(0 * time.Second)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make peer teardown unconditional with t.Cleanup (not tail calls).

Line 117 (and the same pattern at Lines 189-190, 243, 294, 370, 446-447): placing Leave(...) at function end skips cleanup when an earlier require.* fails. That can leave goroutines behind and cause leak-check flakiness.

Suggested pattern
 p, err := Create(...)
 require.NoError(t, err)
+t.Cleanup(func() { p.Leave(0 * time.Second) })

 p2, err := Create(...)
 require.NoError(t, err)
+t.Cleanup(func() { p2.Leave(0 * time.Second) })

 ...
-p.Leave(0 * time.Second)
-p2.Leave(0 * time.Second)

Also applies to: 189-190, 243-243, 294-294, 370-370, 446-447

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cluster/cluster_test.go` at line 117, The test currently calls p.Leave(...)
as a trailing call which is skipped if an earlier require.* fails, causing
peers/goroutines to leak; replace those tail-call teardowns by registering
unconditional cleanup handlers with t.Cleanup(func(){ p.Leave(0 * time.Second)
}) (and similarly for other peers/locations where p.Leave is used at the end) so
Leave always runs even when the test aborts; locate uses of p.Leave(...) in the
failing test functions and convert each to t.Cleanup with the same Leave
invocation.

Comment thread provider/mem/mem_test.go
if err != nil {
t.Fatal(err)
}
defer alerts.Close()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the Close() implementation of mem.Alerts to check idempotency
ast-grep --pattern 'func ($_ *Alerts) Close() {
  $$$
}'

Repository: prometheus/alertmanager

Length of output: 254


🏁 Script executed:

# Read the test file around the mentioned lines
sed -n '130,200p' provider/mem/mem_test.go

Repository: prometheus/alertmanager

Length of output: 2087


Remove the explicit alerts.Close() call at line 188; the defer at line 146 is sufficient.

The explicit alerts.Close() in the case <-done: branch is redundant since defer alerts.Close() already executes when the function returns. The Close() method only calls a.cancel() (a context.CancelFunc), which is idempotent and safe to call multiple times, so there is no panic risk. However, the redundant call should still be removed for clarity.

Proposed fix
 	select {
 	case <-done:
 		// no deadlock
-		alerts.Close()
 	case <-time.After(10 * time.Second):
 		t.Error("Deadlock detected")
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@provider/mem/mem_test.go` at line 146, Remove the redundant explicit call to
alerts.Close() inside the "case <-done:" branch — the existing defer
alerts.Close() at function start already ensures Close() runs when the test
function returns; locate the "case <-done:" branch in the test and delete the
alerts.Close() invocation (leave the defer alerts.Close() and the done handling
intact).

Signed-off-by: dongjiang <dongjiang1989@126.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant