chore: add goleak to all test packages and fix goroutine leaks#5229
chore: add goleak to all test packages and fix goroutine leaks#5229dongjiang1989 wants to merge 2 commits into
Conversation
Signed-off-by: dongjiang <dongjiang1989@126.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (46)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (27)
📝 WalkthroughWalkthroughAdds go.uber.org/goleak v1.3.0 and introduces package-level TestMain wrappers calling goleak.VerifyTestMain(m) across many test packages; also applies targeted test cleanup and wiring changes (explicit stop channels, deferred Close/t.Cleanup) in several tests. ChangesGoroutine Leak Detection + Test Cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
provider/mem/mem_test.go (1)
170-183:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGoroutine in
TestDeadLocknever terminates — goleak will fail on successful runs.
breakat line 180 exits only theselectstatement, not the outerforloop. After sending ondone, the goroutine re-enters the loop; sincestopAfteris already drained, onlytick.Cis available andalerts.Subscribe("test")is called indefinitely every 10 ms. This goroutine remains running after the test completes, andVerifyTestMainchecks for leaks after all tests have run on a successful run. This will cause the package's goleak check to fail even when all tests pass.🐛 Proposed fix
case <-stopAfter: done <- true - break + return🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@provider/mem/mem_test.go` around lines 170 - 183, The goroutine in TestDeadLock never exits because the plain break only leaves the select, not the for loop; change the exit to stop the outer loop (e.g., use a labeled break or simply send to done and return) so after sending on done the goroutine returns and stops calling alerts.Subscribe("test"); update the anonymous func started in the test (the one that creates tick and stopAfter) to ensure it returns after sending to done and stops the ticker to avoid leaking the goroutine.
🧹 Nitpick comments (1)
notify/pagerduty/pagerduty_test.go (1)
590-636: ⚡ Quick winVerify
TestPagerDutyTimeoutdoesn't cause transient goleak failures.The
time.Sleep(tt.latency)in the test server handler runs on a goroutine managed byhttptest.Server. Afterdefer srv.Close()returns, the sleeping goroutine may still be alive briefly. If goleak's final check runs before the goroutine finishes sleeping (up to 100 ms), it could produce a spurious failure.If this surfaces in CI, add a short
goleak.IgnoreTopFunctionoption or introduce a context-aware sleep in the handler:-time.Sleep(tt.latency) +select { +case <-r.Context().Done(): +case <-time.After(tt.latency): +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@notify/pagerduty/pagerduty_test.go` around lines 590 - 636, The test's handler uses time.Sleep(tt.latency) which can leave a sleeping goroutine alive after srv.Close(), causing transient goleak failures; update the handler in TestPagerDutyTimeout to make the sleep context-aware by replacing time.Sleep(tt.latency) with a select that waits on time.After(tt.latency) or r.Context().Done() (e.g., select { case <-time.After(tt.latency): case <-r.Context().Done(): }) so the handler goroutine exits promptly when the test server is closed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cluster/cluster_test.go`:
- Line 117: The test currently calls p.Leave(...) as a trailing call which is
skipped if an earlier require.* fails, causing peers/goroutines to leak; replace
those tail-call teardowns by registering unconditional cleanup handlers with
t.Cleanup(func(){ p.Leave(0 * time.Second) }) (and similarly for other
peers/locations where p.Leave is used at the end) so Leave always runs even when
the test aborts; locate uses of p.Leave(...) in the failing test functions and
convert each to t.Cleanup with the same Leave invocation.
In `@provider/mem/mem_test.go`:
- Line 146: Remove the redundant explicit call to alerts.Close() inside the
"case <-done:" branch — the existing defer alerts.Close() at function start
already ensures Close() runs when the test function returns; locate the "case
<-done:" branch in the test and delete the alerts.Close() invocation (leave the
defer alerts.Close() and the done handling intact).
---
Outside diff comments:
In `@provider/mem/mem_test.go`:
- Around line 170-183: The goroutine in TestDeadLock never exits because the
plain break only leaves the select, not the for loop; change the exit to stop
the outer loop (e.g., use a labeled break or simply send to done and return) so
after sending on done the goroutine returns and stops calling
alerts.Subscribe("test"); update the anonymous func started in the test (the one
that creates tick and stopAfter) to ensure it returns after sending to done and
stops the ticker to avoid leaking the goroutine.
---
Nitpick comments:
In `@notify/pagerduty/pagerduty_test.go`:
- Around line 590-636: The test's handler uses time.Sleep(tt.latency) which can
leave a sleeping goroutine alive after srv.Close(), causing transient goleak
failures; update the handler in TestPagerDutyTimeout to make the sleep
context-aware by replacing time.Sleep(tt.latency) with a select that waits on
time.After(tt.latency) or r.Context().Done() (e.g., select { case
<-time.After(tt.latency): case <-r.Context().Done(): }) so the handler goroutine
exits promptly when the test server is closed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 776eb23b-7f47-4ee4-94d3-fc3eff279fd1
📒 Files selected for processing (52)
alert/alert_test.goapi/v2/api_test.gocli/check_config_test.gocli/config/config_test.gocluster/channel_test.gocluster/cluster_test.gocmd/alertmanager/main_test.goconfig/common/inhibitrule_test.goconfig/config_fuzz_test.goconfig/receiver/receiver_test.godispatch/dispatch_bench_test.godispatch/dispatch_test.goeventrecorder/eventrecorder_test.gofeaturecontrol/featurecontrol_test.gogo.modhttpserver/httpserver_test.goinhibit/inhibit_bench_test.golimit/bucket_test.gomatcher/compat/parse_test.gomatcher/compliance/compliance_test.gomatcher/parse/bench_test.gonflog/nflog_test.gonflog/nflogpb/set_test.gonotify/discord/discord_test.gonotify/email/email_test.gonotify/incidentio/incidentio_test.gonotify/jira/jira_test.gonotify/mattermost/mattermost_test.gonotify/msteams/msteams_test.gonotify/msteamsv2/msteamsv2_test.gonotify/mute_test.gonotify/opsgenie/opsgenie_test.gonotify/pagerduty/pagerduty_test.gonotify/pushover/pushover_test.gonotify/rocketchat/rocketchat_test.gonotify/slack/slack_test.gonotify/sns/sns_test.gonotify/telegram/telegram_test.gonotify/victorops/victorops_test.gonotify/webex/webex_test.gonotify/webhook/config_test.gonotify/wechat/wechat_test.gopkg/labels/matcher_test.goprovider/mem/mem_test.gosilence/cache_test.gostore/store_test.gotemplate/template_test.gotest/with_api_v2/acceptance/api_test.gotimeinterval/timeinterval_test.gotracing/tracing_test.gotypes/types_test.goui/web_test.go
| require.Len(t, p.failedPeers, 1) | ||
| require.Equal(t, p2.Self().Address(), p.peers[p2.Self().Address()].Address()) | ||
| require.Equal(t, p2.Name(), p.failedPeers[0].Name) | ||
| p.Leave(0 * time.Second) |
There was a problem hiding this comment.
Make peer teardown unconditional with t.Cleanup (not tail calls).
Line 117 (and the same pattern at Lines 189-190, 243, 294, 370, 446-447): placing Leave(...) at function end skips cleanup when an earlier require.* fails. That can leave goroutines behind and cause leak-check flakiness.
Suggested pattern
p, err := Create(...)
require.NoError(t, err)
+t.Cleanup(func() { p.Leave(0 * time.Second) })
p2, err := Create(...)
require.NoError(t, err)
+t.Cleanup(func() { p2.Leave(0 * time.Second) })
...
-p.Leave(0 * time.Second)
-p2.Leave(0 * time.Second)Also applies to: 189-190, 243-243, 294-294, 370-370, 446-447
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cluster/cluster_test.go` at line 117, The test currently calls p.Leave(...)
as a trailing call which is skipped if an earlier require.* fails, causing
peers/goroutines to leak; replace those tail-call teardowns by registering
unconditional cleanup handlers with t.Cleanup(func(){ p.Leave(0 * time.Second)
}) (and similarly for other peers/locations where p.Leave is used at the end) so
Leave always runs even when the test aborts; locate uses of p.Leave(...) in the
failing test functions and convert each to t.Cleanup with the same Leave
invocation.
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| defer alerts.Close() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect the Close() implementation of mem.Alerts to check idempotency
ast-grep --pattern 'func ($_ *Alerts) Close() {
$$$
}'Repository: prometheus/alertmanager
Length of output: 254
🏁 Script executed:
# Read the test file around the mentioned lines
sed -n '130,200p' provider/mem/mem_test.goRepository: prometheus/alertmanager
Length of output: 2087
Remove the explicit alerts.Close() call at line 188; the defer at line 146 is sufficient.
The explicit alerts.Close() in the case <-done: branch is redundant since defer alerts.Close() already executes when the function returns. The Close() method only calls a.cancel() (a context.CancelFunc), which is idempotent and safe to call multiple times, so there is no panic risk. However, the redundant call should still be removed for clarity.
Proposed fix
select {
case <-done:
// no deadlock
- alerts.Close()
case <-time.After(10 * time.Second):
t.Error("Deadlock detected")
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@provider/mem/mem_test.go` at line 146, Remove the redundant explicit call to
alerts.Close() inside the "case <-done:" branch — the existing defer
alerts.Close() at function start already ensures Close() runs when the test
function returns; locate the "case <-done:" branch in the test and delete the
alerts.Close() invocation (leave the defer alerts.Close() and the done handling
intact).
Which user-facing changes does this PR introduce?
Summary
This PR integrates
go.uber.org/goleakacross all test packages to detect goroutine leaks, and fixes all identifiedleaks.
Changes
goleak.VerifyTestMain(m)to 49 existing test files (one per package)go.uber.org/goleak@v1.3.0as a direct dependencySummary by CodeRabbit