docs: add comprehensive troubleshooting section to README #4711
ABHISHEK-DBZ wants to merge 4 commits into prometheus:main
- Add troubleshooting section with common issues and solutions
- Include cluster connectivity problems and DNS resolution timeouts
- Add guidance for alerts/notifications not working
- Include memory usage and configuration reload issues
- Provide practical examples and commands for debugging

This helps users quickly resolve common operational issues without needing to search through multiple documentation sources.

Signed-off-by: abhishek-dbz <abhibro936@gmail.com>
In 92ecf8b, silence_bench_test.go was left behind since it's not run automatically, and it started failing. Fix by passing a new registry when creating Silences.

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Co-authored-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: abhishek-dbz <abhibro936@gmail.com>
5f4d4ab to 4fbc391
ultrotter left a comment:
Thanks, that's useful! It might also be worth adding information about which metrics to put in a dashboard or use for monitoring Alertmanager itself.
**Solutions:**
- Check for alert storms - large number of unique alert groups
- Review `group_by` labels in routing configuration
- Consider using more specific grouping to reduce alert group count
Would this read better as "broader"? It sounds like if you go for more specific grouping, you'll get more groups, not fewer.
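To make the point about grouping concrete, here is a minimal sketch that counts how many distinct groups a set of alerts falls into for different `group_by` label lists. It is a simplified model and not Alertmanager's implementation: the alert label sets, the `count_groups` helper, and the assumption that a group is identified purely by the values of its `group_by` labels are all illustrative.

```python
# Hypothetical alerts; label names and values are made up for illustration.
ALERTS = [
    {"alertname": "HighLatency", "cluster": "eu-1", "instance": "web-1"},
    {"alertname": "HighLatency", "cluster": "eu-1", "instance": "web-2"},
    {"alertname": "HighLatency", "cluster": "us-1", "instance": "web-3"},
    {"alertname": "DiskFull", "cluster": "eu-1", "instance": "db-1"},
    {"alertname": "DiskFull", "cluster": "us-1", "instance": "db-2"},
]


def count_groups(alerts, group_by):
    """Count distinct groups when alerts are keyed by the values of the group_by labels.

    Simplifying assumption: a missing label is treated as an empty string.
    """
    keys = {tuple(alert.get(label, "") for label in group_by) for alert in alerts}
    return len(keys)


if __name__ == "__main__":
    for group_by in (
        ["alertname"],
        ["alertname", "cluster"],
        ["alertname", "cluster", "instance"],
    ):
        print(group_by, "->", count_groups(ALERTS, group_by), "groups")
```

In this toy data set, each label added to `group_by` increases the number of groups, which supports reading the suggestion as "broader" grouping (fewer labels) when the goal is fewer alert groups.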
**Solutions:**
- Check for alert storms - large number of unique alert groups
- Review `group_by` labels in routing configuration
We could remove this line, since it doesn't say how to review the labels, and merge it with the one below.
I'd suggest this move to the
This pull request is stale because it has been open 30 days with no activity.
@ABHISHEK-DBZ do you want to continue working on this PR?
Hi @ABHISHEK-DBZ, we have not heard back from you here in a while, so I'll go ahead and close this PR. Kind regards