JAVA-6155 add thread dumps to evergreen logs on failed test cases#1932
JAVA-6155 add thread dumps to evergreen logs on failed test cases#1932strogiyotec wants to merge 2 commits intomongodb:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a JUnit 5 TestWatcher extension that emits a full JVM thread dump to the test logs whenever a test fails, making deadlocks and hung-thread scenarios easier to diagnose in Evergreen and locally.
Changes:
- Add
ThreadDumpOnFailureExtension(JUnit JupiterTestWatcher) that logs a thread dump on test failure. - Register the extension via
META-INF/services/...and enable JUnit Jupiter extension autodetection viajunit-platform.properties.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| driver-core/src/test/unit/com/mongodb/internal/diagnostics/ThreadDumpOnFailureExtension.java | Adds the failure hook and thread-dump collection logic. |
| driver-core/src/test/resources/META-INF/services/org.junit.jupiter.api.extension.Extension | Registers the extension for ServiceLoader-based autodetection. |
| driver-core/src/test/resources/junit-platform.properties | Enables JUnit Jupiter extension autodetection so the extension is picked up automatically. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public void testFailed(final ExtensionContext context, final Throwable cause) { | ||
| String testName = context.getDisplayName(); | ||
| String threadDump = getAllThreadsDump(); | ||
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); |
There was a problem hiding this comment.
testFailed ignores the provided cause, so the log entry won’t include the exception/stack trace that triggered the failure. Logging the throwable (e.g., using the error(String, Throwable) overload) would keep the failure reason adjacent to the thread dump in Evergreen/local logs.
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); | |
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump, cause); |
There was a problem hiding this comment.
the cause will already be provided on assertion fail I don't think duplicating it for thread dumps makes sense
| public void testFailed(final ExtensionContext context, final Throwable cause) { | ||
| String testName = context.getDisplayName(); | ||
| String threadDump = getAllThreadsDump(); | ||
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); | ||
| } | ||
|
|
||
| private static String getAllThreadsDump() { | ||
| final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean(); | ||
| ThreadInfo[] threadInfos = threadMXBean.dumpAllThreads( | ||
| threadMXBean.isObjectMonitorUsageSupported(), | ||
| threadMXBean.isSynchronizerUsageSupported() | ||
| ); |
There was a problem hiding this comment.
getAllThreadsDump() (via ThreadMXBean.dumpAllThreads) can throw at runtime (e.g., SecurityException). If this happens inside testFailed, the extension itself may error and you’ll lose the diagnostic output. Consider wrapping thread-dump generation in a try/catch and logging a clear fallback message so the original test failure reporting isn’t impacted.
There was a problem hiding this comment.
good point, wrapped it with try catch
| */ | ||
| public final class ThreadDumpOnFailureExtension implements TestWatcher { | ||
|
|
||
| private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getName()); |
There was a problem hiding this comment.
Logger naming is inconsistent with other driver-core tests: most use a short suffix like ClassName.class.getSimpleName() (e.g., ConnectionPoolTest / ConnectionPoolAsyncTest). Passing class.getName() here yields a very long logger name (org.mongodb.driver.com.mongodb...) that’s harder to filter/read in logs; consider switching to getSimpleName() or a conventional test logger suffix.
| private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getName()); | |
| private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getSimpleName()); |
There was a problem hiding this comment.
replaced to use simple name
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import java.lang.management.ManagementFactory; | ||
| import java.lang.management.ThreadInfo; | ||
| import java.lang.management.ThreadMXBean; | ||
|
|
||
|
|
||
| import com.mongodb.internal.diagnostics.logging.Logger; | ||
| import com.mongodb.internal.diagnostics.logging.Loggers; | ||
|
|
||
| import org.junit.jupiter.api.extension.ExtensionContext; | ||
| import org.junit.jupiter.api.extension.TestWatcher; | ||
|
|
||
|
|
There was a problem hiding this comment.
Import order/grouping here is inconsistent with the repo’s established convention (typically project com.mongodb.* imports first, then org.*, then java.*, e.g. driver-core/src/test/unit/com/mongodb/internal/connection/ConnectionPoolTest.java:19-28). Reorder the imports to match the surrounding codebase style to keep diffs consistent and reduce merge noise.
| import java.lang.management.ManagementFactory; | |
| import java.lang.management.ThreadInfo; | |
| import java.lang.management.ThreadMXBean; | |
| import com.mongodb.internal.diagnostics.logging.Logger; | |
| import com.mongodb.internal.diagnostics.logging.Loggers; | |
| import org.junit.jupiter.api.extension.ExtensionContext; | |
| import org.junit.jupiter.api.extension.TestWatcher; | |
| import com.mongodb.internal.diagnostics.logging.Logger; | |
| import com.mongodb.internal.diagnostics.logging.Loggers; | |
| import org.junit.jupiter.api.extension.ExtensionContext; | |
| import org.junit.jupiter.api.extension.TestWatcher; | |
| import java.lang.management.ManagementFactory; | |
| import java.lang.management.ThreadInfo; | |
| import java.lang.management.ThreadMXBean; |
| public void testFailed(final ExtensionContext context, final Throwable cause) { | ||
| String testName = context.getDisplayName(); | ||
| String threadDump = getAllThreadsDump(); | ||
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); | ||
| } |
There was a problem hiding this comment.
This logs a full JVM thread dump on every failed test. When failures cascade (e.g., environment outage causing many tests to fail), this can produce very large Evergreen logs and make the original failures harder to find. Consider throttling (e.g., only dump once per JVM via an AtomicBoolean, or once per test class) and/or adding a clear header/footer delimiter so each dump is easy to locate in the log stream.
| @@ -0,0 +1 @@ | |||
| junit.jupiter.extensions.autodetection.enabled=true | |||
There was a problem hiding this comment.
junit-platform.properties is included in driver-core’s testArtifacts JAR (see buildSrc/src/main/kotlin/conventions/test-artifacts.gradle.kts:26-29), and several other modules depend on :driver-core:testArtifacts for their tests (e.g., driver-sync/build.gradle.kts:37-39). That means this setting may unintentionally enable JUnit Jupiter extension autodetection beyond driver-core depending on classpath resource resolution. If the intent is to scope this behavior to a specific module/test task, consider enabling it via the Gradle Test task/system property instead, or placing a shared properties file in a deliberately shared test-resources location.
| junit.jupiter.extensions.autodetection.enabled=true | |
| # NOTE: | |
| # This file is included in driver-core's testArtifacts JAR and is therefore | |
| # visible on the test classpath of other modules that depend on :driver-core:testArtifacts. | |
| # Enabling JUnit Jupiter extension autodetection here would unintentionally | |
| # turn it on for those modules as well. | |
| # | |
| # If you need autodetection, configure it in the specific module or Gradle | |
| # Test task instead (e.g. via system property junit.jupiter.extensions.autodetection.enabled=true), | |
| # or in a module-local junit-platform.properties that is not shared via testArtifacts. | |
| # | |
| # Example (do NOT uncomment here unless you really want it shared globally): | |
| # junit.jupiter.extensions.autodetection.enabled=true |
JAVA-6155
This PR adds thread dumps whenever test fails
The motivation behind this PR was a failing test case
poolClearedExceptionMustBeRetryableClaude identified a deadlock that I wasn't able to reproduce locally
With this PR develop could easily see thread dumps of all the threads directly in evergreen and locally
Here is an example of a patch that deliberately makes a test case fail and it's output