
JAVA-6155 add thread dumps to evergreen logs on failed test cases#1932

Open
strogiyotec wants to merge 2 commits into mongodb:main from strogiyotec:JAVA-6155

Conversation

strogiyotec (Contributor) commented Apr 1, 2026

JAVA-6155
This PR logs a thread dump whenever a test fails.
The motivation was the failing test case poolClearedExceptionMustBeRetryable, where Claude identified a deadlock that I wasn't able to reproduce locally.
With this PR, developers can see dumps of all threads directly in the Evergreen logs, as well as when running locally.

Here is an example of a patch that deliberately makes a test case fail, and its output.

strogiyotec requested a review from a team as a code owner April 1, 2026 18:21
strogiyotec requested reviews from Copilot and nhachicha, then removed the review request for nhachicha, April 1, 2026 18:21

Copilot AI left a comment

Pull request overview

This PR introduces a JUnit 5 TestWatcher extension that emits a full JVM thread dump to the test logs whenever a test fails, making deadlocks and hung-thread scenarios easier to diagnose in Evergreen and locally.

Changes:

  • Add ThreadDumpOnFailureExtension (JUnit Jupiter TestWatcher) that logs a thread dump on test failure.
  • Register the extension via META-INF/services/... and enable JUnit Jupiter extension autodetection via junit-platform.properties.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Changed files:
  • driver-core/src/test/unit/com/mongodb/internal/diagnostics/ThreadDumpOnFailureExtension.java: adds the failure hook and thread-dump collection logic.
  • driver-core/src/test/resources/META-INF/services/org.junit.jupiter.api.extension.Extension: registers the extension for ServiceLoader-based autodetection.
  • driver-core/src/test/resources/junit-platform.properties: enables JUnit Jupiter extension autodetection so the extension is picked up automatically.

    public void testFailed(final ExtensionContext context, final Throwable cause) {
        String testName = context.getDisplayName();
        String threadDump = getAllThreadsDump();
        LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump);

Copilot AI Apr 1, 2026

testFailed ignores the provided cause, so the log entry won’t include the exception/stack trace that triggered the failure. Logging the throwable (e.g., using the error(String, Throwable) overload) would keep the failure reason adjacent to the thread dump in Evergreen/local logs.

Suggested change:
- LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump);
+ LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump, cause);

strogiyotec (Contributor, Author) Apr 1, 2026

The cause is already reported when the assertion fails, so I don't think duplicating it next to the thread dump makes sense.
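
If a middle ground is ever wanted, the header line could carry only a one-line summary of cause (its class name and message, which is what Throwable.toString() returns), so each dump can be matched to its failure without repeating the stack trace. This is a minimal sketch, not part of this PR; FailureHeader and failureHeader are hypothetical names.

```java
// Hypothetical helper, not part of the PR: builds a log header that names the
// failing test and summarizes the cause on one line, without its stack trace.
public final class FailureHeader {

    private FailureHeader() {
    }

    public static String failureHeader(final String testName, final Throwable cause) {
        // Throwable.toString() yields "<class name>: <message>", or just the class
        // name when the message is null.
        String summary = cause == null ? "<no cause>" : cause.toString();
        return "Test failed: " + testName + " (" + summary + ")\nThread dump:\n";
    }

    public static void main(final String[] args) {
        System.out.print(failureHeader("poolClearedExceptionMustBeRetryable()",
                new IllegalStateException("boom")));
    }
}
```

Running main prints a header starting with "Test failed: poolClearedExceptionMustBeRetryable() (java.lang.IllegalStateException: boom)", followed by the "Thread dump:" line.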

Comment on lines +39 to +50
    public void testFailed(final ExtensionContext context, final Throwable cause) {
        String testName = context.getDisplayName();
        String threadDump = getAllThreadsDump();
        LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump);
    }

    private static String getAllThreadsDump() {
        final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        ThreadInfo[] threadInfos = threadMXBean.dumpAllThreads(
                threadMXBean.isObjectMonitorUsageSupported(),
                threadMXBean.isSynchronizerUsageSupported()
        );

Copilot AI Apr 1, 2026

getAllThreadsDump() (via ThreadMXBean.dumpAllThreads) can throw at runtime (e.g., SecurityException). If this happens inside testFailed, the extension itself may error and you’ll lose the diagnostic output. Consider wrapping thread-dump generation in a try/catch and logging a clear fallback message so the original test failure reporting isn’t impacted.

strogiyotec (Contributor, Author)

Good point, I wrapped it in a try/catch.
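
For reference, here is a sketch of what a guarded dump helper could look like using only java.lang.management. It is not the code in this PR (the getAllThreadsDump hunk above is truncated), and ThreadDumps is a hypothetical name. Two details may be worth knowing: in OpenJDK, ThreadInfo.toString() prints at most 8 stack frames per thread, so this sketch formats every frame itself; and since the motivating failure was a deadlock, it also reports any thread IDs returned by ThreadMXBean.findDeadlockedThreads(). Any RuntimeException (such as SecurityException) becomes a fallback message, so the original test failure is still reported.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

// Hypothetical helper (not the PR's code): collects a full thread dump and never
// throws, so a failing diagnostic cannot hide the original test failure.
public final class ThreadDumps {

    private ThreadDumps() {
    }

    public static String dumpAllThreads() {
        try {
            ThreadMXBean bean = ManagementFactory.getThreadMXBean();
            StringBuilder sb = new StringBuilder();
            // Report JVM-detected deadlocks first so they stand out in long logs.
            long[] deadlocked = bean.isSynchronizerUsageSupported()
                    ? bean.findDeadlockedThreads()
                    : bean.findMonitorDeadlockedThreads();
            if (deadlocked != null) {
                sb.append("Deadlocked thread ids: ").append(Arrays.toString(deadlocked)).append("\n\n");
            }
            ThreadInfo[] infos = bean.dumpAllThreads(
                    bean.isObjectMonitorUsageSupported(),
                    bean.isSynchronizerUsageSupported());
            for (ThreadInfo info : infos) {
                if (info != null) {
                    appendThread(sb, info);
                }
            }
            return sb.toString();
        } catch (RuntimeException e) {
            // e.g. SecurityException or UnsupportedOperationException
            return "Unable to collect thread dump: " + e;
        }
    }

    // Formats one thread in a jstack-like layout, printing every frame
    // (ThreadInfo.toString() would stop after 8).
    private static void appendThread(final StringBuilder sb, final ThreadInfo info) {
        sb.append('"').append(info.getThreadName()).append("\" id=").append(info.getThreadId())
                .append(' ').append(info.getThreadState());
        if (info.getLockName() != null) {
            sb.append(" on ").append(info.getLockName());
        }
        if (info.getLockOwnerName() != null) {
            sb.append(" owned by \"").append(info.getLockOwnerName())
                    .append("\" id=").append(info.getLockOwnerId());
        }
        sb.append('\n');
        StackTraceElement[] frames = info.getStackTrace();
        MonitorInfo[] monitors = info.getLockedMonitors();
        for (int depth = 0; depth < frames.length; depth++) {
            sb.append("\tat ").append(frames[depth]).append('\n');
            // Each monitor records the stack depth of the frame that locked it.
            for (MonitorInfo monitor : monitors) {
                if (monitor.getLockedStackDepth() == depth) {
                    sb.append("\t- locked ").append(monitor).append('\n');
                }
            }
        }
        for (LockInfo synchronizer : info.getLockedSynchronizers()) {
            sb.append("\t- locked synchronizer ").append(synchronizer).append('\n');
        }
        sb.append('\n');
    }

    public static void main(final String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

In the extension, testFailed could log the result of ThreadDumps.dumpAllThreads(); the main method only exists to try the helper standalone.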

 */
public final class ThreadDumpOnFailureExtension implements TestWatcher {

    private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getName());

Copilot AI Apr 1, 2026

Logger naming is inconsistent with other driver-core tests: most use a short suffix like ClassName.class.getSimpleName() (e.g., ConnectionPoolTest / ConnectionPoolAsyncTest). Passing class.getName() here yields a very long logger name (org.mongodb.driver.com.mongodb...) that’s harder to filter/read in logs; consider switching to getSimpleName() or a conventional test logger suffix.

Suggested change:
- private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getName());
+ private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getSimpleName());

strogiyotec (Contributor, Author)

Replaced it with the simple name.

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment on lines +19 to +30
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;


import com.mongodb.internal.diagnostics.logging.Logger;
import com.mongodb.internal.diagnostics.logging.Loggers;

import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.extension.TestWatcher;


Copilot AI Apr 1, 2026

Import order/grouping here is inconsistent with the repo’s established convention (typically project com.mongodb.* imports first, then org.*, then java.*, e.g. driver-core/src/test/unit/com/mongodb/internal/connection/ConnectionPoolTest.java:19-28). Reorder the imports to match the surrounding codebase style to keep diffs consistent and reduce merge noise.

Suggested change:
- import java.lang.management.ManagementFactory;
- import java.lang.management.ThreadInfo;
- import java.lang.management.ThreadMXBean;
- import com.mongodb.internal.diagnostics.logging.Logger;
- import com.mongodb.internal.diagnostics.logging.Loggers;
- import org.junit.jupiter.api.extension.ExtensionContext;
- import org.junit.jupiter.api.extension.TestWatcher;
+ import com.mongodb.internal.diagnostics.logging.Logger;
+ import com.mongodb.internal.diagnostics.logging.Loggers;
+ import org.junit.jupiter.api.extension.ExtensionContext;
+ import org.junit.jupiter.api.extension.TestWatcher;
+ import java.lang.management.ManagementFactory;
+ import java.lang.management.ThreadInfo;
+ import java.lang.management.ThreadMXBean;

Comment on lines +39 to +43
    public void testFailed(final ExtensionContext context, final Throwable cause) {
        String testName = context.getDisplayName();
        String threadDump = getAllThreadsDump();
        LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump);
    }

Copilot AI Apr 1, 2026

This logs a full JVM thread dump on every failed test. When failures cascade (e.g., environment outage causing many tests to fail), this can produce very large Evergreen logs and make the original failures harder to find. Consider throttling (e.g., only dump once per JVM via an AtomicBoolean, or once per test class) and/or adding a clear header/footer delimiter so each dump is easy to locate in the log stream.

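
A minimal sketch of the throttling idea, assuming one full dump per test JVM is enough: an AtomicBoolean flipped with compareAndSet lets exactly one failing test emit the dump, even when tests run in parallel, and BEGIN/END markers make it easy to search for in Evergreen output. ThreadDumpThrottle and onTestFailed are hypothetical names, not part of this PR; in the extension, the instance would be kept in a static field so it is shared JVM-wide.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of "dump once per JVM" throttling with searchable delimiters.
public final class ThreadDumpThrottle {

    // Flips to true the first time a dump is emitted; compareAndSet guarantees a
    // single winner even if several tests fail concurrently.
    private final AtomicBoolean dumped = new AtomicBoolean(false);

    public String onTestFailed(final String testName, final String threadDump) {
        if (dumped.compareAndSet(false, true)) {
            return "===== BEGIN THREAD DUMP (" + testName + ") =====\n"
                    + threadDump
                    + "===== END THREAD DUMP =====";
        }
        // Later failures get a short pointer instead of another full dump.
        return "Test failed: " + testName + " (thread dump already logged earlier in this JVM)";
    }

    public static void main(final String[] args) {
        ThreadDumpThrottle throttle = new ThreadDumpThrottle();
        System.out.println(throttle.onTestFailed("first()", "\"main\" RUNNABLE\n"));
        System.out.println(throttle.onTestFailed("second()", "\"main\" RUNNABLE\n"));
    }
}
```

The trade-off is that a later, unrelated hang in the same JVM would not get its own dump; throttling per test class instead would keep more coverage at the cost of larger logs.
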
driver-core/src/test/resources/junit-platform.properties
@@ -0,0 +1 @@
junit.jupiter.extensions.autodetection.enabled=true

Copilot AI Apr 1, 2026

junit-platform.properties is included in driver-core’s testArtifacts JAR (see buildSrc/src/main/kotlin/conventions/test-artifacts.gradle.kts:26-29), and several other modules depend on :driver-core:testArtifacts for their tests (e.g., driver-sync/build.gradle.kts:37-39). That means this setting may unintentionally enable JUnit Jupiter extension autodetection beyond driver-core depending on classpath resource resolution. If the intent is to scope this behavior to a specific module/test task, consider enabling it via the Gradle Test task/system property instead, or placing a shared properties file in a deliberately shared test-resources location.

Suggested change:
- junit.jupiter.extensions.autodetection.enabled=true
+ # NOTE:
+ # This file is included in driver-core's testArtifacts JAR and is therefore
+ # visible on the test classpath of other modules that depend on :driver-core:testArtifacts.
+ # Enabling JUnit Jupiter extension autodetection here would unintentionally
+ # turn it on for those modules as well.
+ #
+ # If you need autodetection, configure it in the specific module or Gradle
+ # Test task instead (e.g. via system property junit.jupiter.extensions.autodetection.enabled=true),
+ # or in a module-local junit-platform.properties that is not shared via testArtifacts.
+ #
+ # Example (do NOT uncomment here unless you really want it shared globally):
+ # junit.jupiter.extensions.autodetection.enabled=true
