Tracer Version(s)
1.55.0
Java Version(s)
21.0.9
JVM Vendor
Eclipse Adoptium / Temurin
Bug Report
The dd_oome_notifier.sh (and dd_crash_uploader.sh) scripts spawned via -XX:OnOutOfMemoryError do not unset JDK_JAVA_OPTIONS, JAVA_TOOL_OPTIONS, or _JAVA_OPTIONS before launching a child java process. The child JVM therefore inherits the full application JVM configuration, which causes three distinct problems:
- Port conflicts: flags like JMX remote (-Dcom.sun.management.jmxremote.port=9012) cause a BindException because the parent JVM still holds the port
- cgroup OOMKill: memory flags like -Xms/-Xmx or -XX:MaxRAMPercentage=90 cause the child JVM to compete with the still-alive parent for container memory, potentially triggering a kernel OOMKill
- Lost OOM diagnostics: when the script fails for any of the above reasons, no OOME event reaches Datadog, and the original OOM exception details (stack trace, thread name) are also lost because -XX:+ExitOnOutOfMemoryError force-terminates after the handler runs
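One possible shape for a fix, as a minimal sketch only: clear the three variables in a subshell right before the child java process is started. The helper name run_without_jvm_env is hypothetical and is not part of dd-java-agent; the commented usage reuses the invocation quoted in the reproduction steps of this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of dd-java-agent): run a command with the
# JVM option variables removed from its environment.
run_without_jvm_env() {
  (
    # The subshell keeps the unset local to this one invocation, so the
    # caller's environment is left untouched.
    unset JDK_JAVA_OPTIONS JAVA_TOOL_OPTIONS _JAVA_OPTIONS
    "$@"
  )
}

# Assumed usage inside the handler script:
# run_without_jvm_env "$config_java_home/bin/java" -Ddd.dogstatsd.start-delay=0 \
#   -jar "$config_agent" sendOomeEvent "$config_tags"
```

With the variables cleared, the child JVM would no longer see the JMX port flags or the inherited memory flags. It would fall back to default heap sizing, so whether an explicit small heap limit on the child command line is also needed is left open here.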
Actual output:
# java.lang.OutOfMemoryError: Metaspace
# -XX:OnOutOfMemoryError="/tmp/datadog/java/dd_oome_notifier.sh %p"
# Executing /bin/sh -c "/tmp/datadog/java/dd_oome_notifier.sh 1"...
Agent Jar: /opt/datadog/apm/library/java/dd-java-agent.jar
Tags: host:order-664fc65797-2bclc,...
JAVA_HOME: /opt/java/openjdk
PID: 1
NOTE: Picked up JDK_JAVA_OPTIONS: -XX:MaxGCPauseMillis=4000 -XX:MinRAMPercentage=25 -XX:MaxRAMPercentage=90 -XX:MaxMetaspaceSize=128m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 -Dcom.sun.management.jmxremote.rmi.port=9012 ...
Picked up JAVA_TOOL_OPTIONS: -javaagent:/opt/datadog/apm/library/java/dd-java-agent.jar ...
Caused by: java.rmi.server.ExportException: Port already in use: 9012; nested exception is:
java.net.BindException: Address already in use
...
Error: Failed to generate OOME event
Terminating due to java.lang.OutOfMemoryError: Metaspace
Expected Behavior
The OOME event should be sent to Datadog successfully, regardless of what JDK_JAVA_OPTIONS or JAVA_TOOL_OPTIONS contain. When the script fails, the original OOM exception details (stack trace, thread name) should still be visible in the application logs.
Reproduction Code
- Configure a JVM application with JMX remote monitoring via JDK_JAVA_OPTIONS:
  JDK_JAVA_OPTIONS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 -Dcom.sun.management.jmxremote.rmi.port=9012 ...
- Have dd-java-agent injected (e.g. via Admission Controller), which sets:
  JAVA_TOOL_OPTIONS=-javaagent:/opt/datadog/apm/library/java/dd-java-agent.jar -XX:OnOutOfMemoryError="/tmp/datadog/java/dd_oome_notifier.sh %p"
- Trigger an OutOfMemoryError (in our case: java.lang.OutOfMemoryError: Metaspace)
- The JVM invokes dd_oome_notifier.sh, which spawns a child java process:
  "$config_java_home/bin/java" -Ddd.dogstatsd.start-delay=0 -jar "$config_agent" sendOomeEvent "$config_tags"
- This child process inherits JDK_JAVA_OPTIONS (including JMX port flags) and JAVA_TOOL_OPTIONS (including the agent jar) from the environment.
- The child JVM tries to bind JMX to port 9012, which is still held by the dying parent JVM → BindException: Address already in use → "Error: Failed to generate OOME event"
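The inheritance step can be checked without triggering a real OutOfMemoryError. The sketch below is a stand-in for the handler, not the actual script: it starts a child through /bin/sh -c, as the JVM does in the output above, and lists which of the three variables that child sees. The function name show_inherited_jvm_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical check (not part of dd-java-agent): print the JVM option
# variables that a process spawned via /bin/sh -c would inherit.
show_inherited_jvm_env() {
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # being reported as a failure of the check itself.
  /bin/sh -c "env | grep -E '^(JDK_JAVA_OPTIONS|JAVA_TOOL_OPTIONS|_JAVA_OPTIONS)=' || true"
}
```

Any line printed by this function corresponds to a variable that a child java process started from the handler would also pick up.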