Skip to content

Fix snapshot dump crash on stale jmethodID (PROF-14003)#460

Open
jbachorik wants to merge 2 commits intomainfrom
jb/prof-14003-snapshot-dump-crash-fix
Open

Fix snapshot dump crash on stale jmethodID (PROF-14003)#460
jbachorik wants to merge 2 commits intomainfrom
jb/prof-14003-snapshot-dump-crash-fix

Conversation

@jbachorik
Copy link
Copy Markdown
Collaborator

@jbachorik jbachorik commented Apr 10, 2026

What does this PR do?:

Fixes a crash in Lookup::fillJavaMethodInfo during snapshot dump when profiled jmethodIDs refer to classes that were subsequently unloaded. Two mitigations close the TOCTOU window:

  1. check_jmethodID_hotspot (vmStructs.cpp): Before reading class_loader_data from the cpool_holder Klass pointer, verify the full memory range is still readable via SafeAccess::isReadableRange. This catches freed/reclaimed Klass pages before JVMTI is called.

  2. fillJavaMethodInfo (flightRecorder.cpp): Add method_class != NULL guard after GetMethodDeclaringClass succeeds, preventing GetClassSignature from being called with a null JNI handle.

Motivation:

On JDK 1.8.0_472, profiling captures jmethodIDs in async signal handlers. By dump time, associated classes may have been unloaded. GetMethodDeclaringClass can return a jclass wrapping a stale/garbage oop, and the existing J9 sentinel guard (method_class != (jclass)-1) is bypassed on HotSpot (!VM::isOpenJ9() is always true). This causes GetClassSignature to crash inside oopDesc::is_a().

Crash trace: #0 oopDesc::is_a() → #1 jvmti_GetClassSignature → #2 Lookup::fillJavaMethodInfo

Additional Notes:

This is a mitigation, not a complete fix. The TOCTOU window between check_jmethodID and the JVMTI calls is narrowed but not eliminated. A complete fix would require a ClassUnload listener to eagerly invalidate cached jmethodIDs, which is a larger change. The isReadableRange guard covers the exact memory range accessed at cpool_holder + class_loader_data_offset.

How to test the change?:

Reproducing the race reliably requires running under a class-loading-heavy workload on JDK 8 with frequent GC cycles during dump. The fix is a defensive guard; unit-level testing is not feasible without a test-only API that forces jmethodID staleness. CI should verify no regression in existing profiling tests.

For Datadog employees:

  • This PR doesn't touch any of that.
  • JIRA: PROF-14003

jbachorik and others added 2 commits April 10, 2026 11:39
DTLS initialisation for shared libraries calls calloc internally.
If a profiler signal fires on a thread whose TLS block has not been
set up yet while that thread is inside malloc, any thread_local
access in the signal-handler path deadlocks on the allocator lock —
manifesting as a crash in findLibraryByAddress.

- Add findLibraryImpl.h: signal-handler-safe template using a plain
  static volatile int last-hit index (no thread_local, no TLS access)
- Delegate Libraries::findLibraryByAddress to the template
- Add libraries_ut.cpp: unit tests for known/null/invalid/cache paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On JDK 8, jmethodIDs captured during profiling can refer to classes
that are subsequently unloaded before the snapshot dump. The TOCTOU
window between check_jmethodID() and the JVMTI calls could result in
GetMethodDeclaringClass returning a jclass wrapping a garbage oop,
causing GetClassSignature to crash in oopDesc::is_a().

Two mitigations:
- In check_jmethodID_hotspot: add SafeAccess::isReadableRange() guard
  on the Klass before reading its class_loader_data field, catching
  freed/reclaimed Klass pages before returning true.
- In fillJavaMethodInfo: add method_class != NULL guard after
  GetMethodDeclaringClass to prevent calling GetClassSignature with a
  null handle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik added the AI label Apr 10, 2026
@dd-octo-sts
Copy link
Copy Markdown

dd-octo-sts bot commented Apr 10, 2026

CI Test Results

Run: #24238697713 | Commit: b5cbf99 | Duration: 22m 36s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-04-10 10:58:23 UTC

@jbachorik jbachorik marked this pull request as ready for review April 10, 2026 11:01
@jbachorik jbachorik requested a review from a team as a code owner April 10, 2026 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant