Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ node_modules/
dist/
out/
build/
.next/
next-env.d.ts
*.tsbuildinfo

# Environment files
Expand Down
227 changes: 227 additions & 0 deletions sentry-javascript/19367/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,227 @@
# Reproduction for sentry-javascript#19367

**Issue:** https://github.com/getsentry/sentry-javascript/issues/19367

## Description

Next.js 16 Turbopack bundles `@opentelemetry/api` into **separate chunks** for the
instrumentation hook (Sentry SDK init) and each route handler. This creates multiple
module instances that share the global context manager via `Symbol.for("opentelemetry.js.api.1")`,
but have separate `ContextAPI` singletons and module closures.

In `@sentry/nextjs` 10.38.0, `_startSpan()` nests 3 `context.with()` calls:
```
context.with(suppressedCtx) <- copy A's ContextAPI
-> tracer.startActiveSpan()
-> api.context.with(ctxWithSpan) <- copy B's ContextAPI
-> context.with(activeCtx) <- copy A's ContextAPI
-> callback
```

With duplicated module instances, the `SentryContextManager`'s scope cloning and
re-entry through the async context strategy can spiral into infinite recursion,
causing `RangeError: Maximum call stack size exceeded`.

This did not happen with `@sentry/nextjs` 10.8.0 because `startSpan()` only had
**1** `context.with()` call (inside `tracer.startActiveSpan()`), keeping the
re-entry depth bounded.

## Steps to Reproduce

1. Install dependencies:
```bash
npm install
```

2. Build with Turbopack (the default for Next.js 16):
```bash
npm run build
```

3. **Verify the duplication in build output:**
```bash
npm run check-otel-dedup
```
This analyzes which chunks each route and the instrumentation hook load,
confirming they get **different** copies of `@opentelemetry/api`.

4. Start the production server:
```bash
npm start
```

5. Test the diagnostic endpoint:
```bash
curl http://localhost:3000/api/otel-check | python3 -m json.tool
```

6. Test the realistic route with nested Sentry spans:
```bash
curl http://localhost:3000/api/test | python3 -m json.tool
```

7. Load test to attempt triggering the crash (intermittent):
```bash
for i in $(seq 1 500); do curl -s http://localhost:3000/api/test > /dev/null & done; wait
```

## What the check script shows

`npm run check-otel-dedup` maps the exact chunk loading for each entry point:

```
=== @opentelemetry/api duplication analysis ===

Instrumentation OTel chunks:
[root-of-the-server]__963c8309._.js (8KB) -- Symbol.for: true, ContextAPI: false
_0066dbbb._.js (4684KB) -- Symbol.for: false, ContextAPI: true

/api/test OTel chunks:
[root-of-the-server]__3f199e61._.js (9KB) -- Symbol.for: true, ContextAPI: false
[root-of-the-server]__8f25289d._.js (161KB) -- Symbol.for: false, ContextAPI: true

*** DUPLICATION DETECTED for /api/test ***
Instrumentation-only OTel chunks: __963c8309, _0066dbbb
Route-only OTel chunks: __3f199e61, __8f25289d
-> Different module instances with separate ContextAPI singletons
```

## Routes

| Route | Purpose |
|---|---|
| `/api/otel-check` | Diagnostic: tests `context.with()` (single + 3-deep nested), reports global OTel registry state |
| `/api/test` | Realistic: nested `Sentry.startSpan()` calls simulating DB queries + external API calls |

## Investigation Findings

### Structural Evidence (Confirmed)

After `npm run build`, the `check-otel-dedup` script confirms that the instrumentation
hook and each route handler load **different chunks** containing `@opentelemetry/api`.
Every route gets its own copy. The instrumentation hook and route handlers never share
OTel chunks.

**Two ContextAPI class definitions in the same chunk:** The route handler chunk contains
two separate `ContextAPI` class definitions -- one Turbopack ESM-style (used by route
handler code) and one CJS-style (bundled from `@prisma/instrumentation`'s nested
`@opentelemetry/instrumentation` dependency). Both call `_getContextManager()` which
reads from `globalThis[Symbol.for("opentelemetry.js.api.1")]`, so they share the same
context manager but are separate JavaScript objects with separate closures, singletons,
and prototype chains.

**Version-independent:** The duplication occurs identically with both `@sentry/nextjs`
10.8.0 and 10.38.0 (8 chunks with OTel `Symbol.for`, 2 with `ContextAPI` in both).

**`prismaIntegration()` makes no difference:** Removing `Sentry.prismaIntegration()`
produces identical chunk structure. The `@prisma/instrumentation` package is bundled
because it is a direct dependency of `@sentry/node`, not because of the integration call.

### Runtime Behavior (Crash Not Reproduced)

| Test | Result |
|---|---|
| Single `context.with()` call | Passes |
| 3-deep nested `context.with()` (mimicking `_startSpan`) | Passes |
| Alternating Copy A / Copy B `context.with()` | Passes |
| 1000 concurrent mixed A/B `context.with()` calls | Passes |
| `Sentry.startSpan()` with nested spans + Copy B `context.with()` | Passes |
| 500 concurrent HTTP requests to `/api/test` | Server survives |
| `--stack-size=128` with concurrent load | Server survives |
| Monkey-patching instrumentation's `ContextAPI.with()` | Never triggered by routes (confirming separate singletons) |

**Why the crash doesn't reproduce locally:** The `SentryContextManager.with()` calls
`super.with()` which ends at `asyncLocalStorage.run(ctx, fn)` -- it does **not** call
back to any `ContextAPI.with()`. So there is no direct recursive loop between the two
ContextAPI singletons.

**Remaining hypotheses for the crash:**
- Node.js v24 runtime differences (different ALS behavior or stack limits)
- Actual Prisma DB queries creating deeper instrumentation chains
- pnpm workspace resolution causing different module nesting
- `next-intl` middleware adding wrapping to the Next.js config
- Sustained production traffic ("minutes to hours" under load)
- Duplicate copies of `require-in-the-middle`/`import-in-the-middle` both patching
the same modules, creating re-entrant wrapping chains

### Turbopack's Behavior Is By Design

Per-route self-contained bundles are intentional. A Turbopack contributor (lukesandberg)
confirmed on [vercel/next.js#89192](https://github.com/vercel/next.js/issues/89192):

> "Bundling the module into multiple different output chunks is expected. The different
> routes may end up in different lambdas in production and so need to be self-contained."

However, the runtime deduplication layer that is supposed to compensate is not working
correctly for `@opentelemetry/api`. The Turbopack runtime uses a shared `moduleCache`
but the build output shows separate module definitions with different module IDs across
chunks.

**Known gaps:**
- [vercel/next.js#89192](https://github.com/vercel/next.js/issues/89192) -- Duplicate
class definitions across route chunks, breaking `instanceof` (same root cause)
- [vercel/next.js#89252](https://github.com/vercel/next.js/issues/89252) -- Related
chunking issue where unused CSS is included across routes
- Turbopack status page lists "Production Optimized JS Chunking" as still in progress
- Webpack does not have this problem (SplitChunksPlugin deduplicates shared deps)

### The `serverExternalPackages` Workaround

Adding `@opentelemetry/api` to `serverExternalPackages` in `next.config.js` forces
Node.js native `require()` resolution at runtime, bypassing Turbopack's bundling:

```js
const nextConfig = {
serverExternalPackages: ["@opentelemetry/api"],
};
```

This guarantees a single module instance because Node.js `require()` caches modules.
`@opentelemetry/api` is **not** on Next.js's built-in auto-externalized packages list,
though `@sentry/profiling-node` and `import-in-the-middle` are. Adding it upstream
would be a reasonable fix.

### OTel's Singleton Design vs Bundler Reality

`@opentelemetry/api` uses `Symbol.for("opentelemetry.js.api.1")` to store a global
registry on `globalThis`. This design was intended for the npm/yarn flat `node_modules`
world where multiple _versions_ might coexist. It was not designed for bundler-created
duplicate instances of the _same version_ in the same process.

**What the `Symbol.for` pattern protects:** global context manager registration,
global tracer provider registration, version compatibility checks.

**What it does NOT protect:** class identity (`instanceof` fails across copies),
module closures (each copy has its own), singleton instances (`ContextAPI.getInstance()`
returns a different object per copy), instrumentation registration (two copies may
both try to hook into the same Node.js modules).

## Conclusions

**Confirmed:**
1. Turbopack creates separate `@opentelemetry/api` module instances -- reproducible and deterministic
2. This is a known Turbopack limitation -- runtime deduplication does not work correctly here
3. Webpack does not have this problem
4. The duplication is SDK-version-independent (10.8.0 and 10.38.0 identical)
5. `prismaIntegration()` is irrelevant -- duplication comes from `@sentry/node`'s bundled `@prisma/instrumentation`

**Not confirmed:**
1. The actual infinite recursion mechanism -- could not trigger `RangeError` in any test
2. Why 10.38.0 crashes but 10.8.0 doesn't -- structural duplication is identical
3. The exact conditions -- likely requires Node.js v24, real Prisma queries, pnpm, or sustained production traffic

**Recommended actions:**

| Action | Owner | Priority |
|---|---|---|
| Add `@opentelemetry/api` to `serverExternalPackages` | Users (workaround) | Immediate |
| Add `@opentelemetry/api` to Next.js built-in externals list | Next.js team | Short-term |
| Fix Turbopack runtime module deduplication | Turbopack team | Medium-term |
| Track [vercel/next.js#89192](https://github.com/vercel/next.js/issues/89192) | All | Ongoing |

## Environment

- `@sentry/nextjs`: 10.38.0
- `next`: 16.1.6 (Turbopack)
- `@prisma/instrumentation`: ^7.4.0
- Node.js: v20+ (reporter uses v24)
91 changes: 91 additions & 0 deletions sentry-javascript/19367/app/api/otel-check/route.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
import { NextResponse } from "next/server";
import { context, trace } from "@opentelemetry/api";

/**
* Diagnostic route that proves @opentelemetry/api is duplicated at runtime.
*
* When Turbopack bundles this route, it gets its OWN copy of @opentelemetry/api
* (separate module instance from the one Sentry loaded in the instrumentation hook).
* Both copies share the same globalThis[Symbol.for("opentelemetry.js.api.1")]
* registry, but they are different JavaScript objects with different closures.
*
* This is the root cause of sentry-javascript#19367: the duplicated ContextAPI
* creates a re-entrant loop through the shared global context manager.
*/
export async function GET() {
const diagnostics: Record<string, unknown> = {};

// 1. Check the global OTel registry (shared across all copies via Symbol.for)
const globalKey = Symbol.for("opentelemetry.js.api.1");
const globalRegistry = (globalThis as any)[globalKey];
diagnostics.globalOtelRegistry = {
exists: !!globalRegistry,
version: globalRegistry?.version,
hasContextManager: !!globalRegistry?.context,
hasTracerProvider: !!globalRegistry?.trace,
hasDiag: !!globalRegistry?.diag,
};

// 2. Check the context manager registered by Sentry
const ctxManager = globalRegistry?.context;
if (ctxManager) {
diagnostics.contextManager = {
constructor: ctxManager.constructor.name,
hasWithMethod: typeof ctxManager.with === "function",
};
}

// 3. Test context.with() — the method that causes the infinite recursion
let contextWithResult: string;
try {
const result = context.with(context.active(), () => {
return "context.with() succeeded (single call)";
});
contextWithResult = result;
} catch (err: any) {
contextWithResult = `CRASH: ${err.message}`;
}
diagnostics.singleContextWith = contextWithResult;

// 4. Test nested context.with() — mimics what _startSpan() does in SDK 10.38.0
//
// In 10.38.0, _startSpan() wraps the call like:
// context.with(suppressedCtx, () => // call 1
// tracer.startActiveSpan(name, ctx, span => // call 2 (inside startActiveSpan)
// context.with(activeCtx, () => // call 3
// callback(span)
// )
// )
// )
//
// With duplicated modules, each context.with() may go through a different
// ContextAPI singleton, and the Sentry context manager's scope cloning
// triggers re-entry through the async context strategy.
let nestedResult: string;
try {
context.with(context.active(), () => {
return context.with(context.active(), () => {
const tracer = trace.getTracer("repro-test");
return tracer.startActiveSpan("nested-test", (span) => {
span.end();
return "nested context.with() succeeded (3-deep, same as _startSpan in 10.38.0)";
});
});
});
nestedResult = "3-deep nested context.with() succeeded";
} catch (err: any) {
nestedResult = `CRASH: ${err.message}`;
}
diagnostics.nestedContextWith = nestedResult;

// 5. The build-time evidence
diagnostics.buildTimeEvidence =
"Run `npm run check-otel-dedup` on the build output to confirm that " +
"the instrumentation hook and route handlers load DIFFERENT chunks " +
"containing @opentelemetry/api. Turbopack creates separate module " +
"instances instead of deduplicating into a single shared chunk.";

return NextResponse.json(diagnostics, {
status: contextWithResult.startsWith("CRASH") ? 500 : 200,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status code ignores the nested crash test result

Medium Severity

The HTTP status code is determined solely by contextWithResult (the single context.with() test), but ignores nestedResult (the 3-deep nested test). The nested test is the one that mimics _startSpan()'s pattern from SDK 10.38.0 — the exact scenario that triggers the infinite recursion crash. If the simple test passes but the nested test crashes, the endpoint returns 200 instead of 500, making automated detection of the bug's primary failure mode impossible.

Additional Locations (1)

Fix in Cursor Fix in Web

});
}
60 changes: 60 additions & 0 deletions sentry-javascript/19367/app/api/test/route.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
import { NextResponse } from "next/server";
import * as Sentry from "@sentry/nextjs";

/**
* Simulates a realistic API route that creates nested Sentry spans.
*
* In production, auto-instrumentation (HTTP, Prisma, etc.) creates spans
* around every request and DB query. Each Sentry.startSpan() call in 10.38.0
* internally calls context.with() THREE times (suppressed ctx → startActiveSpan
* → active ctx). With Turbopack's duplicate @opentelemetry/api modules, this
* 3-deep nesting × N nested spans can spiral into infinite recursion.
*
* The crash is intermittent because it depends on Node.js event loop timing
* and which module copy's ContextAPI handles each context.with() call.
*/
export async function GET() {
// Outer span — simulates the HTTP instrumentation auto-span
return Sentry.startSpan({ name: "GET /api/test" }, async () => {
// Inner span — simulates a DB query (e.g., Prisma)
const dbResult = await Sentry.startSpan(
{ name: "prisma:query SELECT" },
async () => {
await new Promise((resolve) => setTimeout(resolve, 2));
return { users: 42 };
}
);

// Another inner span — simulates a second DB query
const cacheResult = await Sentry.startSpan(
{ name: "redis:get session" },
async () => {
await new Promise((resolve) => setTimeout(resolve, 1));
return { cached: true };
}
);

// Deeply nested span — simulates calling an external API
const apiResult = await Sentry.startSpan(
{ name: "http:POST /external-api" },
async () => {
// Nested span inside the external call
return Sentry.startSpan(
{ name: "serialize:response" },
async () => {
await new Promise((resolve) => setTimeout(resolve, 1));
return { ok: true };
}
);
}
);

return NextResponse.json({
status: "ok",
dbResult,
cacheResult,
apiResult,
timestamp: Date.now(),
});
});
}
Loading