instrumentDurableObjectWithSentry breaks Durable Objects that use the WebSocket Hibernation API #21153

@nwilliams-lucas

Summary

Wrapping a Durable Object class with instrumentDurableObjectWithSentry from @sentry/cloudflare breaks the WebSocket Hibernation API. Every WS connection drops after 2-15 seconds with Connection reset without closing handshake, then subsequent reconnects fail with HTTP error: 500 Internal Server Error from the worker. Removing the wrap restores normal behavior within seconds.

I bisected this in production over two days and am happy to share more detail or a minimal repro if useful.

Versions

  • @sentry/cloudflare: 10.53.1
  • wrangler: 4.94.0
  • Node 22 / pnpm 10
  • Cloudflare Workers, custom domain, WS Hibernation API (this.ctx.acceptWebSocket(server, [tag]) + webSocketMessage handler on the DO class)

Minimal change that reproduces

Starting from a healthy worker with a working DO using WS Hibernation:

// lib/sentry.ts
//
// Remove `honoIntegration` (would otherwise double-register Hono prototype
// patches when both Sentry.withSentry and instrumentDurableObjectWithSentry
// call buildSentryOptions — known pre-existing issue).
- import { honoIntegration } from "@sentry/cloudflare";
  ...
- integrations: [honoIntegration()],

// tenant-coordinator.ts
+ import { instrumentDurableObjectWithSentry } from "@sentry/cloudflare";
+ import { buildSentryOptions } from "./lib/sentry";

- export class TenantCoordinator extends DurableObject<ServerBindings> {
+ class TenantCoordinatorBase extends DurableObject<ServerBindings> {
    // ... DO body, including
    //     this.ctx.acceptWebSocket(server, [deviceId])
    //     webSocketMessage(ws: WebSocket, message) { ... }
}

+ export const TenantCoordinator = instrumentDurableObjectWithSentry(
+   (env: ServerBindings) => buildSentryOptions(env),
+   TenantCoordinatorBase,
+ );

Worker entrypoint uses Sentry.withSentry((env) => buildSentryOptions(env), handler).

Symptom

Within ~5 minutes after the deploy completes, the agent's journal fills with:

WARN marquee_agent::state: reconnect failed
    error=websocket connect error: HTTP error: 500 Internal Server Error
INFO marquee_agent::state: reconnecting after backoff
INFO marquee_agent::websocket: connecting to WebSocket
WARN marquee_agent::state: reconnect failed
    error=websocket connect error: HTTP error: 500 Internal Server Error

Per-minute count of Connection reset / HTTP 500 events stabilizes at 2-5/min indefinitely. One reproduction: 54 such events in the first 5 minutes after the test deploy completed.

Reverting the wrap (re-exporting the base class directly) brings the count back to 0 within 3 minutes.

What I ruled out via bisect

  1. Not the deploy pipeline. Two wrangler deploy --dry-run --outdir X builds yield byte-identical index.js (SHA-256 match). Deployed upload size is identical with or without --outdir.
  2. Not @sentry/hono. Shipped @sentry/hono/cloudflare middleware on two sibling workers (no DOs): zero WS impact over 10+ minutes. Middleware itself is fine.
  3. Not Sentry.withSentry worker-level init. Live for a week prior without issue.
  4. Not the recursion bug. buildSentryOptions does not include honoIntegration in this test; it was removed as a prerequisite.

The instrumentDurableObjectWithSentry wrap alone is sufficient to reproduce, every time.

Hypothesis

instrumentDurableObjectWithSentry patches methods on the DO class. WS Hibernation relies on the runtime being able to identify and serialize/restore handlers like webSocketMessage, webSocketClose, webSocketError. If the patched methods change the prototype chain or method identity, the runtime can't restore the WS attachment after eviction, and new WS upgrades fail with a 500.

Bytes line up: identical index.js deployed; divergence is purely in how the runtime treats the wrapped class at invocation.
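
To make the hypothesis concrete, here is a minimal sketch of how a class wrapper can keep behaviour intact while still changing method identity. It runs in plain TypeScript with no Cloudflare or Sentry dependency. `CoordinatorBase` and `wrapClass` are hypothetical names, and the Proxy-based patching style is an assumption about one plausible way to wrap a class, not a confirmed description of what instrumentDurableObjectWithSentry does. Whether the Workers runtime actually depends on any of these properties is likewise unverified.

```typescript
// Hypothetical stand-in for a Durable Object class (not the real Cloudflare base class).
class CoordinatorBase {
  received: string[] = [];

  webSocketMessage(_ws: unknown, message: string): void {
    this.received.push(message);
  }
}

type Ctor<T> = new (...args: any[]) => T;

// ASSUMPTION: one plausible wrapping style. A Proxy `construct` trap builds the
// instance, then replaces the named handlers with wrapper functions on the
// instance itself. The real Sentry wrapper may work differently.
function wrapClass<T extends object>(Base: Ctor<T>, methods: string[]): Ctor<T> {
  return new Proxy(Base, {
    construct(target, args, newTarget) {
      const instance: any = Reflect.construct(target, args, newTarget);
      for (const name of methods) {
        const original = instance[name];
        if (typeof original !== "function") continue;
        instance[name] = function (this: unknown, ...callArgs: unknown[]) {
          // A real wrapper would start a span or capture errors here.
          return original.apply(this, callArgs);
        };
      }
      return instance;
    },
  });
}

const Wrapped = wrapClass(CoordinatorBase, ["webSocketMessage"]);
const plain = new CoordinatorBase();
const wrapped = new Wrapped();

// Behaviour is preserved: the message still reaches the original method body.
wrapped.webSocketMessage(null, "ping");

// The prototype chain is unchanged...
const samePrototype = Object.getPrototypeOf(wrapped) === CoordinatorBase.prototype; // true
// ...but the handler is now an own property of the instance with a new identity.
const plainHasOwn = Object.prototype.hasOwnProperty.call(plain, "webSocketMessage"); // false
const wrappedHasOwn = Object.prototype.hasOwnProperty.call(wrapped, "webSocketMessage"); // true
const sameIdentity = wrapped.webSocketMessage === CoordinatorBase.prototype.webSocketMessage; // false
```

If the runtime looks up hibernation handlers by anything other than a normal property call on the instance, the difference between `plainHasOwn` and `wrappedHasOwn` is the kind of divergence that byte-identical bundles would not reveal.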

What would help

  1. Document the WS-Hibernation incompatibility on instrumentDurableObjectWithSentry.
  2. Long-term: ensure the wrapper preserves whatever method-identity contract the runtime uses for hibernation (leave lifecycle methods like webSocketMessage unwrapped, or wrap inside the method body).
  3. Workaround until then: use Sentry.captureException in try/catch inside DO method bodies instead of wrapping the class.
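
The workaround in item 3 could look roughly like the sketch below. It is written against an injected reporter so that it runs standalone. In a Worker the reporter would presumably be `Sentry.captureException` from @sentry/cloudflare, but whether that call has an initialized client inside a Durable Object whose class is not wrapped is an assumption this report does not establish. `Reporter`, `withCapture`, and `CoordinatorSketch` are hypothetical names.

```typescript
// Hypothetical reporter signature; in a Worker this would likely be
// `Sentry.captureException` (assumed to be usable inside the DO).
type Reporter = (error: unknown) => void;

// Runs a handler body, reports any thrown or rejected error, then rethrows it
// so the runtime still observes the failure.
async function withCapture<T>(report: Reporter, body: () => T | Promise<T>): Promise<T> {
  try {
    return await body();
  } catch (error) {
    report(error);
    throw error;
  }
}

// Hypothetical, simplified DO-like class. The handler stays an ordinary
// prototype method; only its body is instrumented.
class CoordinatorSketch {
  private readonly report: Reporter;
  handled: unknown[] = [];

  constructor(report: Reporter) {
    this.report = report;
  }

  async webSocketMessage(_ws: unknown, message: string): Promise<void> {
    await withCapture(this.report, () => {
      // JSON.parse throws a SyntaxError on malformed input.
      this.handled.push(JSON.parse(message));
    });
  }
}
```

Because the class is exported unchanged, `webSocketMessage` keeps its identity on the prototype; the cost is that every handler has to opt in by hand.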

Happy to share a fuller repro or the marquee-server diff that caused this.
