
fix: surface underlying cause in ExtensionError for telemetry (fixes #321638)#321645

Open
vs-code-engineering[bot] wants to merge 1 commit into main from fix/extension-error-diagnostics-321638-101a192080447777


@vs-code-engineering

Contributor

Summary

The error-telemetry bucket [GitHub.copilot-chat] unhandlederror-Error in extension github.copilot-chat: FAILED to handle event is a diagnosability defect in core VS Code, not a single copilot-chat bug. Every extension event-listener failure routed through _asExtensionEvent is wrapped in an ExtensionError whose constructor computes its message as message ?? cause.message. Because the call site always supplies the constant context label 'FAILED to handle event', the real underlying error's message is discarded, and the wrapper's own generic event-dispatch stack stands in for the cause's stack. The telemetry pipeline (packErrorForTelemetry) and the server-side bucket fingerprint key on the top-level message/stack, so all distinct copilot-chat listener failures collapse into one bucket (94,695 hits / 2,058 users on stable 0.52.0) that carries a constant message and a MessagePortMain.emit stack. The actual error is unrecoverable from telemetry.
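To make the collapse concrete, here is a minimal TypeScript sketch of the pre-fix constructor behaviour as the summary describes it. It is a hypothetical reconstruction, not the shipped source: it assumes the constructor prefixes "Error in extension <id>: " (as the bucket title suggests), models the extension identifier as a plain string instead of ExtensionIdentifier, and assumes an ES2022 target for the Error cause option. ExtensionErrorBefore is an illustrative name.

```typescript
// Hypothetical reconstruction of the pre-fix behaviour (not the shipped source).
// Assumes an ES2022 target so Error accepts the { cause } option.
class ExtensionErrorBefore extends Error {
  readonly extension: string;

  constructor(extension: string, cause: Error, message?: string) {
    // `message ?? cause.message`: once a label is passed, cause.message is ignored.
    super(`Error in extension ${extension}: ${message ?? cause.message}`, { cause });
    this.extension = extension;
  }
}

const label = 'FAILED to handle event';

// Two unrelated listener failures from the same extension...
const first = new ExtensionErrorBefore(
  'github.copilot-chat',
  new TypeError("Cannot read properties of undefined (reading 'uri')"),
  label,
);
const second = new ExtensionErrorBefore(
  'github.copilot-chat',
  new RangeError('Invalid array length'),
  label,
);

// ...end up with the same top-level message; only .cause still tells them apart.
console.log(first.message === second.message); // true
console.log(first.message); // Error in extension github.copilot-chat: FAILED to handle event
```

Anything keyed on the top-level message alone cannot distinguish the two errors, even though each still carries its real cause.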

The fix restores the cause's message and stack onto the ExtensionError so each real failure is separately bucketed and diagnosable, without swallowing or silencing anything — the error still throws and still reaches telemetry, it simply carries its real identity.

Fixes #321638
Recommended reviewer: @jrieken

Culprit Commit

d162ceb7fecc3476f27d75196e3dd71b06c51bc1: "extension events use new ExtensionError so that these errors don't make it into 'normal' error telemetry" (PR #236336, fixes #232914), by @jrieken, 2024-12-17.

This commit introduced the ExtensionError wrapper for the _asExtensionEvent path. Its intent — keep extension event errors out of "normal" core error telemetry and attribute them to the extension — is correct. The unintended side effect is the message ?? cause.message collapse described above: routing these errors to extension-attributed telemetry while replacing the cause's message with a constant label merges every distinct failure into a single fingerprint. The bucket is therefore "pre-existing across versions" rather than caused by a recent copilot-chat change; it spikes on each stable rollout (hence stable-anomaly) because that is when the aggregate volume of all copilot-chat event errors becomes visible.

Note: the shipped build is a shallow/grafted checkout, so git blame could not be run locally; the culprit was identified via GitHub commit history on the affected files.

Code Flow

flowchart TD
    A["Main thread sends RPC to ext host"] --> B["MessagePortMain.emit message event"]
    B --> C["RPCProtocol dispatch to ExtHost API accept handler"]
    C --> D["VS Code Event fires"]
    D --> E["_asExtensionEvent wrapper listener (extHost.api.impl.ts:279-291)"]
    E --> F["copilot-chat listener throws real Error = cause"]
    F --> G["catch: new ExtensionError(ext, cause, 'FAILED to handle event') (extHost.api.impl.ts:285)"]
    G --> H["ExtensionError ctor uses message ?? cause.message: real message discarded, generic stack kept (extensions.ts:509)"]
    H --> I["onUnexpectedExternalError to setUnexpectedErrorHandler (extensionHostMain.ts:121)"]
    I --> J["packErrorForTelemetry reads top-level message and stack only (errorTelemetry.ts:67)"]
    J --> K["Server fingerprint = message + file + version: all failures collapse into ONE bucket"]
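The wrapping steps E through I in the flowchart can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the code in extHost.api.impl.ts: Listener, Event, asExtensionEvent and the report callback (standing in for onUnexpectedExternalError) are hypothetical names, events return nothing instead of a disposable, and non-Error throws are converted with new Error(String(err)) purely for illustration.

```typescript
// Hypothetical, simplified model of the wrapping path in the flowchart.
// Names (Listener, Event, asExtensionEvent, report) are illustrative only.
type Listener<T> = (e: T) => void;
type Event<T> = (listener: Listener<T>) => void;

// Pre-fix behaviour: the label replaces the cause's message.
class ExtensionError extends Error {
  readonly extension: string;
  constructor(extension: string, cause: Error, message?: string) {
    super(`Error in extension ${extension}: ${message ?? cause.message}`, { cause });
    this.extension = extension;
  }
}

// Wraps every listener: run it, and if it throws, wrap the error with the
// constant label and hand it to `report` (standing in for onUnexpectedExternalError).
function asExtensionEvent<T>(
  extension: string,
  actual: Event<T>,
  report: (err: Error) => void,
): Event<T> {
  return (listener) =>
    actual((e) => {
      try {
        listener(e);
      } catch (err) {
        const cause = err instanceof Error ? err : new Error(String(err));
        report(new ExtensionError(extension, cause, 'FAILED to handle event'));
      }
    });
}

// Minimal event source standing in for an ext-host API event.
const listeners: Listener<string>[] = [];
const onDidChange: Event<string> = (l) => { listeners.push(l); };
const fire = (value: string) => listeners.forEach((l) => l(value));

const reported: Error[] = [];
const wrapped = asExtensionEvent('github.copilot-chat', onDidChange, (err) => reported.push(err));
wrapped(() => { throw new TypeError('real failure in listener'); });
fire('change');

console.log(reported[0].message); // Error in extension github.copilot-chat: FAILED to handle event
```

The reporter only ever sees the labelled wrapper; the listener's own message survives solely on .cause.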

Affected Files

  • src/vs/platform/extensions/common/extensions.ts (changed): ExtensionError constructor, the single place where the cause's message/stack were discarded.
  • src/vs/workbench/api/common/extHost.api.impl.ts (unchanged): _asExtensionEvent wrapper, the sole ExtensionError construction site (line 285); reporting/attribution works as designed and is intentionally left intact.
  • src/vs/platform/telemetry/common/errorTelemetry.ts (unchanged): packErrorForTelemetry reads only top-level message/stack, which is why surfacing the cause on those fields fixes bucketing.
  • src/vs/workbench/api/common/extensionHostMain.ts (unchanged): the instanceof ExtensionError consumer (line 130) reads .extension only, so adopting the cause's stack is attribution-safe.

Repro Steps

  1. Install/enable an extension whose VS Code event listener (registered via the extension API — e.g. window.onDidChangeX, workspace.onDidChangeY, a chat/notebook event) throws synchronously.
  2. Trigger the event so the listener runs and throws.
  3. Observe error telemetry: the reported error is Error in extension <id>: FAILED to handle event with a generic MessagePortMain.emit (node:events) stack — identical for every distinct listener failure.
  4. Confirm the real error message/stack are absent from telemetry: they live only on error.cause, which packErrorForTelemetry does not read, so the bucket cannot distinguish or diagnose them.
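Step 4 can be illustrated with a toy packer and bucket key. Both are hypothetical stand-ins: packTopLevel only mimics the property, described above, that packErrorForTelemetry reads top-level message and stack, and bucketKey simplifies the server fingerprint to message + version (the file component is omitted because, pre-fix, every failure shares the same generic dispatch stack). wrapBefore is an illustrative helper for the pre-fix wrapping; an ES2022 target is assumed for the cause option.

```typescript
// Hypothetical stand-ins; none of these names exist in VS Code.
interface PackedError {
  msg: string;
  callstack: string;
}

function packTopLevel(err: Error): PackedError {
  // Only top-level fields are read; error.cause is never consulted.
  return { msg: err.message, callstack: err.stack ?? '' };
}

// Simplified bucket key: message + version (file component omitted).
function bucketKey(packed: PackedError, version: string): string {
  return `${packed.msg}|${version}`;
}

// Pre-fix wrapping: constant label, real error reachable only via .cause.
function wrapBefore(cause: Error): Error {
  return new Error('Error in extension github.copilot-chat: FAILED to handle event', { cause });
}

const failures = [
  new TypeError("Cannot read properties of undefined (reading 'uri')"),
  new RangeError('Invalid array length'),
  new Error('Request aborted'),
];

const buckets = new Set(failures.map((f) => bucketKey(packTopLevel(wrapBefore(f)), '0.52.0')));
console.log(buckets.size); // 1: three distinct failures, one bucket
```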

How the Fix Works

Chosen approach: src/vs/platform/extensions/common/extensions.ts (ExtensionError constructor):

  1. Message: replace message ?? cause.message with message && cause?.message ? `${message}: ${cause.message}` : (message ?? cause?.message). When a context label is supplied (the only current call site always supplies 'FAILED to handle event'), the real cause.message is now appended instead of discarded, e.g. Error in extension github.copilot-chat: FAILED to handle event: Cannot read properties of undefined (reading 'uri'). Because the server-side bucket fingerprint includes the message, this de-aggregates the catch-all bucket into one bucket per real failure and surfaces the actual error type to maintainers. This is the data-producer principle in action: fix the wrapper that produces the misleading top-level field, not the crash site and not a telemetry consumer.
  2. Stack: when the cause has a stack, adopt it (this.stack = cause.stack). The cause's stack contains the real copilot-chat frames (deminifiable to source), so the telemetry call site points at the true failure origin instead of the generic RPC-dispatch frames.
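A minimal sketch of the two changes together, assuming an ES2022 target and, as in the earlier simplifications, a plain-string extension id. combineMessage is a hypothetical helper that holds the expression from step 1 so it can be evaluated before super(); the cause parameter is widened to Error | undefined here only so the optional chaining is exercised. This illustrates the described behaviour and is not the diff itself.

```typescript
// Sketch of the fixed constructor as described in steps 1 and 2 (simplified).
function combineMessage(cause: Error | undefined, message?: string): string | undefined {
  // Label + real message when both exist; otherwise whichever one is present.
  return message && cause?.message ? `${message}: ${cause.message}` : (message ?? cause?.message);
}

class ExtensionError extends Error {
  readonly extension: string;

  constructor(extension: string, cause: Error | undefined, message?: string) {
    super(`Error in extension ${extension}: ${combineMessage(cause, message)}`, { cause });
    this.extension = extension;
    // Adopt the cause's stack so telemetry points at the real failure site.
    if (cause?.stack) {
      this.stack = cause.stack;
    }
  }
}

const cause = new TypeError("Cannot read properties of undefined (reading 'uri')");
const err = new ExtensionError('github.copilot-chat', cause, 'FAILED to handle event');

console.log(err.message);
// Error in extension github.copilot-chat: FAILED to handle event: Cannot read properties of undefined (reading 'uri')
console.log(err.stack === cause.stack); // true
```

The label is kept as context, the real message is appended, and error.cause still points at the original error, so nothing is lost relative to the pre-fix object.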

Nothing is coerced or silenced — this is enriching a cross-process error with diagnostic context rather than swallowing it (Core Fix Principle seven). The copilot-chat error still throws, still propagates through onUnexpectedExternalError, and still reaches telemetry; the logService.error() call and the _asExtensionEvent try/catch attribution mechanism are untouched.

Safety: ExtensionError has exactly one construction site and one instanceof consumer (extensionHostMain.ts:130) that reads .extension for attribution, never the synthetic stack, so overriding this.stack cannot regress extension attribution or the prepareStackTrace machinery. Non-Error causes are handled via optional chaining (cause?.message / cause?.stack), preserving the prior output when no cause message/stack exists.
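The safety argument can be checked with a small sketch. attributeTo is a hypothetical consumer modelled on the described instanceof check (it is not the code in extensionHostMain.ts), and the class is a compact, simplified version of the fixed constructor (string extension id, ES2022 target assumed). The sketch exercises a missing cause as the degenerate case for the optional chaining.

```typescript
// Compact, simplified fixed constructor plus a hypothetical attribution consumer.
function combineMessage(cause: Error | undefined, message?: string): string | undefined {
  return message && cause?.message ? `${message}: ${cause.message}` : (message ?? cause?.message);
}

class ExtensionError extends Error {
  readonly extension: string;
  constructor(extension: string, cause: Error | undefined, message?: string) {
    super(`Error in extension ${extension}: ${combineMessage(cause, message)}`, { cause });
    this.extension = extension;
    if (cause?.stack) {
      this.stack = cause.stack;
    }
  }
}

function attributeTo(err: unknown): string | undefined {
  // Attribution needs only the type check and the extension id; the stack is never read.
  return err instanceof ExtensionError ? err.extension : undefined;
}

// Adopting the cause's stack leaves attribution unchanged.
const withStack = new ExtensionError('github.copilot-chat', new Error('boom'), 'FAILED to handle event');
console.log(attributeTo(withStack)); // github.copilot-chat

// No cause: optional chaining falls back to the label and keeps the wrapper's own stack.
const noCause = new ExtensionError('github.copilot-chat', undefined, 'FAILED to handle event');
console.log(noCause.message); // Error in extension github.copilot-chat: FAILED to handle event
```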

Alternatives considered:

  • Drop the 'FAILED to handle event' label at the call site so cause.message flows through unchanged — rejected because it loses the "failed during event handling" context that distinguishes these from command/provider errors; combining both keeps the context and the real message.
  • Change packErrorForTelemetry to follow .cause — rejected as broader and riskier (it would alter bucketing for every error type globally) when the information loss originates specifically in the ExtensionError wrapper; fix the data producer, not the generic telemetry consumer.

Recommended Owner

@jrieken — author of the culprit commit d162ceb7 (introduced ExtensionError for extension events) and of the surrounding extension-host error-attribution machinery (prepareStackTrace/extensionErrors in extensionHostMain.ts, the Error#cause ES2022 enablement). Active within the last 90 days (most recent commit 2026-05-05) and owns the extension-host API surface these files belong to.

Generated by errors-fix · 2.5K AIC · ⌖ 125.3 AIC · ⊞ 69.2K ·

Copilot AI review requested due to automatic review settings June 16, 2026 17:31

Copilot AI left a comment

Contributor


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

vs-code-engineering[bot] requested a review from Copilot June 16, 2026 17:33

Copilot AI left a comment

Contributor


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

vs-code-engineering[bot] marked this pull request as ready for review June 16, 2026 17:33
vs-code-engineering[bot] enabled auto-merge (squash) June 16, 2026 17:33

3 participants