fix: surface underlying cause in ExtensionError for telemetry (fixes #321638) #321645
Open
vs-code-engineering[bot] wants to merge 1 commit into
rzhao271
approved these changes
Jun 16, 2026
Yoyokrazy
approved these changes
Jun 16, 2026
Summary
The error-telemetry bucket `[GitHub.copilot-chat] unhandlederror-Error in extension github.copilot-chat: FAILED to handle event` is a diagnosability defect in core VS Code, not a single copilot-chat bug. Every extension event-listener failure routed through `_asExtensionEvent` is wrapped in an `ExtensionError` whose constructor does `message ?? cause.message`. When the constant context label `'FAILED to handle event'` is supplied, the real underlying error's message is discarded, and the wrapper's own generic event-dispatch stack stands in for the cause's stack. Because the telemetry pipeline (`packErrorForTelemetry`) and the server-side bucket fingerprint key on the top-level `message`/`stack`, all distinct copilot-chat listener failures collapse into one bucket (94,695 hits / 2,058 users on stable `0.52.0`) carrying a constant message and a `MessagePortMain.emit` stack, so the actual error is unrecoverable from telemetry.

The fix restores the cause's message and stack onto the `ExtensionError` so each real failure is separately bucketed and diagnosable, without swallowing or silencing anything: the error still throws and still reaches telemetry, it simply carries its real identity.

Fixes #321638
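The collapse described in the summary can be reproduced in isolation. The following is a minimal sketch, not the VS Code source: `LegacyExtensionError`, its `originalError` field (standing in for `error.cause`), and `bucketKey` are hypothetical names, and the fingerprint is assumed to depend only on the top-level message.

```typescript
// Hypothetical, simplified stand-in for the pre-fix wrapper; not the VS Code source.
class LegacyExtensionError extends Error {
    readonly extension: string;
    readonly originalError: Error; // stands in for error.cause

    constructor(extensionId: string, cause: Error, message?: string) {
        // Pre-fix behaviour: a supplied label wins and cause.message is dropped.
        super(`Error in extension ${extensionId}: ${message ?? cause.message}`);
        this.extension = extensionId;
        this.originalError = cause;
    }
}

// Assumed fingerprint for illustration: only the top-level message is considered.
function bucketKey(err: Error): string {
    return err.message;
}

// Two unrelated listener failures.
const failures: Error[] = [
    new TypeError('Cannot read properties of undefined (reading uri)'),
    new RangeError('Invalid array length'),
];

const keys = new Set(
    failures.map(cause =>
        bucketKey(new LegacyExtensionError('github.copilot-chat', cause, 'FAILED to handle event'))
    )
);

console.log(keys.size); // 1: both failures share a single bucket key
```

Because the label is always supplied at the one call site, the `??` fallback to `cause.message` is never taken, which is why every failure produces the same key.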
Recommended reviewer: @jrieken

Culprit Commit
`d162ceb7fecc3476f27d75196e3dd71b06c51bc1`: "extension events use new `ExtensionError` so that these errors don't make it into 'normal' error telemetry" (PR #236336, fixes #232914), by @jrieken, 2024-12-17.

This commit introduced the `ExtensionError` wrapper for the `_asExtensionEvent` path. Its intent (keep extension event errors out of "normal" core error telemetry and attribute them to the extension) is correct. The unintended side effect is the `message ?? cause.message` collapse described above: routing these errors to extension-attributed telemetry while replacing the cause's message with a constant label merges every distinct failure into a single fingerprint. The bucket is therefore "pre-existing across versions" rather than caused by a recent copilot-chat change; it spikes on each stable rollout (hence `stable-anomaly`) because that is when the aggregate volume of all copilot-chat event errors becomes visible.

Note: the shipped build is a shallow/grafted checkout, so `git blame` could not be run locally; the culprit was identified via GitHub commit history on the affected files.

Code Flow
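As a rough illustration of the wrap-and-report step in this flow, the sketch below models a listener wrapper that catches a throwing listener, wraps the error with a constant label, and hands it to a reporter. All names here (`WrappedError`, `reportExternalError`, `wrapListener`) are hypothetical stand-ins for `ExtensionError`, `onUnexpectedExternalError`, and `_asExtensionEvent`; the real implementations differ in detail.

```typescript
type Listener<T> = (e: T) => void;

// Hypothetical stand-in for ExtensionError with the pre-fix message handling.
class WrappedError extends Error {
    constructor(readonly extension: string, readonly originalError: unknown, label: string) {
        super(`Error in extension ${extension}: ${label}`);
    }
}

// Hypothetical stand-in for onUnexpectedExternalError: records what telemetry would see.
const reported: Error[] = [];
function reportExternalError(err: Error): void {
    reported.push(err);
}

// Rough shape of the wrapper: run the listener, wrap and report anything it throws.
function wrapListener<T>(extensionId: string, listener: Listener<T>): Listener<T> {
    return (e: T) => {
        try {
            listener(e);
        } catch (err) {
            reportExternalError(new WrappedError(extensionId, err, 'FAILED to handle event'));
        }
    };
}

const guarded = wrapListener<string>('github.copilot-chat', () => {
    throw new Error('Cannot read properties of undefined (reading uri)');
});
guarded('some-event');

// The reporter only sees the constant label; the real message sits on originalError.
console.log(reported.map(r => r.message));
```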
flowchart TD
    A["Main thread sends RPC to ext host"] --> B["MessagePortMain.emit message event"]
    B --> C["RPCProtocol dispatch to ExtHost API accept handler"]
    C --> D["VS Code Event fires"]
    D --> E["_asExtensionEvent wrapper listener (extHost.api.impl.ts:279-291)"]
    E --> F["copilot-chat listener throws real Error = cause"]
    F --> G["catch: new ExtensionError(ext, cause, 'FAILED to handle event') (extHost.api.impl.ts:285)"]
    G --> H["ExtensionError ctor uses message ?? cause.message: real message discarded, generic stack kept (extensions.ts:509)"]
    H --> I["onUnexpectedExternalError to setUnexpectedErrorHandler (extensionHostMain.ts:121)"]
    I --> J["packErrorForTelemetry reads top-level message and stack only (errorTelemetry.ts:67)"]
    J --> K["Server fingerprint = message + file + version: all failures collapse into ONE bucket"]

Affected Files
- `src/vs/platform/extensions/common/extensions.ts` (changed): `ExtensionError` constructor, the single place where the cause's message/stack were discarded.
- `src/vs/workbench/api/common/extHost.api.impl.ts` (unchanged): `_asExtensionEvent` wrapper, the sole `ExtensionError` construction site (line 285); reporting/attribution works as designed and is intentionally left intact.
- `src/vs/platform/telemetry/common/errorTelemetry.ts` (unchanged): `packErrorForTelemetry` reads only top-level `message`/`stack`, which is why surfacing the cause on those fields fixes bucketing.
- `src/vs/workbench/api/common/extensionHostMain.ts` (unchanged): the `instanceof ExtensionError` consumer (line 130) reads `.extension` only, so adopting the cause's stack is attribution-safe.

Repro Steps
1. An extension event listener (for example on `window.onDidChangeX`, `workspace.onDidChangeY`, a chat/notebook event) throws synchronously.
2. Telemetry reports `Error in extension <id>: FAILED to handle event` with a generic `MessagePortMain.emit` (`node:events`) stack, identical for every distinct listener failure.
3. The real errors survive only on `error.cause`, which `packErrorForTelemetry` does not read, so the bucket cannot distinguish or diagnose them.

How the Fix Works
Chosen approach, in `src/vs/platform/extensions/common/extensions.ts` (`ExtensionError` constructor):

- Replace `message ?? cause.message` with ``message && cause?.message ? `${message}: ${cause.message}` : (message ?? cause?.message)``. When a context label is supplied (the only current call site always supplies `'FAILED to handle event'`), the real `cause.message` is now **appended** instead of discarded, e.g. `Error in extension github.copilot-chat: FAILED to handle event: Cannot read properties of undefined (reading uri)`. Because the server-side bucket fingerprint includes the message, this de-aggregates the catch-all bucket into one bucket per real failure and surfaces the actual error type to maintainers. This is the data-producer principle in action: fix the wrapper that produces the misleading top-level field, not the crash site and not a telemetry consumer.
- Adopt the cause's stack (`this.stack = cause.stack`). The cause's stack contains the real copilot-chat frames (deminifiable to source), so the telemetry call site points at the true failure origin instead of the generic RPC-dispatch frames.

Nothing is coerced or silenced: this enriches a cross-process error with diagnostic context rather than swallowing it (Core Fix Principle seven). The copilot-chat error still throws, still propagates through `onUnexpectedExternalError`, and still reaches telemetry; the `logService.error()` call and the `_asExtensionEvent` try/catch attribution mechanism are untouched.

Safety: `ExtensionError` has exactly one construction site and one `instanceof` consumer (`extensionHostMain.ts:130`) that reads `.extension` for attribution, never the synthetic stack, so overriding `this.stack` cannot regress extension attribution or the `prepareStackTrace` machinery. Non-`Error` causes are handled via optional chaining (`cause?.message` / `cause?.stack`), preserving the prior output when no cause message/stack exists.

Alternatives considered:
- Dropping the `'FAILED to handle event'` label at the call site so `cause.message` flows through unchanged: rejected because it loses the "failed during event handling" context that distinguishes these from command/provider errors; combining both keeps the context and the real message.
- Changing `packErrorForTelemetry` to follow `.cause`: rejected as broader and riskier (it would alter bucketing for every error type globally) when the information loss originates specifically in the `ExtensionError` wrapper; fix the data producer, not the generic telemetry consumer.

Recommended Owner
@jrieken: author of the culprit commit `d162ceb7` (introduced `ExtensionError` for extension events) and of the surrounding extension-host error-attribution machinery (`prepareStackTrace` / `extensionErrors` in `extensionHostMain.ts`, the `Error#cause` ES2022 enablement). Active within the last 90 days (most recent commit 2026-05-05) and owns the extension-host API surface these files belong to.
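For reviewers, the constructor change described under How the Fix Works can be sketched as follows. This is a simplified model under stated assumptions, not the patched source: `PatchedExtensionError` is a hypothetical name, the `Error in extension <id>: ` prefix is inferred from the example message in this description, and the cause is typed loosely to mirror the optional chaining.

```typescript
// Hypothetical, simplified model of the patched constructor; not the actual VS Code source.
class PatchedExtensionError extends Error {
    readonly extension: string;

    constructor(extensionId: string, cause?: { message?: string; stack?: string }, message?: string) {
        // Append the cause's message to the context label instead of discarding it.
        const detail = message && cause?.message
            ? `${message}: ${cause.message}`
            : (message ?? cause?.message);
        super(`Error in extension ${extensionId}: ${detail}`);
        this.extension = extensionId;
        // Adopt the cause's stack so the reported frames point at the real failure.
        if (cause?.stack) {
            this.stack = cause.stack;
        }
    }
}

const realFailure = new TypeError('Cannot read properties of undefined (reading uri)');
const wrapped = new PatchedExtensionError('github.copilot-chat', realFailure, 'FAILED to handle event');

console.log(wrapped.message);
// Error in extension github.copilot-chat: FAILED to handle event: Cannot read properties of undefined (reading uri)
console.log(wrapped.stack === realFailure.stack); // true

// With no cause, the label alone is kept, matching the pre-fix output.
const labelOnly = new PatchedExtensionError('github.copilot-chat', undefined, 'FAILED to handle event');
console.log(labelOnly.message);
```

The attribution field (`extension`) is set independently of the stack, which is the property the safety argument relies on.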