From 33a6fa6b885af5755e9e43ead28445c8bbe042f0 Mon Sep 17 00:00:00 2001 From: s1gr1d <32902192+s1gr1d@users.noreply.github.com> Date: Wed, 4 Mar 2026 17:13:41 +0100 Subject: [PATCH 1/7] docs(sdks): Add docs on `dataCollection` --- .../client/data-collection/index.mdx | 335 ++++++++++++++++++ .../sdk/foundations/data-scrubbing.mdx | 56 +-- 2 files changed, 342 insertions(+), 49 deletions(-) create mode 100644 develop-docs/sdk/foundations/client/data-collection/index.mdx diff --git a/develop-docs/sdk/foundations/client/data-collection/index.mdx b/develop-docs/sdk/foundations/client/data-collection/index.mdx new file mode 100644 index 0000000000000..76026ac4cf77f --- /dev/null +++ b/develop-docs/sdk/foundations/client/data-collection/index.mdx @@ -0,0 +1,335 @@ +--- +title: Data Collection +description: Configuration for what data SDKs collect by default — technical context, PII, and sensitive data. +spec_id: sdk/foundations/client/data-collection +spec_version: 1.0.0 +spec_status: candidate +spec_depends_on: + - id: sdk/foundations/client + version: ">=1.0.0" +spec_changelog: + - version: 1.0.0 + date: 2025-03-05 + summary: Initial spec; dataCollection config, three data tiers, cookies/headers denylist, replace sendDefaultPii. +sidebar_order: 1 +--- + + + + + +## Overview + +This spec defines how SDKs control **what data is collected automatically** from the runtime (device, requests, responses, user context). It replaces the single `sendDefaultPii` (or platform-equivalent) flag with a structured `dataCollection` configuration so users can enable or restrict collection by category and by field. 
+ +Related specs: + +- [Data Handling](/sdk/expected-features/data-handling/) — structuring data for scrubbing (spans, breadcrumbs), variable size limits +- [Client](/sdk/foundations/client/) — client lifecycle and event pipeline +- [Configuration](/sdk/foundations/client/configuration/) — top-level init options including `send_default_pii` (deprecated in favor of this spec) + +--- + +## Data Tiers + +Collected data is grouped into three tiers. SDKs **MUST** treat these tiers consistently when applying defaults and user configuration. + + + +### 1. Technical Context Data + +Non-identifying context used for debugging and performance: + +- Device and environment context (OS, runtime, non-PII identifiers) +- Performance and error context (stack traces, breadcrumbs, span metadata) +- Framework/routing context where it does not contain PII or secrets + +This tier is **not** gated by the data collection configuration. SDKs **MAY** collect it by default. + +### 2. PII Data + +Personally identifiable or user-linked data: + +- User identifiers (user ID, username, email) +- IP address +- Cookies and headers that identify the user or session +- AI Agent input and output messages + +This tier **MUST** be off by default unless the user opts in via `includeUserInfo` and/or explicit `collect` allowlists. See [Include User Info](#include-user-info), [collect options](#collect-options), and [Default Denylist](#default-denylist). + +### 3. Sensitive Data + +Credentials and secrets that must never be sent by default: + +- Passwords, tokens, API keys, bearer tokens +- Header or cookie values that match known sensitive names (auth, token, secret, password, key, jwt, etc.) + +SDKs **MUST** never send sensitive **values** through automatic instrumentation; keys are included by the SDK while values are replaced with `"[Filtered]"` (see [Default Denylist](#default-denylist)). Users can use `beforeSend` (or equivalent) to remove or redact keys if needed. 
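The key-removal path mentioned above can be sketched as a plain function. This is a minimal illustration, not any SDK's actual hook signature: `SketchEvent` and `removeHeaderKeys` are hypothetical names, and the event shape is reduced to the one field the example needs.

```typescript
// Hypothetical minimal event shape; real SDK event types carry many more fields.
interface SketchEvent {
  request?: { headers?: Record<string, string> };
}

// Returns a copy of the event without the named headers (case-insensitive).
// Intended to be called from a `beforeSend`-style callback; the function itself
// is an assumption for illustration and is not part of the spec.
function removeHeaderKeys(event: SketchEvent, names: string[]): SketchEvent {
  const headers = event.request?.headers;
  if (!headers) return event;
  const blocked = new Set(names.map((name) => name.toLowerCase()));
  const kept: Record<string, string> = {};
  for (const [key, value] of Object.entries(headers)) {
    if (!blocked.has(key.toLowerCase())) kept[key] = value;
  }
  return { ...event, request: { ...event.request, headers: kept } };
}

// The SDK has already replaced the value; the user also wants the key gone.
const withoutApiKey = removeHeaderKeys(
  { request: { headers: { "X-Api-Key": "[Filtered]", Accept: "text/html" } } },
  ["x-api-key"],
);
console.log(withoutApiKey.request?.headers);
```

The SDK keeps scrubbing values on its own; a callback like this only covers the case where the key name itself is considered sensitive.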
+ + + +--- + +## Configuration Surface + +All data-collection options live under a single key: `dataCollection`. + + + +### Top-Level Shape + +At the top level, users **MAY** specify **`includeUserInfo`** and a **`collect`** record. + +- **`includeUserInfo`** is the primary toggle for Personally Identifiable Information (PII). It controls whether user-identity fields are included in or excluded from automatic collection, and sets the default for other PII-heavy options (such as `aiAgentMessages`). +- **`collect`** controls which categories of request/response and runtime data are gathered (cookies, headers, body, query params, etc.); see the sections below. + +Users configure data collection via the init options. An example with the default options: + +```typescript +init({ + dsn: "...", + dataCollection: { + includeUserInfo: false, + collect: { + cookies: true, + httpHeaders: true, + queryParams: true, + stackTraceVariables: true, + incomingRequestBody: false, + outgoingRequestBody: false, + aiAgentMessages: false + }, + }, +}); +``` + +- SDKs **MUST** support at least `includeUserInfo` and the `collect` object. SDKs **MAY** omit options that do not apply to the platform (e.g. no `outgoingRequestBody` on a backend-only SDK). + +For how `includeUserInfo` affects the defaults of collection options, see [How Defaults Cascade](#how-defaults-cascade). + + + +--- + +## Option Reference + + + + ### `includeUserInfo` + + **Type:** Boolean option + + Controls whether the SDK automatically attaches user identity fields to events (e.g. `user.id`, `user.email`, `user.username`, `user.ip_address`). This is the primary PII gate: its value sets the default for all other PII-heavy options, most notably `aiAgentMessages`. + + | Value | Behavior | + |-------|----------| + | `true` | Attach all user identity fields captured by automatic instrumentation. Equivalent to the legacy `sendDefaultPii` flag scoped to user data. 
| + | `false` | Do not attach user identity fields from automatic instrumentation. | + + - **Default:** `false`. + - When user data is set **explicitly** on the scope (or equivalent), it is **always** attached regardless of this setting. See [User-Set Data and Scrubbing](#user-set-data-scrubbing). + + + +--- + + + + ### `collect` Options + + Each key under `collect` maps to a category of automatically collected data. Refer to the [Option Types](#option-types) section below to understand which value type each key accepts. + + | Key | Option Type | Default | Description | + |-----|-------------|---------|-------------| + | `cookies` | Collection | `true` | Include cookie values; keys are filtered by the default denylist or by allow/deny lists. | + | `httpHeaders` | Collection | `true` | Include HTTP header values; keys are filtered by the default denylist or by allow/deny lists. | + | `queryParams` | Collection | `true` | Include URL query parameter values; keys are filtered by the default denylist or by allow/deny lists. | + | `stackTraceVariables` | Boolean | Inherits `includeUserInfo` | Include local variable values captured within stack traces. | + | `incomingRequestBody` | Boolean | Inherits `includeUserInfo` | Include full body of the incoming HTTP request. | + | `outgoingRequestBody` | Boolean | Inherits `includeUserInfo` | Include full body of outgoing HTTP requests. | + | `aiAgentMessages` | Boolean | Inherits `includeUserInfo` | Include AI agent input and output messages. | + + + Unlike cookies or headers, some data (e.g. request bodies) has no predictable key structure for the SDK to filter. Data can still be redacted in `beforeSend` or event processors if needed. + + + + + + ### Option Types + + Each option in `dataCollection.collect` uses one of two distinct value types. Which type an option accepts depends on whether the collected data is structured as named key-value pairs (e.g. cookies, headers) or as an opaque blob (e.g. request bodies). 
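The two value types can be written down as TypeScript types. This is a sketch under the assumption that the option names match the `collect` table in this revision; `describeCollectionOption` is a hypothetical helper added only to show how the variants are told apart.

```typescript
// Boolean options: the category is collected as a whole or not at all.
type BooleanOption = boolean;

// Collection options: a toggle, or an allow/deny object for per-key control.
type CollectionOption = boolean | { allow: string[] } | { deny: string[] };

// Key names follow the `collect` table in this revision of the spec.
interface CollectOptions {
  cookies?: CollectionOption;
  httpHeaders?: CollectionOption;
  queryParams?: CollectionOption;
  stackTraceVariables?: BooleanOption;
  incomingRequestBody?: BooleanOption;
  outgoingRequestBody?: BooleanOption;
  aiAgentMessages?: BooleanOption;
}

// Hypothetical helper: names the filtering mode a collection option selects.
function describeCollectionOption(option: CollectionOption): string {
  if (option === false) return "skip category";
  if (option === true) return "default denylist";
  return "allow" in option ? "allowlist only" : "default denylist plus extra keys";
}

const exampleCollect: CollectOptions = {
  httpHeaders: { allow: ["x-request-id"] },
  queryParams: false,
};
console.log(describeCollectionOption(exampleCollect.httpHeaders ?? true));
```

Modeling the collection type as a union keeps `true`/`false` usable as shorthand while letting the compiler reject an object that has neither `allow` nor `deny`.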
+ + --- + + #### Boolean Options + + Used for categories where data cannot be meaningfully filtered at the key level — the SDK either collects the entire category or skips it entirely. + + | Value | Behavior | + |-------|----------| + | `true` | Collect and attach this data category. | + | `false` | Do not collect this data category at all. | + + Examples: `incomingRequestBody`, `outgoingRequestBody`, `aiAgentMessages`. + + --- + + #### Collection Options + + Used for categories structured as named key-value pairs, where the SDK can inspect individual keys and apply filtering rules before attaching data. In addition to the `true`/`false` toggle, these options accept an allow/deny object for fine-grained control. + + | Value | Behavior | + |-------|----------| + | `true` | Collect this category. Apply the default denylist — values for sensitive key names (e.g. `authorization`, `cookie`, `token`) are replaced with `"[Filtered]"` (see [Default Denylist](#default-denylist)). | + | `false` | Do not collect this category at all. | + | `{ deny: string[] }` | Collect this category. Apply the default denylist **plus** these additional key names. | + | `{ allow: string[] }` | Collect **only** keys matching this list. The default denylist is replaced, but sensitive values **MUST** still be scrubbed regardless. | + + > **Note:** Sensitive key **values** are always scrubbed — they are replaced with `"[Filtered]"` — regardless of how the collection option is configured. The allow/deny lists control which keys are included, not whether scrubbing applies. + + Examples: `cookies`, `httpHeaders`, `queryParams`. + + + + + + + ### How Defaults Cascade + + Because `includeUserInfo` acts as the main gate for PII, its value determines the defaults for other PII-heavy options. Explicitly set `collect` options always override these defaults. 
+ + | Option type | Default when `includeUserInfo: true` | Default when `includeUserInfo: false` | + |-------------|--------------------------------------|----------------------------------------| + | Collection (key-value pairs) | `true` — use default denylist | `true` — use default denylist, plus PII keys denied | + | Boolean | `true` — attach | `false` — do not attach | + + + + + + + +--- + +## Default Denylist + +For key-value data (HTTP headers, cookies, URL query params), SDKs **MUST** apply a **default denylist** by key name: values for known-sensitive keys are replaced with `"[Filtered]"`; **keys are never scrubbed** by the SDK. + + + +### Matching Rule + +SDKs **MUST** perform a **partial, case-insensitive match** when comparing header names, cookie names, and query parameter names against the denylist. A key is treated as sensitive if any denylist term appears as a substring in the key (e.g. the term `auth` matches the header names `Authorization` and `X-Auth-Token`). + +### Base Denylist (Sensitive Data) + +The following terms **MUST** be included in the default denylist for headers (and **SHOULD** be used for cookies and query params where applicable). A key is sensitive if it **partially matches**, case-insensitively, any of: + +`["auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt", "bearer", "sso", "saml", "csrf", "xsrf", "credentials", "session", "sid", "identity"]` + +Values for keys that match **MUST** be replaced with `"[Filtered]"`. + +### PII Denylist (when `includeUserInfo` is `false`) + +When `includeUserInfo` is `false`, SDKs **MUST** apply the base denylist **and** treat the following as sensitive (in addition to the list above): + +- Any data that contains the following: email, user ID, IP address, username, machine name (if applicable) +- Any header or key containing **`x-forwarded-`** (e.g. `x-forwarded-for`, `x-forwarded-host`) — often carries client IP or host. 
+- Any header or key whose name ends with or contains **`-user`** (e.g. `x-user-id`, `remote-user`) — often carries user identifiers. + +So the effective denylist when PII is disabled is: base list + `["x-forwarded-", "-user"]` (substring match, case-insensitive). + +### Cookies and Cookie Headers +- SDKs **SHOULD** maintain a default denylist of cookie names using the same matching rule (e.g. `session`, `auth`, `identity`). Values for matching cookie names **MUST** be replaced with `"[Filtered]"`. +- **When individual cookie key-value pairs cannot be extracted** (e.g. malformed or opaque cookie header), the entire `Cookie` or `Set-Cookie` header value **MUST** be replaced with `"[Filtered]"`. Unfiltered raw cookie header values **MUST NOT** be sent. When in doubt, treat the whole cookie header as sensitive. + +### Request Bodies + +When request or response bodies are collected (`incomingRequestBody` / `outgoingRequestBody`): + +- **Parseable as JSON or form data:** SDKs **MAY** extract key-value pairs and apply the same denylist rules (partial, case-insensitive match) to keys. Values for matching keys **MUST** be replaced with `"[Filtered]"`. This allows selective scrubbing while retaining non-sensitive fields for debugging. +- **Raw bodies (not parseable as JSON or FormData):** The body **MUST** be removed and **MUST NOT** be attached to the event. "Raw" HTTP bodies (e.g. binary payloads, plain text, or unparseable content) are never sent through automatic instrumentation. When the SDK cannot parse the body into key-value structure, the entire body **MUST** be replaced with `"[Filtered]"`. + +No built-in option scrubs **keys**; users who need to hide header or cookie names **MUST** use `beforeSend` (or equivalent). + + + +--- + +## Use Cases + +The following examples show how `dataCollection` maps to common configurations. 
+ + + +### Maximum PII (full collection) + +When the user enables full PII collection: + +- `includeUserInfo: true` +- `collect`: all collection options `true` (default denylist); `incomingRequestBody`, `outgoingRequestBody`, and `aiAgentMessages` are `true`. + +**Result:** Technical context and request/response data (headers, cookies, query params) are collected with the default denylist; request bodies, user identifiers, and AI agent messages are included; sensitive values are still replaced with `"[Filtered]"`. + + + + + +### Granular Debugging + +The user wants to include user info and only specific headers for debugging, and does not want to send query params at all: + +```typescript +init({ + dsn: "...", + dataCollection: { + includeUserInfo: true, + collect: { + httpHeaders: { allow: ['x-request-id', 'x-trace-id', 'x-correlation-id'] }, + queryParams: false, + }, + }, +}); +``` + +Because `includeUserInfo` is set, `aiAgentMessages` defaults to `true` unless the user explicitly sets `collect: { aiAgentMessages: false }`. + + + + + +### Migration from sendDefaultPii + +- **`sendDefaultPii: true`** (legacy) → `dataCollection: { includeUserInfo: true }` and keep `collect` defaults. +- **`sendDefaultPii: false`** (legacy) → `dataCollection: { includeUserInfo: false }` (or omit; same as default). + +SDKs **SHOULD** document this mapping and **MAY** implement `send_default_pii` as a compatibility shim that sets `includeUserInfo`. + + + +--- + +## User-Set Data and Scrubbing + + + +### Data set by the user + +When the user **explicitly** sets data on the scope (user, request, response, tags, contexts, etc.) or on a span, log, or other telemetry, that data is **not** gated by `dataCollection`. It **MUST** always be attached to outgoing telemetry. The same applies to data the user provides via `beforeSend` or event processors (e.g. attaching a request object). 
+ +### Automatic vs explicit data + +SDKs **SHOULD** only replace sensitive values with `"[Filtered]"` when the data is gathered **automatically** through instrumentation. If the user explicitly provides data (e.g. by setting a request object on the scope), the SDK **MUST NOT** modify it; the user is responsible for what they attach. + +### beforeSend and event processors + +Users can register callbacks (e.g. `beforeSend`, event processors) to remove or redact any data — including keys — before events are sent. This spec does not replace those hooks; they remain the mechanism for custom filtering and key removal. + + + +--- + +## Changelog + + diff --git a/develop-docs/sdk/foundations/data-scrubbing.mdx b/develop-docs/sdk/foundations/data-scrubbing.mdx index 103fa9261c55c..3a88210f945f8 100644 --- a/develop-docs/sdk/foundations/data-scrubbing.mdx +++ b/develop-docs/sdk/foundations/data-scrubbing.mdx @@ -3,63 +3,21 @@ title: Data Scrubbing sidebar_order: 6 --- -Data handling is the standardized context in how we want SDKs help users filter data. - -## Sensitive Data - -SDKs should not include PII or other sensitive data in the payload by default. -When building an SDK we can come across some API that can give useful information to debug a problem. -In the event that API returns data considered PII, we guard that behind a flag called _Send Default PII_. -This is an option in the SDK called [_send-default-pii_](https://docs.sentry.io/platforms/python/configuration/options/#send-default-pii) -and is **disabled by default**. That means that data that is naturally sensitive is not sent by default. +Data handling is the standardized context in how we want SDKs to help users filter data. -When a user manually sets the data on the scope (user, contexts, tags, data, request, response, etc.), this data should not be gated by the _Send Default PII_ flag and should always be attached to all outgoing telemetry. 
This also applies to the data that the user manually sets on a span, log, metric and other types of telemetry (directly or, for example, via `BeforeSend`). +**Data collection and scrubbing:** The canonical spec for what data SDKs collect, default denylists (headers, cookies, query params), request body and cookie scrubbing, user-set data, and `beforeSend` is [Data Collection](/sdk/foundations/client/data-collection/). That spec supersedes the sensitive-data and cookie sections below for SDK behavior. This page retains **Structuring Data** and **Variable Size** and the legacy `send_default_pii` context for reference. -Certain sensitive data must never be sent through SDK instrumentation, regardless of any configuration: - -- HTTP Headers: The keys of known sensitive headers are added, while their values must be replaced with `"[Filtered]"`. - - The SDK performs a **partial, case-insensitive match** against the following headers to determine if they are sensitive: `["auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt", "bearer", "sso", "saml", "csrf", "xsrf", "credentials"]` - -SDKs should only replace sensitive data with `"[Filtered]"` when the data is gathered automatically through instrumentation. -If a user explicitly provides data (for example, by setting a request object on the scope), the SDK must not modify it. - -Some examples of data guarded by `send_default_pii: false`: - -- When attaching data of HTTP requests and/or responses to events - - Request Body: "raw" HTTP bodies (bodies which cannot be parsed as JSON or FormData) are removed - - HTTP Headers: header values, containing information about the user are replaced with `"[Filtered]"` -- User-specific information (e.g. the current user ID according to the used web-framework) is not collected and therefore not sent at all. -- On desktop applications - - The username logged in the device is not included. This is often a person's name. 
- - The machine name is not included, for example `Bruno's laptop` -- SDKs don't set `{{auto}}` as `user.ip_address`. This instructs the server to keep the connection's IP address. -- Server SDKs remove the IP address of incoming HTTP requests. - -Sentry server is always aware of the connecting IP address and can use it for logging in some platforms. Namely JavaScript and iOS/macOS/tvOS. -All other platforms require the event to include `user.ip_address={{auto}}` which happens if `sendDefaultPii` is set to true. - -Before sending events to Sentry, the SDKs should invokes callbacks. That allows users to remove any sensitive data client-side. - -- [`before-send` and `event-processors`](/sdk/foundations/client/#event-pipeline) can be used to register a callback with custom logic to remove sensitive data. - -### Cookies - -Since `Cookie` and `Set-Cookie` headers can contain a mix of sensitive and non-sensitive data, SDKs should parse the cookie header and filter values on a per-key basis, depending on the SDK setting and the sensitivity of the cookie value. -In case, the SDK cannot parse each cookie key-value pair, the entire cookie header must be replaced with `"[Filtered]"`. An unfiltered, raw cookie header value must never be sent. - -This selective filtering prevents capturing sensitive data while retaining harmless contextual information for debugging. -For example, a sensitive session cookie's value is replaced with "[Filtered]", but a non-sensitive cookie for the theme preference can be sent as-is. +## Sensitive Data -When attached as span attributes, the results should be as follows: +The normative rules for sensitive data, PII, cookies, request bodies, and user-set data are in [Data Collection](/sdk/foundations/client/data-collection/). 
The following is kept for context: -- `http.request.header.cookie.user_session: "[Filtered]"` -- `http.request.header.cookie.theme: "dark-mode"` -- `http.request.header.set_cookie.theme: "light-mode"` -- `http.request.header.cookie: "[Filtered]"` (Used as a fallback if the cookie header cannot be parsed) +- SDKs should not include PII or other sensitive data in the payload by default. The legacy option [_send-default-pii_](https://docs.sentry.io/platforms/python/configuration/options/#send-default-pii) is **disabled by default**; the replacement is `dataCollection.includeUserInfo` and `dataCollection.collect` (see [Data Collection](/sdk/foundations/client/data-collection/)). +- Certain sensitive data must never be sent through SDK instrumentation: header/cookie/query values matching the default denylist are replaced with `"[Filtered]"`. User-set data is always attached; only automatically gathered data is scrubbed. Users can use `beforeSend` / event processors to remove or redact any data. +- For the exact default denylist (partial, case-insensitive match), PII denylist (`x-forwarded-`, `-user`), cookies when unparseable, and raw request bodies, see [Data Collection — Default Denylist](/sdk/foundations/client/data-collection/#default-denylist) and [User-Set Data and Scrubbing](/sdk/foundations/client/data-collection/#user-set-data-scrubbing). 
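The matching rule referenced in the list above can be sketched as follows. The terms are copied from the Base Denylist and PII Denylist sections of the Data Collection spec added in this patch; the function names `isSensitiveKey` and `scrubValues` are assumptions for illustration, not SDK APIs.

```typescript
// Terms from the spec's Base Denylist (sensitive data).
const BASE_DENYLIST = [
  "auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt", "bearer",
  "sso", "saml", "csrf", "xsrf", "credentials", "session", "sid", "identity",
];

// Extra terms applied when `includeUserInfo` is false.
const PII_DENYLIST = ["x-forwarded-", "-user"];

// Partial, case-insensitive match: a key is sensitive if any term is a substring.
function isSensitiveKey(key: string, includeUserInfo: boolean): boolean {
  const terms = includeUserInfo ? BASE_DENYLIST : [...BASE_DENYLIST, ...PII_DENYLIST];
  const lowered = key.toLowerCase();
  return terms.some((term) => lowered.includes(term));
}

// Keys are always kept; only the values of sensitive keys are replaced.
function scrubValues(
  pairs: Record<string, string>,
  includeUserInfo: boolean,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(pairs)) {
    result[key] = isSensitiveKey(key, includeUserInfo) ? "[Filtered]" : value;
  }
  return result;
}

const sampleHeaders = {
  Authorization: "Bearer abc",
  "X-Forwarded-For": "203.0.113.7",
  Accept: "*/*",
};
console.log(scrubValues(sampleHeaders, false));
console.log(scrubValues(sampleHeaders, true));
```

Substring matching is deliberately broad: a short term such as `key` or `sid` also catches unrelated names that happen to contain it, which trades some debugging context for a lower chance of leaking a secret.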
### Application State From 07f843eb7dd6926d39aa15c7263151a775150266 Mon Sep 17 00:00:00 2001 From: s1gr1d <32902192+s1gr1d@users.noreply.github.com> Date: Thu, 5 Mar 2026 14:35:47 +0100 Subject: [PATCH 2/7] fix typo --- develop-docs/sdk/foundations/client/data-collection/index.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/develop-docs/sdk/foundations/client/data-collection/index.mdx b/develop-docs/sdk/foundations/client/data-collection/index.mdx index 76026ac4cf77f..6239f96506361 100644 --- a/develop-docs/sdk/foundations/client/data-collection/index.mdx +++ b/develop-docs/sdk/foundations/client/data-collection/index.mdx @@ -94,7 +94,7 @@ init({ cookies: true, httpHeaders: true, queryParams: true, - stackTraceVariables: true, + stackTraceVariables: false, incomingRequestBody: false, outgoingRequestBody: false, aiAgentMessages: false @@ -249,7 +249,7 @@ So the effective denylist when PII is disabled is: base list + `["x-forwarded-", When request or response bodies are collected (`incomingRequestBody` / `outgoingRequestBody`): - **Parseable as JSON or form data:** SDKs **MAY** extract key-value pairs and apply the same denylist rules (partial, case-insensitive match) to keys. Values for matching keys **MUST** be replaced with `"[Filtered]"`. This allows selective scrubbing while retaining non-sensitive fields for debugging. -- **Raw bodies (not parseable as JSON or FormData):** The body **MUST** be removed and **MUST NOT** be attached to the event. "Raw" HTTP bodies (e.g. binary payloads, plain text, or unparseable content) are never sent through automatic instrumentation. When the SDK cannot parse the body into key-value structure, the entire body **MUST** be replaced with `"[Filtered]"`. +- **Raw bodies (not parseable as JSON or FormData):** The body **MUST** be removed and **MUST NOT** be attached to the event. "Raw" HTTP bodies (e.g. binary payloads, plain text, or unparsable content) are never sent through automatic instrumentation. 
When the SDK cannot parse the body into key-value structure, the entire body **MUST** be replaced with `"[Filtered]"`. No built-in option scrubs **keys**; users who need to hide header or cookie names **MUST** use `beforeSend` (or equivalent). From c36a5d8cbd1b5d8def72da661cea58f3c3a733c9 Mon Sep 17 00:00:00 2001 From: s1gr1d <32902192+s1gr1d@users.noreply.github.com> Date: Fri, 6 Mar 2026 14:57:01 +0100 Subject: [PATCH 3/7] review suggestions --- .../client/data-collection/index.mdx | 42 ++++++++++--------- 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/develop-docs/sdk/foundations/client/data-collection/index.mdx b/develop-docs/sdk/foundations/client/data-collection/index.mdx index 6239f96506361..3febaba7ca0b7 100644 --- a/develop-docs/sdk/foundations/client/data-collection/index.mdx +++ b/develop-docs/sdk/foundations/client/data-collection/index.mdx @@ -3,7 +3,7 @@ title: Data Collection description: Configuration for what data SDKs collect by default — technical context, PII, and sensitive data. spec_id: sdk/foundations/client/data-collection spec_version: 1.0.0 -spec_status: candidate +spec_status: draft spec_depends_on: - id: sdk/foundations/client version: ">=1.0.0" @@ -34,14 +34,14 @@ Related specs: Collected data is grouped into three tiers. SDKs **MUST** treat these tiers consistently when applying defaults and user configuration. - + ### 1. Technical Context Data Non-identifying context used for debugging and performance: - Device and environment context (OS, runtime, non-PII identifiers) -- Performance and error context (stack traces, breadcrumbs, span metadata) +- Performance and error context (stack frames, breadcrumbs, span metadata) - Framework/routing context where it does not contain PII or secrets This tier is **not** gated by the data collection configuration. SDKs **MAY** collect it by default. 
@@ -74,7 +74,7 @@ SDKs **MUST** never send sensitive **values** through automatic instrumentation; All data-collection options live under a single key: `dataCollection`. - + ### Top-Level Shape @@ -94,10 +94,11 @@ init({ cookies: true, httpHeaders: true, queryParams: true, - stackTraceVariables: false, incomingRequestBody: false, outgoingRequestBody: false, - aiAgentMessages: false + aiAgentMessages: true + stackFrameVariables: true, + frameContextLines: 5, }, }, }); @@ -113,7 +114,7 @@ For how `includeUserInfo` affects the defaults of collection options, see [How D ## Option Reference - + ### `includeUserInfo` @@ -133,7 +134,7 @@ For how `includeUserInfo` affects the defaults of collection options, see [How D --- - + ### `collect` Options @@ -144,17 +145,18 @@ For how `includeUserInfo` affects the defaults of collection options, see [How D | `cookies` | Collection | `true` | Include cookie values; keys are filtered by the default denylist or by allow/deny lists. | | `httpHeaders` | Collection | `true` | Include HTTP header values; keys are filtered by the default denylist or by allow/deny lists. | | `queryParams` | Collection | `true` | Include URL query parameter values; keys are filtered by the default denylist or by allow/deny lists. | - | `stackTraceVariables` | Boolean | Inherits `includeUserInfo` | Include local variable values captured within stack traces. | - | `incomingRequestBody` | Boolean | Inherits `includeUserInfo` | Include full body of the incoming HTTP request. | - | `outgoingRequestBody` | Boolean | Inherits `includeUserInfo` | Include full body of outgoing HTTP requests. | - | `aiAgentMessages` | Boolean | Inherits `includeUserInfo` | Include AI agent input and output messages. | + | `incomingRequestBody` | Boolean | Default TBD | Include full body of the incoming HTTP request. | + | `outgoingRequestBody` | Boolean | Default TBD | Include full body of outgoing HTTP requests. 
| + | `aiAgentMessages` | Boolean | `true` | Include AI agent input and output messages. | + | `stackFrameVariables` | Boolean | `true` | Include local variable values captured within stack frames. | + | `frameContextLines` | Number (Boolean, if not otherwise possible) | `5` (`true`) | (Number of) lines of context to include around stack frames. | Unlike cookies or headers, some data (e.g. request bodies) has no predictable key structure for the SDK to filter. Data can still be redacted in `beforeSend` or event processors if needed. - + ### Option Types @@ -193,11 +195,11 @@ For how `includeUserInfo` affects the defaults of collection options, see [How D - + ### How Defaults Cascade - Because `includeUserInfo` acts as the main gate for PII, its value determines the defaults for other PII-heavy options. Explicitly set `collect` options always override these defaults. + Because `includeUserInfo` acts as the main gate for PII, its value determines the default denylist for the `collect` options. Explicitly set `collect` options always override this default. | Option type | Default when `includeUserInfo: true` | Default when `includeUserInfo: false` | |-------------|--------------------------------------|----------------------------------------| @@ -216,7 +218,7 @@ For how `includeUserInfo` affects the defaults of collection options, see [How D For key-value data (HTTP headers, cookies, URL query params), SDKs **MUST** apply a **default denylist** by key name: values for known-sensitive keys are replaced with `"[Filtered]"`; **keys are never scrubbed** by the SDK. - + ### Matching Rule @@ -261,7 +263,7 @@ No built-in option scrubs **keys**; users who need to hide header or cookie name The following examples show how `dataCollection` maps to common configurations. 
- + ### Maximum PII (full collection) @@ -274,7 +276,7 @@ When the user enables full PII collection: - + ### Granular Debugging @@ -297,7 +299,7 @@ Because `includeUserInfo` is set, `aiAgentMessages` defaults to `true` unless th - + ### Migration from sendDefaultPii @@ -312,7 +314,7 @@ SDKs **SHOULD** document this mapping and **MAY** implement `send_default_pii` a ## User-Set Data and Scrubbing - + ### Data set by the user From 78d08958c85a0f285481a9c06e54221bcccbf989 Mon Sep 17 00:00:00 2001 From: s1gr1d <32902192+s1gr1d@users.noreply.github.com> Date: Fri, 6 Mar 2026 15:24:47 +0100 Subject: [PATCH 4/7] create technical spec (skill) --- .../client/data-collection/index.mdx | 332 +++++++++--------- 1 file changed, 160 insertions(+), 172 deletions(-) diff --git a/develop-docs/sdk/foundations/client/data-collection/index.mdx b/develop-docs/sdk/foundations/client/data-collection/index.mdx index 3febaba7ca0b7..8cc3c55938e61 100644 --- a/develop-docs/sdk/foundations/client/data-collection/index.mdx +++ b/develop-docs/sdk/foundations/client/data-collection/index.mdx @@ -30,13 +30,15 @@ Related specs: --- -## Data Tiers - -Collected data is grouped into three tiers. SDKs **MUST** treat these tiers consistently when applying defaults and user configuration. +## Concepts -### 1. Technical Context Data +### Data Tiers + +Collected data is grouped into three tiers. SDKs **MUST** treat these tiers consistently when applying defaults and user configuration. + +#### 1. Technical Context Data Non-identifying context used for debugging and performance: @@ -46,241 +48,255 @@ Non-identifying context used for debugging and performance: This tier is **not** gated by the data collection configuration. SDKs **MAY** collect it by default. -### 2. PII Data +#### 2. 
PII Data Personally identifiable or user-linked data: - User identifiers (user ID, username, email) - IP address - Cookies and headers that identify the user or session -- AI Agent input and output messages +- AI agent input and output messages -This tier **MUST** be off by default unless the user opts in via `includeUserInfo` and/or explicit `collect` allowlists. See [Include User Info](#include-user-info), [collect options](#collect-options), and [Default Denylist](#default-denylist). +This tier **MUST** be off by default unless the user opts in via `includeUserInfo` and/or explicit `collect` allowlists. See [`includeUserInfo`](#include-user-info-behavior), [`collect` options](#collect-option-behavior), and [Default Denylist](#default-denylist). -### 3. Sensitive Data +#### 3. Sensitive Data -Credentials and secrets that must never be sent by default: +Credentials and secrets that **MUST** never be sent by default: - Passwords, tokens, API keys, bearer tokens - Header or cookie values that match known sensitive names (auth, token, secret, password, key, jwt, etc.) -SDKs **MUST** never send sensitive **values** through automatic instrumentation; keys are included by the SDK while values are replaced with `"[Filtered]"` (see [Default Denylist](#default-denylist)). Users can use `beforeSend` (or equivalent) to remove or redact keys if needed. +SDKs **MUST** never send sensitive **values** through automatic instrumentation — values are replaced with `"[Filtered]"` while keys are retained (see [Default Denylist](#default-denylist)). Users can use `beforeSend` (or equivalent) to remove or redact keys if needed. --- -## Configuration Surface - -All data-collection options live under a single key: `dataCollection`. +## Behavior -### Top-Level Shape +### Configuration Requirements -At the top level, users **MAY** specify **`includeUserInfo`** and a **`collect`** record. +All data-collection options live under a single top-level key: `dataCollection`. 
SDKs **MUST** support at least `includeUserInfo` and the `collect` object. SDKs **MAY** omit options that do not apply to the platform (e.g. no `outgoingRequestBody` on a browser-only SDK). -- **`includeUserInfo`** is the primary toggle for Personally Identifiable Information (PII). It controls whether user-identity fields are included in or excluded from automatic collection, and sets the default for other PII-heavy options (such as `aiAgentMessages`). -- **`collect`** controls which categories of request/response and runtime data are gathered (cookies, headers, body, query params, etc.); see the sections below. +`dataCollection` accepts two fields: -Users configure data collection via the init options. An example with the default options: - -```typescript -init({ - dsn: "...", - dataCollection: { - includeUserInfo: false, - collect: { - cookies: true, - httpHeaders: true, - queryParams: true, - incomingRequestBody: false, - outgoingRequestBody: false, - aiAgentMessages: true - stackFrameVariables: true, - frameContextLines: 5, - }, - }, -}); -``` - -- SDKs **MUST** support at least `includeUserInfo` and the `collect` object. SDKs **MAY** omit options that do not apply to the platform (e.g. no `outgoingRequestBody` on a backend-only SDK). - -For how `includeUserInfo` affects the defaults of collection options, see [How Defaults Cascade](#how-defaults-cascade). +- **`includeUserInfo`** — the primary toggle for Personally Identifiable Information (PII). Controls whether user-identity fields are included in automatic collection, and sets the default for PII-heavy `collect` options (such as HTTP request bodies - TBD). Defaults to `false`. +- **`collect`** — controls which categories of request/response and runtime data are gathered. See [`collect` Option Behavior](#collect-option-behavior) and [How Defaults Cascade](#how-defaults-cascade). 
---- - -## Option Reference - - ### `includeUserInfo` +### `includeUserInfo` Behavior - **Type:** Boolean option +`includeUserInfo` controls whether the SDK automatically attaches user identity fields to events (e.g. `user.id`, `user.email`, `user.username`, `user.ip_address`). This is the primary PII gate: its value also sets the effective default for PII-heavy `collect` options. - Controls whether the SDK automatically attaches user identity fields to events (e.g. `user.id`, `user.email`, `user.username`, `user.ip_address`). This is the primary PII gate: its value sets the default for all other PII-heavy options, most notably `aiAgentMessages`. +| Value | Behavior | +|-------|----------| +| `true` | Attach all user identity fields captured by automatic instrumentation. Equivalent to the legacy `sendDefaultPii` flag scoped to user data. | +| `false` | Do not attach user identity fields from automatic instrumentation. | - | Value | Behavior | - |-------|----------| - | `true` | Attach all user identity fields captured by automatic instrumentation. Equivalent to the legacy `sendDefaultPii` flag scoped to user data. | - | `false` | Do not attach user identity fields from automatic instrumentation. | - - - **Default:** `false`. - - When user data is set **explicitly** on the scope (or equivalent), it is **always** attached regardless of this setting. See [User-Set Data and Scrubbing](#user-set-data-scrubbing). +When user data is set **explicitly** on the scope (or equivalent), it is **always** attached regardless of this setting. See [User-Set Data and Scrubbing](#user-set-data-and-scrubbing). ---- - - ### `collect` Options +### `collect` Option Behavior - Each key under `collect` maps to a category of automatically collected data. Refer to the [Option Types](#option-types) section below to understand which value type each key accepts. 
+Each key under `collect` maps to a category of automatically collected data and uses one of two option types, depending on whether the data is structured as key-value pairs. - | Key | Option Type | Default | Description | - |-----|-------------|---------|-------------| - | `cookies` | Collection | `true` | Include cookie values; keys are filtered by the default denylist or by allow/deny lists. | - | `httpHeaders` | Collection | `true` | Include HTTP header values; keys are filtered by the default denylist or by allow/deny lists. | - | `queryParams` | Collection | `true` | Include URL query parameter values; keys are filtered by the default denylist or by allow/deny lists. | - | `incomingRequestBody` | Boolean | Default TBD | Include full body of the incoming HTTP request. | - | `outgoingRequestBody` | Boolean | Default TBD | Include full body of outgoing HTTP requests. | - | `aiAgentMessages` | Boolean | `true` | Include AI agent input and output messages. | - | `stackFrameVariables` | Boolean | `true` | Include local variable values captured within stack frames. | - | `frameContextLines` | Number
(Boolean, if not otherwise possible) | `5`
(`true`) | (Number of) lines of context to include around stack frames. | +**Boolean options** — used where data cannot be meaningfully filtered at the key level. The SDK either collects the entire category or skips it. - - Unlike cookies or headers, some data (e.g. request bodies) has no predictable key structure for the SDK to filter. Data can still be redacted in `beforeSend` or event processors if needed. - +| Value | Behavior | +|-------|----------| +| `true` | Collect and attach this data category. | +| `false` | Do not collect this data category at all. | +**Collection options** — used for key-value data (cookies, headers, query params), where the SDK can inspect individual keys and apply filtering rules before attaching. - +| Value | Behavior | +|-------|----------| +| `true` | Collect this category. Apply the default denylist — values for sensitive key names are replaced with `"[Filtered]"` (see [Default Denylist](#default-denylist)). | +| `false` | Do not collect this category at all. | +| `{ deny: string[] }` | Collect this category. Apply the default denylist **plus** these additional key names. | +| `{ allow: string[] }` | Collect **only** keys in this list. The default denylist is bypassed, but sensitive values **MUST** still be scrubbed regardless. | - ### Option Types +> **Note:** Sensitive key **values** are always scrubbed — replaced with `"[Filtered]"` — regardless of collection option configuration. The allow/deny lists control which keys are included, not whether scrubbing applies. - Each option in `dataCollection.collect` uses one of two distinct value types. Which type an option accepts depends on whether the collected data is structured as named key-value pairs (e.g. cookies, headers) or as an opaque blob (e.g. request bodies). + - --- + - #### Boolean Options +### How Defaults Cascade - Used for categories where data cannot be meaningfully filtered at the key level — the SDK either collects the entire category or skips it entirely. 
+`includeUserInfo` determines the effective default for PII-related `collect` options. Explicitly set `collect` options always override this default. - | Value | Behavior | - |-------|----------| - | `true` | Collect and attach this data category. | - | `false` | Do not collect this data category at all. | +| Option type | Default when `includeUserInfo: true` | Default when `includeUserInfo: false` | +|-------------|--------------------------------------|----------------------------------------| +| Collection (key-value pairs) | `true` — use default denylist | `true` — use default denylist, plus PII keys denied | - Examples: `incomingRequestBody`, `outgoingRequestBody`, `aiAgentMessages`. +Non-PII boolean options (e.g. `stackFrameVariables`) are not affected by `includeUserInfo` and always default to their configured value. - --- + - #### Collection Options + - Used for categories structured as named key-value pairs, where the SDK can inspect individual keys and apply filtering rules before attaching data. In addition to the `true`/`false` toggle, these options accept an allow/deny object for fine-grained control. +### Default Denylist - | Value | Behavior | - |-------|----------| - | `true` | Collect this category. Apply the default denylist — values for sensitive key names (e.g. `authorization`, `cookie`, `token`) are replaced with `"[Filtered]"` (see [Default Denylist](#default-denylist)). | - | `false` | Do not collect this category at all. | - | `{ deny: string[] }` | Collect this category. Apply the default denylist **plus** these additional key names. | - | `{ allow: string[] }` | Collect **only** keys matching this list. The default denylist is replaced, but sensitive values **MUST** still be scrubbed regardless. | +For key-value data (HTTP headers, cookies, URL query params), SDKs **MUST** apply a **default denylist** by key name: values for known-sensitive keys are replaced with `"[Filtered]"`; **keys are never scrubbed** by the SDK. 
- > **Note:** Sensitive key **values** are always scrubbed — they are replaced with `"[Filtered]"` — regardless of how the collection option is configured. The allow/deny lists control which keys are included, not whether scrubbing applies. +#### Matching Rule - Examples: `cookies`, `httpHeaders`, `queryParams`. +SDKs **MUST** perform a **partial, case-insensitive match** when comparing key names against the denylist. A key is treated as sensitive if any denylist term appears as a substring in the key name (e.g. the term `auth` matches `Authorization` and `X-Auth-Token`). - +#### Base Denylist (Sensitive Data) +The following terms **MUST** be included in the default denylist for headers, and **SHOULD** be applied to cookies and query params where applicable: - +`["auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt", "bearer", "sso", "saml", "csrf", "xsrf", "credentials", "session", "sid", "identity"]` - ### How Defaults Cascade +Values for keys that match **MUST** be replaced with `"[Filtered]"`. - Because `includeUserInfo` acts as the main gate for PII, its value determines the default denylist for the `collect` options. Explicitly set `collect` options always override this default. +#### PII Denylist (when `includeUserInfo` is `false`) - | Option type | Default when `includeUserInfo: true` | Default when `includeUserInfo: false` | - |-------------|--------------------------------------|----------------------------------------| - | Collection (key-value pairs) | `true` — use default denylist | `true` — use default denylist, plus PII keys denied | - | Boolean | `true` — attach | `false` — do not attach | +When `includeUserInfo` is `false`, SDKs **MUST** apply the base denylist **and** additionally treat the following as sensitive: - +- Any data that contains email, user ID, IP address, username, or machine name (if applicable) +- Any key containing **`x-forwarded-`** (e.g. 
`x-forwarded-for`, `x-forwarded-host`) — often carries client IP or host +- Any key ending with or containing **`-user`** (e.g. `x-user-id`, `remote-user`) — often carries user identifiers +Effective denylist when PII is disabled: base list + `["x-forwarded-", "-user"]` (partial match, case-insensitive). +#### Cookies and Cookie Headers -
+- SDKs **SHOULD** maintain a default denylist of cookie names using the same matching rule (e.g. `session`, `auth`, `identity`). Values for matching cookie names **MUST** be replaced with `"[Filtered]"`. +- **When individual cookie key-value pairs cannot be extracted** (e.g. malformed or opaque cookie string), the entire `Cookie` or `Set-Cookie` header value **MUST** be replaced with `"[Filtered]"`. Unfiltered raw cookie header values **MUST NOT** be sent. When in doubt, treat the whole cookie header as sensitive. ---- +#### Request Bodies -## Default Denylist +When request or response bodies are collected (`incomingRequestBody` / `outgoingRequestBody`): -For key-value data (HTTP headers, cookies, URL query params), SDKs **MUST** apply a **default denylist** by key name: values for known-sensitive keys are replaced with `"[Filtered]"`; **keys are never scrubbed** by the SDK. +- **Parseable as JSON or form data:** SDKs **MAY** extract key-value pairs and apply the same denylist rules to keys. Values for matching keys **MUST** be replaced with `"[Filtered]"`. This allows selective scrubbing while retaining non-sensitive fields for debugging. +- **Not parseable (raw bodies):** The body **MUST NOT** be attached to the event. When the SDK cannot parse the body into key-value structure, the entire body **MUST** be replaced with `"[Filtered]"`. - +No built-in option scrubs **keys**; users who need to hide header or cookie names **MUST** use `beforeSend` (or equivalent). -### Matching Rule + -SDKs **MUST** perform a **partial, case-insensitive match** when comparing header names, cookie names, and query parameter names against the denylist. A key is treated as sensitive if any denylist term appears as a substring in the key (e.g. the term `auth` matches the header names `Authorization` and `X-Auth-Token`). 
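The matching rule can be sketched as a small helper. This is a minimal illustration rather than a reference implementation: the term lists are copied from this spec, while the names `isSensitiveKey` and `scrubValues` are hypothetical and not part of any SDK API.

```typescript
// Base denylist terms as listed in this spec.
const BASE_DENYLIST: readonly string[] = [
  "auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt",
  "bearer", "sso", "saml", "csrf", "xsrf", "credentials", "session", "sid",
  "identity",
];

// Extra terms the spec applies only when includeUserInfo is false.
const PII_DENYLIST: readonly string[] = ["x-forwarded-", "-user"];

const FILTERED = "[Filtered]";

// Hypothetical helper: a key is sensitive if any denylist term appears
// in it as a case-insensitive substring.
function isSensitiveKey(key: string, terms: readonly string[]): boolean {
  const lowered = key.toLowerCase();
  return terms.some((term) => lowered.includes(term.toLowerCase()));
}

// Hypothetical helper: replace values of sensitive keys with "[Filtered]".
// Keys are always retained; only values are scrubbed.
function scrubValues(
  data: Record<string, string>,
  includeUserInfo: boolean,
): Record<string, string> {
  const terms = includeUserInfo
    ? BASE_DENYLIST
    : [...BASE_DENYLIST, ...PII_DENYLIST];
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(data)) {
    result[key] = isSensitiveKey(key, terms) ? FILTERED : value;
  }
  return result;
}
```

With `includeUserInfo: false`, a header such as `X-Forwarded-For` would be filtered by the PII terms, while `Authorization` is filtered in either mode because it contains `auth`.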
+ -### Base Denylist (Sensitive Data) +### User-Set Data and Scrubbing -The following terms **MUST** be included in the default denylist for headers (and **SHOULD** be used for cookies and query params where applicable). A key is sensitive if it **partially matches**, case-insensitively, any of: +When the user **explicitly** sets data on the scope (user, request, response, tags, contexts, etc.) or on a span, log, or other telemetry, that data is **not** gated by `dataCollection`. It **MUST** always be attached to outgoing telemetry. The same applies to data the user provides via `beforeSend` or event processors. -`["auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt", "bearer", "sso", "saml", "csrf", "xsrf", "credentials", "session", "sid", "identity"]` +SDKs **SHOULD** only replace sensitive values with `"[Filtered]"` when the data is gathered **automatically** through instrumentation. If the user explicitly provides data (e.g. by setting a request object on the scope), the SDK **MUST NOT** modify it; the user is responsible for what they attach. -Values for keys that match **MUST** be replaced with `"[Filtered]"`. +Users can register callbacks (e.g. `beforeSend`, event processors) to remove or redact any data — including keys — before events are sent. This spec does not replace those hooks; they remain the mechanism for custom filtering and key removal. -### PII Denylist (when `includeUserInfo` is `false`) + -When `includeUserInfo` is `false`, SDKs **MUST** apply the base denylist **and** treat the following as sensitive (in addition to the list above): +--- -- Any data that contains the following: email, user ID, IP address, username, machine name (if applicable) -- Any header or key containing **`x-forwarded-`** (e.g. `x-forwarded-for`, `x-forwarded-host`) — often carries client IP or host. -- Any header or key whose name ends with or contains **`-user`** (e.g. `x-user-id`, `remote-user`) — often carries user identifiers. 
+## Public API -So the effective denylist when PII is disabled is: base list + `["x-forwarded-", "-user"]` (substring match, case-insensitive). +The `dataCollection` option is passed to the SDK's init function. All fields are optional; omitting a field uses the default. -### Cookies and Cookie Headers -- SDKs **SHOULD** maintain a default denylist of cookie names using the same matching rule (e.g. `session`, `auth`, `identity`). Values for matching cookie names **MUST** be replaced with `"[Filtered]"`. -- **When individual cookie key-value pairs cannot be extracted** (e.g. malformed or opaque cookie header), the entire `Cookie` or `Set-Cookie` header value **MUST** be replaced with `"[Filtered]"`. Unfiltered raw cookie header values **MUST NOT** be sent. When in doubt, treat the whole cookie header as sensitive. +```pseudocode +init({ + dataCollection: { + includeUserInfo: boolean, // default: false + collect: { + cookies: Collection, // default: true + httpHeaders: Collection, // default: true + queryParams: Collection, // default: true + incomingRequestBody: boolean, // default: TBD + outgoingRequestBody: boolean, // default: TBD + aiAgentMessages: boolean, // default: true + stackFrameVariables: boolean, // default: true + frameContextLines: number, // default: 5 (boolean fallback: true) + }, + }, +}) +``` -### Request Bodies +### `dataCollection.includeUserInfo` -When request or response bodies are collected (`incomingRequestBody` / `outgoingRequestBody`): +| Property | Value | +|----------|-------| +| Type | Boolean | +| Default | `false` | +| Since | 1.0.0 | +| Description | Primary PII toggle. Enables automatic collection of user identity fields (`user.id`, `user.email`, `user.username`, `user.ip_address`). Also sets the effective default for PII-heavy `collect` options. | -- **Parseable as JSON or form data:** SDKs **MAY** extract key-value pairs and apply the same denylist rules (partial, case-insensitive match) to keys. 
Values for matching keys **MUST** be replaced with `"[Filtered]"`. This allows selective scrubbing while retaining non-sensitive fields for debugging. -- **Raw bodies (not parseable as JSON or FormData):** The body **MUST** be removed and **MUST NOT** be attached to the event. "Raw" HTTP bodies (e.g. binary payloads, plain text, or unparsable content) are never sent through automatic instrumentation. When the SDK cannot parse the body into key-value structure, the entire body **MUST** be replaced with `"[Filtered]"`. +### `dataCollection.collect` Options -No built-in option scrubs **keys**; users who need to hide header or cookie names **MUST** use `beforeSend` (or equivalent). +| Key | Option Type | Default | Since | Description | +|-----|-------------|---------|-------|-------------| +| `cookies` | Collection | `true` | 1.0.0 | Include cookie values; keys filtered by the default denylist or by allow/deny lists. | +| `httpHeaders` | Collection | `true` | 1.0.0 | Include HTTP header values; keys filtered by the default denylist or by allow/deny lists. | +| `queryParams` | Collection | `true` | 1.0.0 | Include URL query parameter values; keys filtered by the default denylist or by allow/deny lists. | +| `incomingRequestBody` | Boolean | TBD | 1.0.0 | Include full body of the incoming HTTP request. | +| `outgoingRequestBody` | Boolean | TBD | 1.0.0 | Include full body of outgoing HTTP requests. | +| `aiAgentMessages` | Boolean | `true` | 1.0.0 | Include AI agent input and output messages. | +| `stackFrameVariables` | Boolean | `true` | 1.0.0 | Include local variable values captured within stack frames. | +| `frameContextLines` | Number (Boolean fallback) | `5` (`true`) | 1.0.0 | Number of lines of context to include around stack frames. | -
+ + Unlike cookies or headers, some data (e.g. request bodies) has no predictable key structure for the SDK to filter. Data can still be redacted in `beforeSend` or event processors if needed. + --- -## Use Cases - -The following examples show how `dataCollection` maps to common configurations. +## Examples - +### Default Configuration -### Maximum PII (full collection) +An explicit representation of all defaults (with `includeUserInfo: false`): -When the user enables full PII collection: +```typescript +init({ + dsn: "...", + dataCollection: { + includeUserInfo: false, + collect: { + cookies: true, + httpHeaders: true, + queryParams: true, + incomingRequestBody: false, + outgoingRequestBody: false, + aiAgentMessages: true, + stackFrameVariables: true, + frameContextLines: 5, + }, + }, +}); +``` -- `includeUserInfo: true` -- `collect`: all collection options `true` (default denylist); `incomingRequestBody`, `outgoingRequestBody`, and `aiAgentMessages` are `true`. +### Maximum PII (Full Collection) -**Result:** Technical context and request/response data (headers, cookies, query params) are collected with the default denylist; request bodies, user identifiers, and AI agent messages are included; sensitive values are still replaced with `"[Filtered]"`. +Enable full PII collection, including request bodies and AI messages: - +```typescript +init({ + dsn: "...", + dataCollection: { + includeUserInfo: true, + collect: { + incomingRequestBody: true, + outgoingRequestBody: true, + }, + }, +}); +``` - +**Result:** Technical context and request/response data (headers, cookies, query params) are collected with the default denylist; request bodies, user identifiers, and AI agent messages are included; sensitive values are still replaced with `"[Filtered]"`. 
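The way explicit `collect` values combine with the `includeUserInfo` cascade can be sketched as a resolution step at init time. This is an illustration under assumptions: `resolveDataCollection` and the type names are hypothetical, and because the request body defaults are still marked TBD in this spec, the sketch assumes they follow `includeUserInfo`.

```typescript
type CollectionOption = boolean | { allow: string[] } | { deny: string[] };

interface CollectOptions {
  cookies: CollectionOption;
  httpHeaders: CollectionOption;
  queryParams: CollectionOption;
  aiAgentMessages: boolean;
  stackFrameVariables: boolean;
  incomingRequestBody: boolean;
  outgoingRequestBody: boolean;
  frameContextLines: number | boolean;
}

interface DataCollectionInput {
  includeUserInfo?: boolean;
  collect?: Partial<CollectOptions>;
}

interface ResolvedDataCollection {
  includeUserInfo: boolean;
  collect: CollectOptions;
}

// Hypothetical resolver: fills in defaults, then lets explicit values win.
function resolveDataCollection(
  input: DataCollectionInput = {},
): ResolvedDataCollection {
  const includeUserInfo = input.includeUserInfo ?? false;
  const defaults: CollectOptions = {
    cookies: true,
    httpHeaders: true,
    queryParams: true,
    aiAgentMessages: true,
    stackFrameVariables: true,
    // Assumption: body defaults are TBD in the spec; here they cascade
    // from includeUserInfo like other PII boolean options.
    incomingRequestBody: includeUserInfo,
    outgoingRequestBody: includeUserInfo,
    frameContextLines: 5,
  };
  // Explicitly set collect options always override the cascaded default.
  return { includeUserInfo, collect: { ...defaults, ...input.collect } };
}
```

Resolving once at init keeps the rest of the pipeline free of default handling: instrumentation would only ever read a fully populated `collect` object.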
### Granular Debugging -The user wants to include user info and only specific headers for debugging, and does not want to send query params at all: +Include user info and only specific headers for debugging; exclude query params entirely: ```typescript init({ @@ -295,41 +311,13 @@ init({ }); ``` -Because `includeUserInfo` is set, `aiAgentMessages` defaults to `true` unless the user explicitly sets `collect: { aiAgentMessages: false }`. +### Migration from `sendDefaultPii` - - - - -### Migration from sendDefaultPii - -- **`sendDefaultPii: true`** (legacy) → `dataCollection: { includeUserInfo: true }` and keep `collect` defaults. -- **`sendDefaultPii: false`** (legacy) → `dataCollection: { includeUserInfo: false }` (or omit; same as default). +- **`sendDefaultPii: true`** (legacy) → `dataCollection: { includeUserInfo: true }`, keep `collect` defaults +- **`sendDefaultPii: false`** (legacy) → `dataCollection: { includeUserInfo: false }` (or omit entirely — same as default) SDKs **SHOULD** document this mapping and **MAY** implement `send_default_pii` as a compatibility shim that sets `includeUserInfo`. - - ---- - -## User-Set Data and Scrubbing - - - -### Data set by the user - -When the user **explicitly** sets data on the scope (user, request, response, tags, contexts, etc.) or on a span, log, or other telemetry, that data is **not** gated by `dataCollection`. It **MUST** always be attached to outgoing telemetry. The same applies to data the user provides via `beforeSend` or event processors (e.g. attaching a request object). - -### Automatic vs explicit data - -SDKs **SHOULD** only replace sensitive values with `"[Filtered]"` when the data is gathered **automatically** through instrumentation. If the user explicitly provides data (e.g. by setting a request object on the scope), the SDK **MUST NOT** modify it; the user is responsible for what they attach. - -### beforeSend and event processors - -Users can register callbacks (e.g. 
`beforeSend`, event processors) to remove or redact any data — including keys — before events are sent. This spec does not replace those hooks; they remain the mechanism for custom filtering and key removal. - - - --- ## Changelog From 88bcbaeef243128ed78ae8a760930d5f044372fe Mon Sep 17 00:00:00 2001 From: s1gr1d <32902192+s1gr1d@users.noreply.github.com> Date: Fri, 6 Mar 2026 15:29:23 +0100 Subject: [PATCH 5/7] explain pii info --- develop-docs/sdk/foundations/client/data-collection/index.mdx | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/client/data-collection/index.mdx b/develop-docs/sdk/foundations/client/data-collection/index.mdx index 8cc3c55938e61..f05135a927dfe 100644 --- a/develop-docs/sdk/foundations/client/data-collection/index.mdx +++ b/develop-docs/sdk/foundations/client/data-collection/index.mdx @@ -45,6 +45,7 @@ Non-identifying context used for debugging and performance: - Device and environment context (OS, runtime, non-PII identifiers) - Performance and error context (stack frames, breadcrumbs, span metadata) - Framework/routing context where it does not contain PII or secrets +- AI agent messages (input, output, metadata) This tier is **not** gated by the data collection configuration. SDKs **MAY** collect it by default. @@ -55,7 +56,7 @@ Personally identifiable or user-linked data: - User identifiers (user ID, username, email) - IP address - Cookies and headers that identify the user or session -- AI agent input and output messages +- HTTP request data (TBD) This tier **MUST** be off by default unless the user opts in via `includeUserInfo` and/or explicit `collect` allowlists. See [`includeUserInfo`](#include-user-info-behavior), [`collect` options](#collect-option-behavior), and [Default Denylist](#default-denylist). 
@@ -137,6 +138,7 @@ Each key under `collect` maps to a category of automatically collected data and | Option type | Default when `includeUserInfo: true` | Default when `includeUserInfo: false` | |-------------|--------------------------------------|----------------------------------------| | Collection (key-value pairs) | `true` — use default denylist | `true` — use default denylist, plus PII keys denied | +| PII Boolean (e.g. `incomingRequestBody`) | `true` — attach | `false` — do not attach | Non-PII boolean options (e.g. `stackFrameVariables`) are not affected by `includeUserInfo` and always default to their configured value. From 69badc95c3d0c929db4f1b82c45f2d95175d6a7f Mon Sep 17 00:00:00 2001 From: s1gr1d <32902192+s1gr1d@users.noreply.github.com> Date: Fri, 6 Mar 2026 15:30:44 +0100 Subject: [PATCH 6/7] change ordering --- .../sdk/foundations/client/data-collection/index.mdx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/develop-docs/sdk/foundations/client/data-collection/index.mdx b/develop-docs/sdk/foundations/client/data-collection/index.mdx index f05135a927dfe..9ea409289f68c 100644 --- a/develop-docs/sdk/foundations/client/data-collection/index.mdx +++ b/develop-docs/sdk/foundations/client/data-collection/index.mdx @@ -214,10 +214,10 @@ init({ cookies: Collection, // default: true httpHeaders: Collection, // default: true queryParams: Collection, // default: true - incomingRequestBody: boolean, // default: TBD - outgoingRequestBody: boolean, // default: TBD aiAgentMessages: boolean, // default: true stackFrameVariables: boolean, // default: true + incomingRequestBody: boolean, // default: TBD + outgoingRequestBody: boolean, // default: TBD frameContextLines: number, // default: 5 (boolean fallback: true) }, }, @@ -240,10 +240,10 @@ init({ | `cookies` | Collection | `true` | 1.0.0 | Include cookie values; keys filtered by the default denylist or by allow/deny lists. 
| | `httpHeaders` | Collection | `true` | 1.0.0 | Include HTTP header values; keys filtered by the default denylist or by allow/deny lists. | | `queryParams` | Collection | `true` | 1.0.0 | Include URL query parameter values; keys filtered by the default denylist or by allow/deny lists. | -| `incomingRequestBody` | Boolean | TBD | 1.0.0 | Include full body of the incoming HTTP request. | -| `outgoingRequestBody` | Boolean | TBD | 1.0.0 | Include full body of outgoing HTTP requests. | | `aiAgentMessages` | Boolean | `true` | 1.0.0 | Include AI agent input and output messages. | | `stackFrameVariables` | Boolean | `true` | 1.0.0 | Include local variable values captured within stack frames. | +| `incomingRequestBody` | Boolean | TBD | 1.0.0 | Include full body of the incoming HTTP request. | +| `outgoingRequestBody` | Boolean | TBD | 1.0.0 | Include full body of outgoing HTTP requests. | | `frameContextLines` | Number (Boolean fallback) | `5` (`true`) | 1.0.0 | Number of lines of context to include around stack frames. 
| @@ -267,10 +267,10 @@ init({ cookies: true, httpHeaders: true, queryParams: true, - incomingRequestBody: false, - outgoingRequestBody: false, aiAgentMessages: true, stackFrameVariables: true, + incomingRequestBody: false, + outgoingRequestBody: false, frameContextLines: 5, }, }, From ae5020b011602a78cefc7fdf08aa18be02c4d962 Mon Sep 17 00:00:00 2001 From: s1gr1d <32902192+s1gr1d@users.noreply.github.com> Date: Fri, 6 Mar 2026 15:32:09 +0100 Subject: [PATCH 7/7] add ai agent messages to migration --- develop-docs/sdk/foundations/client/data-collection/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/client/data-collection/index.mdx b/develop-docs/sdk/foundations/client/data-collection/index.mdx index 9ea409289f68c..8f158e326c46e 100644 --- a/develop-docs/sdk/foundations/client/data-collection/index.mdx +++ b/develop-docs/sdk/foundations/client/data-collection/index.mdx @@ -315,7 +315,7 @@ init({ ### Migration from `sendDefaultPii` -- **`sendDefaultPii: true`** (legacy) → `dataCollection: { includeUserInfo: true }`, keep `collect` defaults +- **`sendDefaultPii: true`** (legacy) → `dataCollection: { includeUserInfo: true, collect: { aiAgentMessages: false } }`, keep most `collect` defaults - **`sendDefaultPii: false`** (legacy) → `dataCollection: { includeUserInfo: false }` (or omit entirely — same as default) SDKs **SHOULD** document this mapping and **MAY** implement `send_default_pii` as a compatibility shim that sets `includeUserInfo`.
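
A compatibility shim for the mapping above could look roughly like the following. This is a sketch only: `migrateSendDefaultPii` and the interface names are hypothetical, and the precedence rule (an explicit `dataCollection` wins over the legacy flag) is an assumption that this spec does not state.

```typescript
interface DataCollectionInput {
  includeUserInfo?: boolean;
  collect?: Record<string, unknown>;
}

interface LegacyInitOptions {
  sendDefaultPii?: boolean;
  dataCollection?: DataCollectionInput;
}

// Hypothetical shim: derive a dataCollection config from the legacy flag.
function migrateSendDefaultPii(
  options: LegacyInitOptions,
): DataCollectionInput {
  // Assumption: if the user already passes dataCollection, it takes precedence.
  if (options.dataCollection !== undefined) {
    return options.dataCollection;
  }
  if (options.sendDefaultPii === true) {
    // Mapping from this spec for the legacy `true` case.
    return { includeUserInfo: true, collect: { aiAgentMessages: false } };
  }
  // Legacy `false` and an unset flag both map to the default.
  return { includeUserInfo: false };
}
```

An SDK taking this route would probably also want to log a deprecation notice when `sendDefaultPii` is set, so users can move to `dataCollection` before the legacy flag is removed.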