
docs(sdks): Add spec for dataCollection option to supersede sendDefaultPii#16796

Open
s1gr1d wants to merge 7 commits into master from sig/data-collection-options

Conversation

@s1gr1d
Member

@s1gr1d s1gr1d commented Mar 5, 2026

This PR extends the Data Collection spec so it is the single place for what SDKs collect and how they scrub it. It adds concrete denylist behavior, request-body and cookie rules, and pulls in the relevant scrubbing behavior from the Data Handling doc.

IS YOUR CHANGE URGENT?

Help us prioritize incoming PRs by letting us know when the change needs to go live.

  • Urgent deadline (GA date, etc.):
  • Other deadline:
  • None: Not urgent, can wait up to 1 week+

SLA

  • Teamwork makes the dream work, so please add a reviewer to your PRs.
  • Please give the docs team up to 1 week to review your PR unless you've added an urgent due date to it.
    Thanks in advance for your help!

PRE-MERGE CHECKLIST

Make sure you've checked the following before merging your changes:

  • Checked Vercel preview for correctness, including links
  • PR was reviewed and approved by any necessary SMEs (subject matter experts)
  • PR was reviewed and approved by a member of the Sentry docs team

@vercel

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| develop-docs | Ready | Preview, Comment | Mar 9, 2026 9:38am |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| sentry-docs | Ignored | Preview | Mar 9, 2026 9:38am |

@s1gr1d s1gr1d changed the title from "docs(sdks): Add docs on dataCollection" to "docs(sdks): Add spec for dataCollection option to supersede sendDefaultPii" Mar 5, 2026
@s1gr1d s1gr1d requested a review from cleptric March 5, 2026 13:36
@s1gr1d s1gr1d force-pushed the sig/data-collection-options branch from 441172f to e99813b March 5, 2026 13:58
Contributor

This page does not exist anymore - we moved the PII related content to
https://develop.sentry.dev/sdk/foundations/data-scrubbing/

@dingsdax dingsdax added the sdk-develop-docs PRs touching develop-docs/sdk label Mar 9, 2026
@cleptric
Member

cleptric commented Mar 9, 2026

We should include some more general information, such as how we expect the default snippet to look now, which is very minimal and in line with the behavior of sendDefaultPii.

sentry.init({
  dsn: '...',
})

// or with user information

sentry.init({
  dsn: '...',
  dataCollection: {
    includeUserInfo: true,     
  }
})

We should also explain the reasoning behind this change. In particular, we should highlight that we now include more context by default, without changing our privacy-first position.

Contributor

@itaybre itaybre left a comment

This makes sense, but I have some questions here.


The following terms **MUST** be included in the default denylist for headers, and **SHOULD** be applied to cookies and query params where applicable:

`["auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt", "bearer", "sso", "saml", "csrf", "xsrf", "credentials", "session", "sid", "identity"]`
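The denylist above could be applied to headers roughly as follows. This is a minimal sketch, not the spec's implementation: it assumes a case-insensitive substring match on the header name and assumes headers use the same `"[Filtered]"` placeholder the spec names for cookies. The helper names `isSensitiveName` and `scrubHeaders` are hypothetical.

```typescript
// Default denylist terms, copied from the spec text quoted above.
const DEFAULT_DENYLIST: readonly string[] = [
  "auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt",
  "bearer", "sso", "saml", "csrf", "xsrf", "credentials", "session", "sid",
  "identity",
];

// Assumption: headers use the same placeholder the spec names for cookies.
const FILTERED = "[Filtered]";

// Assumption: a name is sensitive if it contains any denylist term,
// compared case-insensitively (partial match).
function isSensitiveName(
  name: string,
  denylist: readonly string[] = DEFAULT_DENYLIST,
): boolean {
  const lower = name.toLowerCase();
  return denylist.some((term) => lower.includes(term));
}

// Returns a copy of the headers with sensitive values replaced.
function scrubHeaders(headers: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    result[name] = isSensitiveName(name) ? FILTERED : value;
  }
  return result;
}

const scrubbed = scrubHeaders({
  Authorization: "Bearer abc123",
  "X-CSRF-Token": "xyz",
  Accept: "application/json",
});
console.log(scrubbed);
```

A substring match is deliberately broad: short terms such as `sid` or `key` will also match unrelated header names, which errs on the side of filtering too much rather than too little.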
Contributor

m: We have some additional filtered headers on cocoa that may be relevant here (https://github.com/getsentry/sentry-cocoa/blob/main/Sources/Swift/Core/Tools/HTTPHeaderSanitizer.swift#L8): X-REAL-IP and REMOTE-ADDR

- SDKs **SHOULD** maintain a default denylist of cookie names using the same matching rule (e.g. `session`, `auth`, `identity`). Values for matching cookie names **MUST** be replaced with `"[Filtered]"`.
- **When individual cookie key-value pairs cannot be extracted** (e.g. malformed or opaque cookie string), the entire `Cookie` or `Set-Cookie` header value **MUST** be replaced with `"[Filtered]"`. Unfiltered raw cookie header values **MUST NOT** be sent. When in doubt, treat the whole cookie header as sensitive.
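The two cookie rules above can be sketched as follows. This is an illustration under stated assumptions, not the spec's implementation: it handles only the `name=value; name=value` shape of a `Cookie` request header, uses a case-insensitive substring match, and uses only the three example terms from the quoted text as its denylist. The function name `scrubCookieHeader` is hypothetical.

```typescript
// Example terms named in the spec text; a real SDK would use its full denylist.
const COOKIE_DENYLIST: readonly string[] = ["session", "auth", "identity"];
const FILTERED_VALUE = "[Filtered]";

// Scrubs a raw Cookie header value. If any segment cannot be read as
// "name=value", the whole header is treated as opaque and replaced.
function scrubCookieHeader(raw: string): string {
  const pairs: string[] = [];
  for (const part of raw.split(";")) {
    const trimmed = part.trim();
    if (trimmed === "") continue; // tolerate stray or trailing semicolons
    const eq = trimmed.indexOf("=");
    if (eq <= 0) return FILTERED_VALUE; // no name or no "=": cannot extract pairs
    const name = trimmed.slice(0, eq);
    const value = trimmed.slice(eq + 1);
    const sensitive = COOKIE_DENYLIST.some((term) =>
      name.toLowerCase().includes(term),
    );
    pairs.push(`${name}=${sensitive ? FILTERED_VALUE : value}`);
  }
  // Nothing extractable at all: when in doubt, filter the whole value.
  return pairs.length > 0 ? pairs.join("; ") : FILTERED_VALUE;
}

console.log(scrubCookieHeader("theme=dark; session_id=abc"));
console.log(scrubCookieHeader("opaque-blob"));
```

Because `Set-Cookie` attributes such as `HttpOnly` carry no `=`, this sketch would filter an entire `Set-Cookie` value, which is conservative but consistent with the "when in doubt" rule.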

#### Request Bodies
Contributor

m: Should the same apply to response bodies? These are (or will be, depending on the SDK) recorded now for Session Replay.

Contributor

This is set in the Session Replay configuration, so it may be worth aligning with that.


### User-Set Data and Scrubbing

When the user **explicitly** sets data on the scope (user, request, response, tags, contexts, etc.) or on a span, log, or other telemetry, that data is **not** gated by `dataCollection`. It **MUST** always be attached to outgoing telemetry. The same applies to data the user provides via `beforeSend` or event processors.
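One way to picture the rule above is the sketch below. It assumes the `includeUserInfo` flag from the snippet proposed earlier in this conversation, and it assumes that user-set fields take precedence over automatically collected ones when both exist. The types and the `resolveUser` helper are hypothetical.

```typescript
interface UserInfo {
  id?: string;
  email?: string;
  ip_address?: string;
}

// Assumption: only the flag shown in the proposed snippet is modeled here.
interface DataCollectionOptions {
  includeUserInfo?: boolean;
}

// Automatically collected data is gated by dataCollection;
// data the user set explicitly is always attached.
function resolveUser(
  options: DataCollectionOptions,
  autoCollected: UserInfo,
  userSet: UserInfo,
): UserInfo {
  const gated: UserInfo = options.includeUserInfo ? autoCollected : {};
  // Assumption: explicit user-set values win over auto-collected ones.
  return { ...gated, ...userSet };
}

const withoutOptIn = resolveUser({}, { ip_address: "203.0.113.7" }, { id: "42" });
const withOptIn = resolveUser(
  { includeUserInfo: true },
  { ip_address: "203.0.113.7" },
  { id: "42" },
);
console.log(withoutOptIn, withOptIn);
```

In the first call the explicitly set `id` is kept while the auto-collected IP address is dropped; in the second call both are attached.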
Contributor

Thanks for the clarification 👍

- User identifiers (user ID, username, email)
- IP address
- Cookies and headers that identify the user or session
- HTTP request data (TBD)
Contributor

h: What about request paths?
Some requests may be identifiable, like /user/USER_ID
Should we have a denylist/allowlist for url paths?

Member Author

Good question 🤔
What we currently do in JS: when we know it's a parameterized route, we use the parameterized route name (e.g. user/:id) as the transaction name, but the full URL (e.g. user/123) is still added to the attributes. @cleptric Any opinions on that?

Labels

sdk-develop-docs PRs touching develop-docs/sdk

5 participants