Conversation

@constantinius
Contributor

Description

Add support for more message types in gen_ai.request.messages

Issues

Closes: https://linear.app/getsentry/issue/TET-1637/redact-images-langchain

@constantinius constantinius requested a review from a team as a code owner January 5, 2026 19:16
@linear

linear bot commented Jan 5, 2026

Base automatically changed from constantinius/fix/redact-message-parts-type-blob to master January 13, 2026 09:56
@github-actions
Contributor

github-actions bot commented Jan 13, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • feat(ai): add parse_data_uri function to parse a data URI by constantinius in #5311
  • feat(asyncio): Add on-demand way to enable AsyncioIntegration by sentrivana in #5288

Bug Fixes 🐛

  • fix(ai): redact message parts content of type blob by constantinius in #5243
  • fix(clickhouse): Guard against module shadowing by alexander-alderman-webb in #5250
  • fix(gql): Revert signature change of patched gql.Client.execute by alexander-alderman-webb in #5289
  • fix(grpc): Derive interception state from channel fields by alexander-alderman-webb in #5302
  • fix(integrations): langchain add multimodal content transformation functions for images, audio, and files by constantinius in #5278
  • fix(litellm): Guard against module shadowing by alexander-alderman-webb in #5249
  • fix(pure-eval): Guard against module shadowing by alexander-alderman-webb in #5252
  • fix(ray): Guard against module shadowing by alexander-alderman-webb in #5254
  • fix(threading): Handle channels shadowing by sentrivana in #5299
  • fix(typer): Guard against module shadowing by alexander-alderman-webb in #5253
  • fix: Send client reports for span recorder overflow by sentrivana in #5310

Documentation 📚

  • docs(metrics): Remove experimental notice by alexander-alderman-webb in #5304
  • docs: Update Python versions banner in README by sentrivana in #5287

Internal Changes 🔧

Release

  • ci(release): Bump Craft version to fix issues by BYK in #5305
  • ci(release): Switch from action-prepare-release to Craft by BYK in #5290

Other

  • chore(gen_ai): add auto-enablement for google genai by shellmayr in #5295
  • chore: Add type for metric units by sentrivana in #5312
  • ci: Update tox and handle generic classifiers by sentrivana in #5306

🤖 This preview updates automatically when you update the PR.

Comment on lines +263 to +264
return {
"type": "blob",

Bug: The code hardcodes "modality": "image" for Google-style inline_data or file_data when a type field is absent, ignoring the mime_type which could indicate audio or video.
Severity: HIGH

Suggested Fix

Infer the modality from the mime_type when the type field is not present in a Google-style content block. Create a helper function that maps MIME types (e.g., "audio/mp3", "video/mp4") to the correct modality ("audio", "video", etc.). Use "image" as a default only if the MIME type is missing or unrecognized.
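
The suggested fix can be sketched as follows. This is a minimal illustration, not the SDK's actual code: the names `infer_modality`, `format_inline_data`, and `_MIME_PREFIX_TO_MODALITY` are hypothetical, and the prefix table only covers the modalities named in this comment. The returned dictionary shape follows the return statement quoted in the review.

```python
from typing import Any, Dict

# Hypothetical prefix table: only the modalities named in the review comment.
_MIME_PREFIX_TO_MODALITY = {
    "image/": "image",
    "audio/": "audio",
    "video/": "video",
}


def infer_modality(mime_type: str, default: str = "image") -> str:
    """Map a MIME type to a modality, falling back to `default`
    when the MIME type is missing or unrecognized."""
    normalized = (mime_type or "").strip().lower()
    for prefix, modality in _MIME_PREFIX_TO_MODALITY.items():
        if normalized.startswith(prefix):
            return modality
    return default


def format_inline_data(inline_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build the blob dictionary with a modality derived from mime_type
    instead of a hardcoded "image"."""
    mime_type = inline_data.get("mime_type") or ""
    return {
        "type": "blob",
        "modality": infer_modality(mime_type),
        "mime_type": mime_type,
        "content": inline_data.get("data", ""),
    }
```

With this sketch, a block whose `mime_type` is `"audio/mp3"` would be labeled `"audio"`, while a block with no `mime_type` would still default to `"image"`.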

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: sentry_sdk/integrations/langchain.py#L263-L264

Potential issue: When processing Google-style content blocks (`inline_data` or
`file_data`) that lack a `type` field, the function `_format_content_block` hardcodes
the modality as `"image"`. This occurs even when the `mime_type` field indicates other
content types like audio, video, or documents, which are supported by Google's Gemini
API. The `mime_type` is extracted but not used to infer the correct modality. This will
lead to incorrect data categorization in Sentry, where non-image content from the
LangChain integration will be mislabeled as an image.

Comment on lines +263 to +269
return {
    "type": "blob",
    "modality": "image",
    "mime_type": inline_data.get("mime_type", ""),
    "content": inline_data.get("data", ""),
}

Bug: Google-style content blocks (inline_data, file_data) without an explicit type are always assigned modality: "image", ignoring the actual mime_type for audio or video.
Severity: HIGH

Suggested Fix

Instead of hardcoding modality: "image", derive the modality from the mime_type present in the inline_data or file_data dictionary. A helper function could map MIME type prefixes (e.g., 'audio/', 'video/') to the correct modality ('audio', 'video'), with 'image' as a default.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: sentry_sdk/integrations/langchain.py#L263-L269

Potential issue: The fallback handlers for Google-style content blocks, specifically for
`inline_data` and `file_data`, incorrectly hardcode the `modality` as `"image"`.
According to Google Gemini API documentation, content blocks may not have a `type` field
and rely solely on `mime_type` to determine the content. This means if a block with a
`mime_type` like `"audio/mpeg"` is processed, it will fall through to the handler at
line 260 and be incorrectly categorized with `modality: "image"`. This leads to
incorrect data categorization in Sentry for non-image content like audio or video.
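
The reported behavior can be reproduced in isolation with a short sketch. The function name `format_inline_data_as_reviewed` is hypothetical; its body copies the return statement quoted in this review, and the sample block shape (an `inline_data` dictionary with `mime_type` and `data`) is an assumption based on the description above.

```python
from typing import Any, Dict


def format_inline_data_as_reviewed(inline_data: Dict[str, Any]) -> Dict[str, Any]:
    # Copy of the return statement quoted in the review:
    # the modality is hardcoded and mime_type is only passed through.
    return {
        "type": "blob",
        "modality": "image",
        "mime_type": inline_data.get("mime_type", ""),
        "content": inline_data.get("data", ""),
    }


# Assumed shape of a Google-style block without a "type" field.
audio_block = {"inline_data": {"mime_type": "audio/mpeg", "data": "<base64 audio>"}}

result = format_inline_data_as_reviewed(audio_block["inline_data"])
print(result["modality"], result["mime_type"])  # prints: image audio/mpeg
```

The mismatch between the two printed fields is the miscategorization the comment describes.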

3 participants