fix(integrations): langchain add multimodal content transformation functions for images, audio, and files #5278

constantinius · 2026-01-05T19:16:37Z

Description

Add support for more message content types (images, audio, and files) in gen_ai.request.messages.

Issues

Closes: https://linear.app/getsentry/issue/TET-1637/redact-images-langchain


linear · 2026-01-05T19:16:41Z

TET-1637 Redact images: Langchain


github-actions · 2026-01-13T12:51:14Z

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).

New Features ✨

feat(ai): add parse_data_uri function to parse a data URI by constantinius in #5311
feat(asyncio): Add on-demand way to enable AsyncioIntegration by sentrivana in #5288

Bug Fixes 🐛

fix(ai): redact message parts content of type blob by constantinius in #5243
fix(clickhouse): Guard against module shadowing by alexander-alderman-webb in #5250
fix(gql): Revert signature change of patched gql.Client.execute by alexander-alderman-webb in #5289
fix(grpc): Derive interception state from channel fields by alexander-alderman-webb in #5302

fix(integrations): langchain add multimodal content transformation functions for images, audio, and files by constantinius in #5278

fix(litellm): Guard against module shadowing by alexander-alderman-webb in #5249
fix(pure-eval): Guard against module shadowing by alexander-alderman-webb in #5252
fix(ray): Guard against module shadowing by alexander-alderman-webb in #5254
fix(threading): Handle channels shadowing by sentrivana in #5299
fix(typer): Guard against module shadowing by alexander-alderman-webb in #5253
fix: Send client reports for span recorder overflow by sentrivana in #5310

Documentation 📚

docs(metrics): Remove experimental notice by alexander-alderman-webb in #5304
docs: Update Python versions banner in README by sentrivana in #5287

Internal Changes 🔧

Release

ci(release): Bump Craft version to fix issues by BYK in #5305
ci(release): Switch from action-prepare-release to Craft by BYK in #5290

Other

chore(gen_ai): add auto-enablement for google genai by shellmayr in #5295
chore: Add type for metric units by sentrivana in #5312
ci: Update tox and handle generic classifiers by sentrivana in #5306

🤖 This preview updates automatically when you update the PR.


sentry · 2026-01-14T15:55:18Z

sentry_sdk/integrations/langchain.py

+            return {
+                "type": "blob",


Bug: The code hardcodes "modality": "image" for Google-style inline_data or file_data when a type field is absent, ignoring the mime_type which could indicate audio or video.
Severity: HIGH

Suggested Fix

Infer the modality from the mime_type when the type field is not present in a Google-style content block. Create a helper function that maps MIME types (e.g., "audio/mp3", "video/mp4") to the correct modality ("audio", "video", etc.). Use "image" as a default only if the MIME type is missing or unrecognized.
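The helper suggested above could look roughly like the following. This is a minimal sketch, not the SDK's actual code: the function name `modality_from_mime_type` and the set of recognized prefixes are assumptions made for illustration.

```python
from typing import Optional


def modality_from_mime_type(mime_type: Optional[str], default: str = "image") -> str:
    """Map a MIME type to a coarse modality label (hypothetical helper)."""
    # Assumption: only these three top-level types are recognized.
    prefix_to_modality = {
        "image/": "image",
        "audio/": "audio",
        "video/": "video",
    }
    normalized = (mime_type or "").strip().lower()
    for prefix, modality in prefix_to_modality.items():
        if normalized.startswith(prefix):
            return modality
    # Missing or unrecognized MIME types fall back to the default.
    return default
```

With this sketch, `"audio/mp3"` maps to `"audio"` and `"video/mp4"` to `"video"`, while an empty or unknown MIME type keeps the `"image"` default described in the suggestion.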

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: sentry_sdk/integrations/langchain.py#L263-L264

Potential issue: When processing Google-style content blocks (`inline_data` or `file_data`) that lack a `type` field, the function `_format_content_block` hardcodes the modality as `"image"`. This occurs even when the `mime_type` field indicates other content types like audio, video, or documents, which are supported by Google's Gemini API. The `mime_type` is extracted but not used to infer the correct modality. This will lead to incorrect data categorization in Sentry, where non-image content from the LangChain integration will be mislabeled as an image.


sentry · 2026-01-14T15:57:06Z

sentry_sdk/integrations/langchain.py

+            return {
+                "type": "blob",
+                "modality": "image",
+                "mime_type": inline_data.get("mime_type", ""),
+                "content": inline_data.get("data", ""),
+            }
+


Bug: Google-style content blocks (inline_data, file_data) without an explicit type are always assigned modality: "image", ignoring the actual mime_type for audio or video.
Severity: HIGH

Suggested Fix

Instead of hardcoding modality: "image", derive the modality from the mime_type present in the inline_data or file_data dictionary. A helper function could map MIME type prefixes (e.g., 'audio/', 'video/') to the correct modality ('audio', 'video'), with 'image' as a default.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: sentry_sdk/integrations/langchain.py#L263-L269

Potential issue: The fallback handlers for Google-style content blocks, specifically for `inline_data` and `file_data`, incorrectly hardcode the `modality` as `"image"`. According to Google Gemini API documentation, content blocks may not have a `type` field and rely solely on `mime_type` to determine the content. This means if a block with a `mime_type` like `"audio/mpeg"` is processed, it will fall through to the handler at line 260 and be incorrectly categorized with `modality: "image"`. This leads to incorrect data categorization in Sentry for non-image content like audio or video.
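To make the failure case concrete, here is a hedged sketch of how an `inline_data` block without a `type` field might be converted so that `"audio/mpeg"` is no longer labeled as an image. The function name `format_inline_data_block` is hypothetical, and the output shape simply mirrors the `blob` dictionary in the diff above; the real `_format_content_block` may differ.

```python
from typing import Any, Dict, Optional


def format_inline_data_block(block: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert a Google-style inline_data block into a blob dict (illustrative only)."""
    inline_data = block.get("inline_data")
    if not isinstance(inline_data, dict):
        # Not an inline_data block; a caller would try other handlers.
        return None

    mime_type = inline_data.get("mime_type", "")
    # Take the top-level MIME type ("audio" from "audio/mpeg") as the modality.
    top_level = mime_type.split("/", 1)[0].lower() if isinstance(mime_type, str) else ""
    # Assumption: anything other than image/audio/video keeps the "image" default.
    modality = top_level if top_level in ("image", "audio", "video") else "image"

    return {
        "type": "blob",
        "modality": modality,
        "mime_type": mime_type,
        "content": inline_data.get("data", ""),
    }


audio_block = {"inline_data": {"mime_type": "audio/mpeg", "data": "AAAA"}}
formatted = format_inline_data_block(audio_block)
```

In this sketch `formatted["modality"]` is `"audio"` rather than the hardcoded `"image"`.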


constantinius added 5 commits December 17, 2025 10:45

fix(ai): redact message parts content of type blob

1f32952

fix(ai): skip non dict messages

795bcea

fix(ai): typing

a623e13

fix(ai): content items may not be dicts

3d3ce5b

fix(integrations): langchain add multimodal content transformation functions for images, audio, and files

c606b66

constantinius requested a review from a team as a code owner January 5, 2026 19:16

cursor bot reviewed Jan 5, 2026

sentry_sdk/integrations/langchain.py (outdated, resolved)

fix(integrations): ensure URL check for data URIs handles empty strings

c650799

Base automatically changed from constantinius/fix/redact-message-parts-type-blob to master January 13, 2026 09:56

Merge branch 'master' into constantinius/fix/integrations/langchain-report-binary-data

71f2084

alexander-alderman-webb reviewed Jan 14, 2026

sentry_sdk/integrations/langchain.py (resolved)

alexander-alderman-webb reviewed Jan 14, 2026

sentry_sdk/integrations/langchain.py (outdated, resolved)

fix(integrations): Langchain: Handle Anthropic and Google provider-native content formats

510e2ed

cursor bot reviewed Jan 14, 2026

sentry_sdk/integrations/langchain.py (resolved)

constantinius added 2 commits January 14, 2026 16:45

Merge branch 'master' into constantinius/fix/integrations/langchain-report-binary-data

e76dddd

fix(integrations): Use correct modality for Google-style content formats and use common function for data URI parsing

1764e57

constantinius requested a review from alexander-alderman-webb January 14, 2026 15:51

sentry bot reviewed Jan 14, 2026
