
feat(mcp): trim wide-table payload in get_entity_details #28776

Open
Vishnuujain wants to merge 3 commits into fix/mcp-shared-response-trim from fix/mcp-entity-details-trim


@Vishnuujain
Contributor

Stacked on #28764 (uses its McpResponseTrim util) — merge that first; this auto-retargets to main.

What

  • Per-column description truncation at 500 chars, recursing through nested children; on wide tables this is the N× multiplier. A single top-level columnDescriptionsTruncated: true flag marks that at least one description was cut.
  • schemaDefinition and dataModel.sql/rawSql are truncated at 500 chars with flags (schemaDefinitionTruncated, and sqlTruncated inside dataModel), following the same pattern as lineage SQL.
  • Plugs the incrementalChangeDescription leak: this audit blob was missing from the exclude list (verified live in real responses).
  • The entity-level description is always returned in full. This is the detail tool, the one place where the full text must stay reachable after search results truncate it.
  • Null-guards the RCA rethrow messages (safeMessage).
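A minimal sketch of the truncation rules listed above, assuming the response is a plain Map/List tree like the one Jackson produces when an entity is converted to a map. The class and method names (EntityDetailsTrimSketch, trim, trimColumns, cut) are hypothetical and are not the McpResponseTrim API from #28764; the cut here is a plain 500-char prefix, and the real util may truncate differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the PR's truncation rules; not the real McpResponseTrim API.
public class EntityDetailsTrimSketch {
  static final int MAX_CHARS = 500; // limit stated in the PR description

  // Plain prefix cut; the real util may cut differently (ellipsis, word boundary, ...).
  static String cut(String text) {
    return text.length() <= MAX_CHARS ? text : text.substring(0, MAX_CHARS);
  }

  // Truncates "description" on every column, recursing into "children".
  // Returns true if at least one description was cut.
  @SuppressWarnings("unchecked")
  static boolean trimColumns(Object columnsNode) {
    if (!(columnsNode instanceof List)) {
      return false; // no columns (or unexpected shape): nothing to do
    }
    boolean cutAny = false;
    for (Object item : (List<Object>) columnsNode) {
      if (!(item instanceof Map)) {
        continue;
      }
      Map<String, Object> column = (Map<String, Object>) item;
      Object desc = column.get("description");
      if (desc instanceof String && ((String) desc).length() > MAX_CHARS) {
        column.put("description", cut((String) desc));
        cutAny = true;
      }
      // |= on booleans does not short-circuit, so nested children are always visited.
      cutAny |= trimColumns(column.get("children"));
    }
    return cutAny;
  }

  // Mutates the given map; callers must pass a fresh tree, never the cached entity.
  @SuppressWarnings("unchecked")
  static void trim(Map<String, Object> entity) {
    if (trimColumns(entity.get("columns"))) {
      entity.put("columnDescriptionsTruncated", true); // one top-level flag
    }
    Object schema = entity.get("schemaDefinition");
    if (schema instanceof String && ((String) schema).length() > MAX_CHARS) {
      entity.put("schemaDefinition", cut((String) schema));
      entity.put("schemaDefinitionTruncated", true);
    }
    Object dm = entity.get("dataModel");
    if (dm instanceof Map) {
      Map<String, Object> dataModel = (Map<String, Object>) dm;
      boolean sqlCut = false;
      for (String key : List.of("sql", "rawSql")) {
        Object sql = dataModel.get(key);
        if (sql instanceof String && ((String) sql).length() > MAX_CHARS) {
          dataModel.put(key, cut((String) sql));
          sqlCut = true;
        }
      }
      if (sqlCut) {
        dataModel.put("sqlTruncated", true); // nested inside dataModel, per the review
      }
    }
    // The entity-level "description" is intentionally never truncated.
  }

  public static void main(String[] args) {
    Map<String, Object> child = new LinkedHashMap<>();
    child.put("name", "inner");
    child.put("description", "x".repeat(864));
    Map<String, Object> column = new LinkedHashMap<>();
    column.put("name", "outer");
    column.put("description", "short");
    column.put("children", new ArrayList<>(List.of(child)));
    Map<String, Object> entity = new LinkedHashMap<>();
    entity.put("description", "y".repeat(2000));
    entity.put("columns", new ArrayList<>(List.of(column)));

    trim(entity);
    System.out.println(((String) child.get("description")).length()); // 500
    System.out.println(entity.get("columnDescriptionsTruncated")); // true
    System.out.println(((String) entity.get("description")).length()); // 2000
  }
}
```

Presumably a single top-level columnDescriptionsTruncated flag is preferred over a flag per column because a per-column marker would itself grow N× on a wide table.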

Notes

  • extension (custom properties, surfaced by #28594 "fix(mcp): surface custom properties (extension) in get_entity_details") is preserved at table and column level; columns, tags, and customMetrics are otherwise untouched.
  • Operates on the fresh Jackson map tree and never mutates the cached entity POJO.
  • 298 unit tests pass (10 new). Verified live end to end: leak gone, an 864-char column description cut to 500 chars with the flag set, short descriptions untouched, all other tools regression-clean.
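The notes about the fresh map tree and the exclude list can be sketched as follows. In the PR the fresh tree comes from Jackson (one way to get one is ObjectMapper.convertValue(entity, Map.class), which returns a new Map tree); to stay dependency-free, this sketch deep-copies a Map/List tree by hand instead. FreshTreeSketch, toResponse, and EXCLUDED_FIELDS are hypothetical names, and the set holds only the field named in this PR, not the tool's full exclude list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: build the response from a fresh copy and drop excluded fields.
public class FreshTreeSketch {
  // Only the field named in this PR; the tool's real exclude list is longer.
  static final Set<String> EXCLUDED_FIELDS = Set.of("incrementalChangeDescription");

  // Deep-copies a Map/List tree; leaves (strings, numbers, booleans) are immutable and shared.
  @SuppressWarnings("unchecked")
  static Object deepCopy(Object node) {
    if (node instanceof Map) {
      Map<String, Object> copy = new LinkedHashMap<>();
      for (Map.Entry<String, Object> e : ((Map<String, Object>) node).entrySet()) {
        copy.put(e.getKey(), deepCopy(e.getValue()));
      }
      return copy;
    }
    if (node instanceof List) {
      List<Object> copy = new ArrayList<>();
      for (Object item : (List<Object>) node) {
        copy.add(deepCopy(item));
      }
      return copy;
    }
    return node;
  }

  // Returns a trimmed response; the cached tree passed in is never modified.
  @SuppressWarnings("unchecked")
  static Map<String, Object> toResponse(Map<String, Object> cached) {
    Map<String, Object> fresh = (Map<String, Object>) deepCopy(cached);
    EXCLUDED_FIELDS.forEach(fresh::remove); // e.g. the incrementalChangeDescription audit blob
    return fresh;
  }

  public static void main(String[] args) {
    Map<String, Object> cached = new LinkedHashMap<>();
    cached.put("name", "orders");
    cached.put("incrementalChangeDescription", Map.of("fieldsUpdated", List.of()));
    Map<String, Object> response = toResponse(cached);
    System.out.println(response.containsKey("incrementalChangeDescription")); // false
    System.out.println(cached.containsKey("incrementalChangeDescription")); // true
  }
}
```

Doing both the exclusion and the truncation on the copy is what keeps the cached entity intact for every other caller that reads it.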

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

  • No GitHub issue is linked. Add a closing reference such as Fixes #12345 to the PR description (accepted keywords: Fixes, Closes, Resolves).

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@Vishnuujain Vishnuujain added the safe to test label Jun 5, 2026
openmetadata-mcp/src/main/resources/json/data/mcp/tools.json

```diff
 {
   "name": "get_entity_details",
-  "description": "Get detailed information about a specific entity by its fully qualified name, including its custom properties (returned under the 'extension' field). IMPORTANT: Use the 'fullyQualifiedName' and 'entityType' values directly from search_metadata or semantic_search results — do not construct the FQN manually. Response is optimized for LLM context by excluding verbose metadata fields.",
+  "description": "Get detailed information about a specific entity by its fully qualified name, including its custom properties (returned under the 'extension' field). IMPORTANT: Use the 'fullyQualifiedName' and 'entityType' values directly from search_metadata or semantic_search results — do not construct the FQN manually. Response is optimized for LLM context: verbose metadata fields are excluded, and per-column descriptions and raw schema/model SQL are truncated at 500 characters (marked with 'columnDescriptionsTruncated' / 'schemaDefinitionTruncated' when cut). The entity-level description is always returned in full.",
```

💡 Quality: tools.json omits dataModel 'sqlTruncated' flag from description

The updated get_entity_details description advertises that truncation is signalled via columnDescriptionsTruncated and schemaDefinitionTruncated, but the code also truncates dataModel.sql/dataModel.rawSql and emits a sqlTruncated flag nested inside dataModel. Since the tool description is what the LLM reads to interpret the response, it won't know that dbt/data-model SQL may be cut or that a sqlTruncated marker exists. Consider mentioning the dataModel.sqlTruncated flag in the tool description so the consuming model can reason about truncated model SQL the same way it does for schema DDL.

Mention the dataModel SQL truncation flag in the tool description:

...truncated at 500 characters (marked with 'columnDescriptionsTruncated' / 'schemaDefinitionTruncated', or 'sqlTruncated' inside 'dataModel', when cut). The entity-level description is always returned in full.
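The reviewer's point is that the truncation markers live in two places. A hypothetical client-side check (TruncationFlagsSketch.anythingTruncated is an illustrative name, not part of the MCP server) has to look inside dataModel for sqlTruncated rather than at the top level, which is exactly what a consuming model cannot know unless the tool description says so:

```java
import java.util.Map;

// Hypothetical client-side helper: checks the truncation markers described in this PR and review.
public class TruncationFlagsSketch {
  // Returns true if any documented truncation marker is set on the response map.
  static boolean anythingTruncated(Map<String, Object> response) {
    if (Boolean.TRUE.equals(response.get("columnDescriptionsTruncated"))) {
      return true;
    }
    if (Boolean.TRUE.equals(response.get("schemaDefinitionTruncated"))) {
      return true;
    }
    // sqlTruncated is nested inside dataModel, not at the top level.
    Object dataModel = response.get("dataModel");
    return dataModel instanceof Map
        && Boolean.TRUE.equals(((Map<?, ?>) dataModel).get("sqlTruncated"));
  }

  public static void main(String[] args) {
    Map<String, Object> response = Map.of("dataModel", Map.of("sqlTruncated", true));
    System.out.println(anythingTruncated(response)); // true
    System.out.println(anythingTruncated(Map.of("name", "orders"))); // false
  }
}
```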

@Vishnuujain Vishnuujain added the To release label Jun 5, 2026
@gitar-bot

gitar-bot Bot commented Jun 5, 2026

Code Review 👍 Approved with suggestions (0 resolved / 1 finding)

Implements recursive payload truncation for wide tables in get_entity_details to prevent token bloat, ensuring core descriptions remain intact. Please update the tools.json schema documentation to include the missing sqlTruncated flag.

💡 Quality: tools.json omits dataModel 'sqlTruncated' flag from description

📄 openmetadata-mcp/src/main/resources/json/data/mcp/tools.json:185 📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/GetEntityTool.java:141-150



@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

🔴 Playwright Results — 3 failure(s), 12 flaky

✅ 4263 passed · ❌ 3 failed · 🟡 12 flaky · ⏭️ 88 skipped

Shard        Passed  Failed  Flaky  Skipped
🟡 Shard 1      299       0      2        4
🔴 Shard 2      799       2      3        9
🔴 Shard 3      809       1      1        8
🟡 Shard 4      844       0      3       12
🟡 Shard 5      720       0      1       47
🟡 Shard 6      792       0      2        8

Genuine Failures (failed on all attempts)

Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2)
Error: expect(locator).toHaveText(expected) failed

Locator:  locator('[data-row-key*="StatusBadgeTerm1780689027109"]').locator('.status-badge')
Expected: "In Review"
Received: "Draft"
Timeout:  15000ms

Call log:
  - Expect "toHaveText" with timeout 15000ms
  - waiting for locator('[data-row-key*="StatusBadgeTerm1780689027109"]').locator('.status-badge')
    18 × locator resolved to <div class="status-badge pending" data-testid=""PW%'3eaa6a53.Bold9c3fe71f".StatusBadgeTerm1780689027109-status">…</div>
       - unexpected value "Draft"

Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2)
Error: expect(locator).toHaveText(expected) failed

Locator:  locator('[data-row-key*="DraftTerm1780689093437"]').locator('.status-badge')
Expected: "Draft"
Received: "In Review"
Timeout:  15000ms

Call log:
  - Expect "toHaveText" with timeout 15000ms
  - waiting for locator('[data-row-key*="DraftTerm1780689093437"]').locator('.status-badge')
    18 × locator resolved to <div class="status-badge inReview" data-testid=""PW%'73910900.Silly58d6fca7".DraftTerm1780689093437-status">…</div>
       - unexpected value "In Review"

Flow/ExploreAggregationCountsMatching.spec.ts › should verify left panel counts and tab search results for normal search (shard 3)
Error: Tab "table" search total hits should match the aggregation count

expect(received).toBe(expected) // Object.is equality

Expected: 24
Received: 158

🟡 12 flaky test(s) (passed on retry)
  • Features/EntityRenameConsolidation.spec.ts › Classification - multiple rename + update cycles should preserve tags (shard 1, 1 retry)
  • Features/MutuallyExclusiveColumnTags.spec.ts › Should show error toast when adding mutually exclusive tags to column (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/BulkImport.spec.ts › Keyboard Delete selection (shard 2, 1 retry)
  • Features/ColumnBulkOperations.spec.ts › should show Service filter chip from URL (shard 2, 1 retry)
  • Features/IncidentManager.spec.ts › Resolving incident & re-run pipeline (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Integer (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/DataProducts.spec.ts › Empty State - No Data Products (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for mlModel -> searchIndex (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
```shell
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace
```
