Skip to content

Add include_metadata request parameter for PPL queries #5235#5412

Open
ishag4 wants to merge 3 commits into
opensearch-project:mainfrom
ishag4:issue-5235
Open

Add include_metadata request parameter for PPL queries #5235#5412
ishag4 wants to merge 3 commits into
opensearch-project:mainfrom
ishag4:issue-5235

Conversation

@ishag4
Copy link
Copy Markdown

@ishag4 ishag4 commented May 6, 2026

Description

Add a request-level parameter include_metadata to the PPL query API:

POST /_plugins/_ppl?include_metadata=true
{
"query": "source=logs | where level='ERROR' | fields * | head 10"
}
Result: All regular fields PLUS metadata fields (_id, _index, _score, etc.)

Related Issues

Resolves #5235

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Reviewer Guide 🔍

(Review updated until commit 05e612c)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Logic Error

When include_metadata=true, the code sets isProjectVisited=true but then calls tryToRemoveMetaFields with excludeByForce=false. However, tryToRemoveMetaFields checks context.isIncludeMetadata() first and returns early if true AND excludeByForce is false. This means the isProjectVisited flag has no effect in this path. The logic appears inverted: for AllFields (include_metadata=true), metadata should remain, but the code structure suggests confusion about when to preserve vs. remove metadata.

if (allFields instanceof AllFieldsExcludeMeta) {
  // For AllFieldsExcludeMeta (include_metadata=false), remove nested fields and force exclude
  // metadata
  tryToRemoveNestedFields(context);
  tryToRemoveMetaFields(context, true); // Force exclude metadata fields
} else {
  // For AllFields (include_metadata=true), include metadata fields
  tryToRemoveNestedFields(context);
  // Mark as project visited to prevent automatic metadata field removal
  context.setProjectVisited(true);
  // Don't force exclude metadata fields - let them remain
  tryToRemoveMetaFields(context, false);
}
Misleading Comment

Comment states "Legacy engine always includes basic metadata (schema information)" and "The includeMetadata flag doesn't affect legacy engine behavior since it already provides column names, types, and aliases in the schema". This conflates schema metadata (column names/types) with document metadata fields (_id, _index, _score). The feature is about document metadata fields, not schema information. This comment may confuse future maintainers about what includeMetadata controls.

// Legacy engine always includes basic metadata (schema information)
// The includeMetadata flag doesn't affect legacy engine behavior since
// it already provides column names, types, and aliases in the schema
executeWithLegacy(plan, queryType, listener, Optional.empty());

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Code Suggestions ✨

Latest suggestions up to 05e612c

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent unintended metadata field removal

The logic for handling AllFields (include_metadata=true) calls
tryToRemoveMetaFields(context, false) which may still remove metadata fields based
on the isProjectVisited flag. This could lead to inconsistent behavior where
metadata fields are removed even when include_metadata=true. Consider not calling
tryToRemoveMetaFields at all for the AllFields case to ensure metadata fields are
preserved.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [497-509]

 if (allFields instanceof AllFieldsExcludeMeta) {
   // For AllFieldsExcludeMeta (include_metadata=false), remove nested fields and force exclude
   // metadata
   tryToRemoveNestedFields(context);
   tryToRemoveMetaFields(context, true); // Force exclude metadata fields
 } else {
   // For AllFields (include_metadata=true), include metadata fields
   tryToRemoveNestedFields(context);
   // Mark as project visited to prevent automatic metadata field removal
   context.setProjectVisited(true);
-  // Don't force exclude metadata fields - let them remain
-  tryToRemoveMetaFields(context, false);
+  // Don't call tryToRemoveMetaFields to ensure metadata fields remain
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential issue where calling tryToRemoveMetaFields(context, false) for AllFields may still remove metadata fields based on the isProjectVisited flag. The improved code removes this call to ensure metadata fields are preserved when include_metadata=true, which aligns with the intended behavior.

Medium
General
Verify backward compatibility of default value

The overloaded method defaults includeMetadata to false, which may break backward
compatibility for existing callers who expect metadata fields to be included. Verify
that this default aligns with the intended behavior for all existing call sites, or
consider preserving the previous behavior for backward compatibility.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [99-105]

 public void execute(
     UnresolvedPlan plan,
     QueryType queryType,
     HighlightConfig highlightConfig,
     ResponseListener<ExecutionEngine.QueryResponse> listener) {
+  // Preserve backward compatibility by defaulting to false
+  // Verify this aligns with expected behavior for existing callers
   execute(plan, queryType, highlightConfig, false, listener);
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion asks to verify backward compatibility when defaulting includeMetadata to false. While this is a valid concern, the PR's default behavior of excluding metadata fields is intentional and documented. The suggestion adds a comment but doesn't change functionality, making it a verification request rather than a code fix.

Low

Previous suggestions

Suggestions up to commit 8b3962e
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove metadata field filtering call

The logic for handling AllFields (include_metadata=true) calls
tryToRemoveMetaFields(context, false) which may still remove metadata fields based
on the isProjectVisited check. This could lead to inconsistent behavior where
metadata fields are removed even when include_metadata=true. Consider removing the
call to tryToRemoveMetaFields entirely for the AllFields case to ensure metadata
fields are always preserved.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [497-509]

 if (allFields instanceof AllFieldsExcludeMeta) {
   // For AllFieldsExcludeMeta (include_metadata=false), remove nested fields and force exclude
   // metadata
   tryToRemoveNestedFields(context);
   tryToRemoveMetaFields(context, true); // Force exclude metadata fields
 } else {
   // For AllFields (include_metadata=true), include metadata fields
   tryToRemoveNestedFields(context);
   // Mark as project visited to prevent automatic metadata field removal
   context.setProjectVisited(true);
-  // Don't force exclude metadata fields - let them remain
-  tryToRemoveMetaFields(context, false);
+  // Don't call tryToRemoveMetaFields at all - preserve all metadata fields
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential issue where tryToRemoveMetaFields(context, false) could still remove metadata fields even when include_metadata=true, depending on the isProjectVisited check. Removing this call ensures consistent behavior for metadata inclusion.

Medium
General
Document metadata parameter limitation in fallback

When falling back to the legacy engine, the includeMetadata parameter is not passed
to executeWithLegacy. This means the user's metadata preference is lost during
fallback, potentially causing inconsistent behavior. Verify that the legacy engine
respects the metadata setting or document this limitation.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [216-222]

 } catch (Throwable t) {
   if (isCalciteFallbackAllowed(t) && !(t instanceof NonFallbackCalciteException)) {
     log.warn("Fallback to V2 query engine since got exception", t);
-    // Legacy engine provides basic metadata support, so fallback is acceptable
+    // Note: Legacy engine doesn't support include_metadata parameter
+    // User's metadata preference may not be honored in fallback
     executeWithLegacy(plan, queryType, listener, Optional.of(t));
   } else {
     propagateCalciteError(t, listener);
   }
Suggestion importance[1-10]: 6

__

Why: Valid observation that the includeMetadata parameter is not passed to executeWithLegacy. However, the PR already includes a comment explaining that the legacy engine provides basic metadata support. The suggestion to document this limitation more explicitly is helpful but has moderate impact.

Low
Verify metadata preservation with pagination

When pagination is used, the includeMetadata flag is passed to queryService.execute
with a Paginate wrapper. Verify that the Paginate plan node correctly preserves and
propagates the metadata inclusion behavior through the query execution pipeline, as
pagination may affect field projection logic.

core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java [106-117]

 public void execute() {
   if (pageSize.isPresent()) {
+    // Ensure Paginate wrapper preserves includeMetadata behavior
     queryService.execute(
         new Paginate(pageSize.get(), plan),
         getQueryType(),
         highlightConfig,
         includeMetadata,
         listener);
   } else {
     queryService.execute(plan, getQueryType(), highlightConfig, includeMetadata, listener);
   }
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion asks to verify that Paginate preserves includeMetadata behavior. While this is a valid concern for testing, the code correctly passes the includeMetadata flag through the execution pipeline. The suggestion is more about verification than a code issue.

Low
Suggestions up to commit b79855b
CategorySuggestion                                                                                                                                    Impact
General
Verify metadata preservation logic

The logic appears inverted. When includeMetadata is true and not forced to exclude,
metadata fields should be retained, but the method tryToRemoveMetaFields is designed
to remove them. Verify that returning early here correctly preserves metadata fields
instead of removing them.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [658-660]

+// When include_metadata=true and not forced, skip metadata removal to preserve them
 if (context.isIncludeMetadata() && !excludeByForce) {
-  return;
+  return; // Correctly preserves metadata by not removing
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion asks to verify the logic for metadata preservation, which is reasonable given the complexity of the conditional. However, the logic appears correct: when includeMetadata is true and not forced to exclude, the method returns early to skip removal, thus preserving metadata. The suggestion is valid for verification purposes but doesn't identify an actual bug.

Medium
Suggestions up to commit 85f18b2
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add getter for includeMetadata field

The includeMetadata field is declared as private final but lacks a corresponding
getter method. This prevents external classes from accessing the metadata inclusion
setting, which is needed by query execution components. Add a public getter method
isIncludeMetadata() to expose this field.

core/src/main/java/org/opensearch/sql/ast/statement/Query.java [24-30]

 protected final UnresolvedPlan plan;
 protected final int fetchSize;
 private final QueryType queryType;
 private final boolean includeMetadata;
 private HighlightConfig highlightConfig;
 
+public boolean isIncludeMetadata() {
+  return includeMetadata;
+}
+
Suggestion importance[1-10]: 9

__

Why: The includeMetadata field is declared as private final without a getter method, which prevents external classes from accessing this critical configuration. The PR shows usage of node.isIncludeMetadata() in QueryPlanFactory.java (lines 119, 132), indicating this getter is essential for the feature to work correctly.

High
General
Remove contradictory metadata field removal call

The logic for handling AllFields (when include_metadata=true) calls
tryToRemoveMetaFields(context, false) which may still remove metadata fields based
on the isProjectVisited() check. This contradicts the intent to include metadata.
Consider not calling tryToRemoveMetaFields at all when metadata should be included,
or ensure the method respects the includeMetadata flag.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [497-509]

 if (allFields instanceof AllFieldsExcludeMeta) {
   // For AllFieldsExcludeMeta (include_metadata=false), remove nested fields and force exclude
   // metadata
   tryToRemoveNestedFields(context);
   tryToRemoveMetaFields(context, true); // Force exclude metadata fields
 } else {
   // For AllFields (include_metadata=true), include metadata fields
   tryToRemoveNestedFields(context);
   // Mark as project visited to prevent automatic metadata field removal
   context.setProjectVisited(true);
-  // Don't force exclude metadata fields - let them remain
-  tryToRemoveMetaFields(context, false);
+  // Don't remove metadata fields when include_metadata=true
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that calling tryToRemoveMetaFields(context, false) in the AllFields branch may still remove metadata fields based on isProjectVisited() check. However, the improved code removes the call entirely without verifying if this breaks other functionality. The logic needs careful review to ensure metadata fields are properly retained.

Medium
Warn when includeMetadata ignored by legacy

The includeMetadata parameter is silently ignored when using the legacy engine,
which may lead to inconsistent behavior between Calcite and legacy execution paths.
Consider logging a warning when includeMetadata=true is specified but the legacy
engine is used, or document this limitation clearly to avoid user confusion.

core/src/main/java/org/opensearch/sql/executor/QueryService.java [115-120]

 } else {
+  if (includeMetadata) {
+    log.warn("include_metadata parameter is not supported by legacy engine and will be ignored");
+  }
   // Legacy engine always includes basic metadata (schema information)
   // The includeMetadata flag doesn't affect legacy engine behavior since
   // it already provides column names, types, and aliases in the schema
   executeWithLegacy(plan, queryType, listener, Optional.empty());
 }
Suggestion importance[1-10]: 5

__

Why: Adding a warning when includeMetadata=true is used with the legacy engine improves user experience by making the limitation explicit. However, this is a minor enhancement since the comment already documents the behavior, and the legacy engine is likely deprecated or less commonly used.

Low
Suggestions up to commit 6028130
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix inconsistent AllFields usage in test

This test case still uses AllFields.of() instead of AllFieldsExcludeMeta.of(), which
is inconsistent with all other test cases in the same file that were updated. Since
include_metadata is false, AllFieldsExcludeMeta.of() should be used here as well.

ppl/src/test/java/org/opensearch/sql/ppl/parser/AstStatementBuilderTest.java [113]

-new Query(project(head(head(relation("t"), 100, 0), 5, 0), AllFields.of()), 0, PPL, false));
+new Query(project(head(head(relation("t"), 100, 0), 5, 0), AllFieldsExcludeMeta.of()), 0, PPL, false));
Suggestion importance[1-10]: 8

__

Why: The test at line 113 still uses AllFields.of() instead of AllFieldsExcludeMeta.of(), which is inconsistent with all other updated test cases in the same file. Since includeMetadata is false, this is a genuine bug in the test that would cause it to fail.

Medium
General
Remove redundant side-effectful flag mutation

When allFields is AllFields (include_metadata=true), context.setProjectVisited(true)
is called before tryToRemoveMetaFields, but tryToRemoveMetaFields checks
!context.isProjectVisited() to decide whether to remove fields. Setting
projectVisited=true here means the condition !context.isProjectVisited() is false,
so tryToRemoveMetaFields with excludeByForce=false will do nothing — which is the
intended behavior. However, this side-effect-based logic is fragile; the early
return in tryToRemoveMetaFields when context.isIncludeMetadata() && !excludeByForce
already handles this case, making the setProjectVisited(true) call redundant and
potentially causing unintended side effects for subsequent pipeline stages.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [497-507]

 if (allFields instanceof AllFieldsExcludeMeta) {
   // Do NOT remove nested fields for AllFieldsExcludeMeta
   tryToRemoveMetaFields(context, true); // Force exclude metadata fields
 } else {
   // For AllFields (include_metadata=true), include metadata fields
   tryToRemoveNestedFields(context);
-  // Mark as project visited to prevent automatic metadata field removal
-  context.setProjectVisited(true);
   // Don't force exclude metadata fields - let them remain
+  // context.isIncludeMetadata() check in tryToRemoveMetaFields handles this
   tryToRemoveMetaFields(context, false);
 }
Suggestion importance[1-10]: 4

__

Why: The context.setProjectVisited(true) call before tryToRemoveMetaFields is redundant because the early return in tryToRemoveMetaFields already handles the includeMetadata=true case, and the side effect on isProjectVisited could affect subsequent pipeline stages unexpectedly. Removing it would make the logic cleaner and less fragile.

Low
Ensure null highlightConfig for paginated execution

When executing a paginated plan, highlightConfig is always null (set to null in the
paginated constructor), so passing it here is harmless but the Paginate wrapper may
not support highlightConfig. More critically, the original code passed null for
highlightConfig in the non-paginated execute overload — verify that passing
highlightConfig (which is null for paginated plans) to the new 5-argument execute
method is handled correctly and doesn't break the paginated path.

core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java [108]

-queryService.execute(new Paginate(pageSize.get(), plan), getQueryType(), highlightConfig, includeMetadata, listener);
+queryService.execute(new Paginate(pageSize.get(), plan), getQueryType(), null, includeMetadata, listener);
Suggestion importance[1-10]: 3

__

Why: The highlightConfig field is already null for paginated plans (set explicitly in the paginated constructor), so passing null explicitly vs. passing the field makes no functional difference. This is a minor clarity improvement but not a real bug.

Low

@ishag4
Copy link
Copy Markdown
Author

ishag4 commented May 8, 2026

Hi @LantaoJin @penghuo @RyanL1997 @Swiddis Could you please review?

Copy link
Copy Markdown
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add integration tests for this enhancement and update documentation (add a new section in endpoint.md

Comment thread ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLFlattenTest.java Outdated
Signed-off-by: Isha Gupta <igupta24@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit b79855b

@ishag4
Copy link
Copy Markdown
Author

ishag4 commented May 17, 2026

Hi @LantaoJin @penghuo @RyanL1997 @Swiddis Could you please re-review?

Copy link
Copy Markdown
Collaborator

@Swiddis Swiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue, a few suggestions & polish

Comment thread integ-test/src/test/java/org/opensearch/sql/ppl/IncludeMetadataIT.java Outdated
Comment on lines +65 to +66
HighlightConfig highlightConfig,
boolean includeMetadata) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: It may be worth merging this with HighlightConfig and renaming it to something like QueryConfig (with fieldNames -> highlightFieldNames and toFieldMap -> highlightFieldMap)

Semantically they're similar in that they provide different ways the query might run, and this means we don't now need to be passing two parameters everywhere. It'd overall reduce the one-liner changes of "the same code but add a field" and make it easier to add more config in the future.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion.
I agree a shared QueryConfig (or similar) would likely scale better than continuing to thread individual execution flags through constructors/methods.

We can avoid doing that in this PR mainly to keep the scope focused on include_metadata behavior itself and minimize the amount of unrelated refactoring/review surface.

That said, now that we have both HighlightConfig and includeMetadata, I can definitely see the benefit of consolidating execution-time options before more flags get added.

Happy to follow up with a separate cleanup/refactor PR if that direction makes sense.

Comment thread docs/user/ppl/interfaces/endpoint.md Outdated
Comment thread docs/user/ppl/interfaces/endpoint.md Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8b3962e

Signed-off-by: Isha Gupta <igupta24@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 05e612c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add include_metadata request parameter for PPL queries

3 participants