Skip to content

Suggest fields for 'field not found' errors#5402

Open
Swiddis wants to merge 5 commits into
opensearch-project:mainfrom
Swiddis:feat/field-errors-ppl
Open

Suggest fields for 'field not found' errors#5402
Swiddis wants to merge 5 commits into
opensearch-project:mainfrom
Swiddis:feat/field-errors-ppl

Conversation

@Swiddis
Copy link
Copy Markdown
Collaborator

@Swiddis Swiddis commented May 1, 2026

Description

// echo 'source=big5 | fields hostname' | ppl
{
  "error": {
    "reason": "Field [hostname] not found.",
    "code": "FIELD_NOT_FOUND",
    "suggestion": "Did you mean: host.name",
    "context": {
      "stage_description": "Parsing and validating the query",
      "stage": "analyzing",
      "requested_field": "hostname",
      "available_fields": [
        "agent",
        "agent.ephemeral_id",
        ...
      ],
      "query_pos": {
        "column": 21,
        "line": 1
      }
    },
    "details": "Field [hostname] not found.",
    "location": [
      "while preparing and validating the query plan"
    ],
    "type": "IllegalArgumentException"
  },
  "status": 400
}

Related Issues

N/A

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Swiddis added 3 commits May 1, 2026 19:54
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis changed the title tweak: Available fields reporting in field not found errors Available fields reporting in field not found errors May 1, 2026
@Swiddis Swiddis changed the title Available fields reporting in field not found errors Suggest fields for 'field not found' errors May 1, 2026
@Swiddis Swiddis added enhancement New feature or request error-experience Issues related to how we handle failure cases in the plugin. labels May 1, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

PR Reviewer Guide 🔍

(Review updated until commit ff6328e)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

When either s1 or s2 is null, the method returns Integer.MAX_VALUE. This could cause unexpected behavior in findClosestMatch where bestDistance is initialized to Integer.MAX_VALUE. If all candidates are null, bestDistance remains Integer.MAX_VALUE, and the condition bestDistance <= Math.max(4, target.length() / 2) will fail, returning empty. However, if a null candidate is encountered after a valid one, it won't affect the result. The real issue is that null candidates should likely be skipped rather than assigned maximum distance, as they represent invalid data rather than poor matches.

public static int levenshteinDistance(String s1, String s2) {
  if (s1 == null || s2 == null) {
    return Integer.MAX_VALUE;
  }
Inconsistent State

The @EqualsAndHashCode annotation excludes line and column fields, but these fields are not marked as final and have no validation. If two QualifiedName instances have the same parts but different line/column values, they will be considered equal and have the same hashCode. This could cause issues in hash-based collections where position information matters for error reporting. Consider whether position should be part of equality or if these fields should be in a separate wrapper class.

@EqualsAndHashCode(
    callSuper = false,
    exclude = {"line", "column"})
public class QualifiedName extends UnresolvedExpression {
  public static final String DELIMITER = ".";
  private final List<String> parts;

  private final Integer line;

  private final Integer column;
Possible Issue

The code collects parts into a list but doesn't use the collected list before the position check. The original code created the QualifiedName directly from the stream. Now it collects to a list, checks if ctx is empty (not if parts is empty), and creates a new QualifiedName with position. If the stream operations somehow produce an empty list from a non-empty ctx, the fallback new QualifiedName(parts) would throw an exception per the constructor's validation. The check should verify parts.isEmpty() instead of ctx.isEmpty() to match the actual data being used.

List<String> parts =
    ctx.stream()
        .map(RuleContext::getText)
        .map(StringUtils::unquoteIdentifier)
        .collect(Collectors.toList());

// Capture source position from the first identifier for error reporting
if (!ctx.isEmpty()) {
  ParserRuleContext first = ctx.get(0);
  int line = first.getStart().getLine();
  int column = first.getStart().getCharPositionInLine();
  return new QualifiedName(parts, line, column);
}

return new QualifiedName(parts);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

PR Code Suggestions ✨

Latest suggestions up to ff6328e

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for relation

Calling peek() without checking if the builder has any relations could throw an
exception if the relation stack is empty. This would mask the original "field not
found" error with a more confusing stack trace. Add a null/empty check before
accessing field names.

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334]

-List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+List<String> availableFields = 
+    context.relBuilder.peek() != null 
+        ? context.relBuilder.peek().getRowType().getFieldNames() 
+        : List.of();
Suggestion importance[1-10]: 8

__

Why: This is a valid concern about potential NullPointerException or empty stack exceptions when calling peek(). Adding a null/empty check prevents masking the original "field not found" error with a confusing stack trace, improving error handling robustness.

Medium
General
Improve distance threshold calculation

The threshold calculation may produce unexpected results for very short strings. For
a 2-character string, target.length() / 2 equals 1, which is less than 4, so the
threshold becomes 4. This means a 2-character typo could match a completely
different 2-character field. Consider using a more nuanced threshold that accounts
for string length more appropriately.

common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174-176]

-if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) {
+// Use a threshold that scales better with string length
+// For short strings (<=3 chars), allow distance of 1
+// For longer strings, allow up to 40% of the length, capped at 5
+int threshold = target.length() <= 3 ? 1 : Math.min(5, (target.length() * 2) / 5);
+if (bestMatch != null && bestDistance <= threshold) {
   return Optional.of(bestMatch);
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential issue with the threshold calculation for very short strings. The proposed threshold logic (target.length() <= 3 ? 1 : Math.min(5, (target.length() * 2) / 5)) provides better scaling for different string lengths, preventing inappropriate matches for short strings while maintaining reasonable tolerance for longer ones.

Medium
Use field-level exclude annotations

Excluding line and column from equals() and hashCode() is correct for semantic
equality, but these fields should be marked as @EqualsAndHashCode.Exclude
individually for clarity. Additionally, consider making these fields transient if
they shouldn't be serialized, as they're metadata for error reporting only.

core/src/main/java/org/opensearch/sql/ast/expression/QualifiedName.java [23-32]

-@EqualsAndHashCode(
-    callSuper = false,
-    exclude = {"line", "column"})
+@EqualsAndHashCode(callSuper = false)
 public class QualifiedName extends UnresolvedExpression {
   ...
+  @EqualsAndHashCode.Exclude
   private final Integer line;
+  
+  @EqualsAndHashCode.Exclude
   private final Integer column;
Suggestion importance[1-10]: 4

__

Why: While using field-level @EqualsAndHashCode.Exclude annotations is more explicit and arguably clearer, the current class-level exclude parameter achieves the same functionality. This is a minor style improvement rather than a functional fix, and the suggestion about transient is speculative without knowing serialization requirements.

Low

Previous suggestions

Suggestions up to commit 1950fac
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against empty builder stack exception

Calling context.relBuilder.peek() can throw an EmptyStackException if the builder's
stack is empty (e.g., in early resolution stages). This would produce a confusing
error that masks the original "field not found" issue. Add a null/empty check or
wrap in a try-catch before accessing the row type.

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334]

-List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+List<String> availableFields;
+try {
+  availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+} catch (Exception e) {
+  availableFields = List.of();
+}
Suggestion importance[1-10]: 6

__

Why: Calling context.relBuilder.peek() on an empty stack would throw an EmptyStackException, masking the original field-not-found error. The try-catch guard is a reasonable defensive measure, though the likelihood depends on how this method is called.

Low
General
Tighten fuzzy match threshold for short strings

The threshold Math.max(4, target.length() / 2) is very permissive for short strings.
For example, a 2-character target like "id" would accept matches with distance up to
4, meaning almost any 2–6 character field would be suggested. Consider using a
stricter threshold, such as Math.max(2, target.length() / 3), to avoid misleading
suggestions.

common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174]

-if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) {
+if (bestMatch != null && bestDistance <= Math.max(2, target.length() / 3)) {
Suggestion importance[1-10]: 5

__

Why: The threshold Math.max(4, target.length() / 2) is indeed very permissive for short strings like "id", potentially producing misleading suggestions. However, the optimal threshold is subjective and the suggested Math.max(2, target.length() / 3) may be too strict for longer strings.

Low
Remove unused variable in test

The variable stack is assigned but never used in the test, which is dead code and
may indicate the intent was to verify the suggestion in the stack trace rather than
the exception message. Either use stack in the assertions or remove the unused
variable.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java [107-111]

-String stack = org.apache.commons.lang3.exception.ExceptionUtils.getStackTrace(e);
 verifyErrorMessageContains(e, "Field [nam] not found.");
 // Verify suggestion based on Levenshtein distance
 verifyErrorMessageContains(e, "Did you mean: name");
 // Verify available fields are listed
 verifyErrorMessageContains(e, "available_fields");
Suggestion importance[1-10]: 4

__

Why: The variable stack is assigned but never used, which is dead code. Removing it improves code clarity, though it's a minor style issue with no functional impact.

Low
Suggestions up to commit 8d6f36d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against empty relBuilder stack

context.relBuilder.peek() can throw an EmptyStackException if the builder's stack is
empty, which could happen in certain edge cases. You should guard against this to
avoid masking the original "field not found" error with an unrelated exception.

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334]

-List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+List<String> availableFields;
+try {
+  availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+} catch (Exception e) {
+  availableFields = List.of();
+}
Suggestion importance[1-10]: 5

__

Why: If context.relBuilder stack is empty, peek() would throw an unrelated exception masking the original "field not found" error. The guard is a reasonable defensive measure, though this edge case may be unlikely in practice.

Low
General
Fix overly permissive suggestion threshold for short strings

When target is a very short string (e.g., length 1 or 2), target.length() / 2
evaluates to 0 or 1 due to integer division, but Math.max(4, ...) will always return
4, which may be too permissive for short strings. Consider using a relative
threshold (e.g., a fraction of the target length) combined with a small absolute cap
to avoid suggesting unrelated fields for short inputs.

common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174]

-if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) {
+int threshold = Math.min(4, Math.max(1, target.length() / 2));
+if (bestMatch != null && bestDistance <= threshold) {
Suggestion importance[1-10]: 5

__

Why: The current threshold Math.max(4, target.length() / 2) is always at least 4, which is too permissive for short strings (e.g., a 2-character target could match almost anything). The suggested Math.min(4, Math.max(1, target.length() / 2)) is more conservative and avoids false suggestions for short inputs.

Low
Remove unused variable in test

The variable stack is assigned but never used in the test, which is dead code.
Either use it in an assertion or remove it to keep the test clean and avoid
confusion.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java [107]

-String stack = org.apache.commons.lang3.exception.ExceptionUtils.getStackTrace(e);
 verifyErrorMessageContains(e, "Field [nam] not found.");
 // Verify suggestion based on Levenshtein distance
 verifyErrorMessageContains(e, "Did you mean: name");
 // Verify available fields are listed
 verifyErrorMessageContains(e, "available_fields");
Suggestion importance[1-10]: 3

__

Why: The variable stack is assigned but never used, making it dead code. Removing it improves test clarity, though it has no functional impact.

Low

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

Persistent review updated to latest commit 1950fac

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ff6328e.

PathLineSeverityDescription
core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java344mediumError responses now include the full list of available field names from the query row type via 'available_fields'. If unauthenticated or low-privilege users can trigger field-not-found errors (e.g., through crafted PPL queries), this enables schema enumeration without direct read access to index mappings. Not malicious in intent — it's a legitimate 'did you mean?' UX improvement — but the information disclosure boundary should be reviewed against the project's authorization model.
docs/user/ppl/cmd/mvcombine.md112lowA bare '=======' merge-conflict marker was left in the documentation file. This is a code-quality issue (accidental merge artifact) with no security impact, but indicates the change was not carefully reviewed before submission.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

* @param s2 second string
* @return the Levenshtein distance between s1 and s2
*/
public static int levenshteinDistance(String s1, String s2) {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rare instance of using DSA in the real world

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ff6328e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request error-experience Issues related to how we handle failure cases in the plugin.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant