Suggest fields for 'field not found' errors by Swiddis · Pull Request #5402 · opensearch-project/sql

Swiddis · 2026-05-01T21:07:43Z

Description

// echo 'source=big5 | fields hostname' | ppl
{
  "error": {
    "reason": "Field [hostname] not found.",
    "code": "FIELD_NOT_FOUND",
    "suggestion": "Did you mean: host.name",
    "context": {
      "stage_description": "Parsing and validating the query",
      "stage": "analyzing",
      "requested_field": "hostname",
      "available_fields": [
        "agent",
        "agent.ephemeral_id",
        ...
      ],
      "query_pos": {
        "column": 21,
        "line": 1
      }
    },
    "details": "Field [hostname] not found.",
    "location": [
      "while preparing and validating the query plan"
    ],
    "type": "IllegalArgumentException"
  },
  "status": 400
}

Related Issues

N/A

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

github-actions · 2026-05-01T21:09:16Z

PR Reviewer Guide 🔍

(Review updated until commit `ff6328e`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Possible Issue When either `s1` or `s2` is null, the method returns `Integer.MAX_VALUE`. This could cause unexpected behavior in `findClosestMatch` where `bestDistance` is initialized to `Integer.MAX_VALUE`. If all candidates are null, `bestDistance` remains `Integer.MAX_VALUE`, and the condition `bestDistance <= Math.max(4, target.length() / 2)` will fail, returning empty. However, if a null candidate is encountered after a valid one, it won't affect the result. The real issue is that null candidates should likely be skipped rather than assigned maximum distance, as they represent invalid data rather than poor matches. public static int levenshteinDistance(String s1, String s2) { if (s1 == null \|\| s2 == null) { return Integer.MAX_VALUE; } Inconsistent State The `@EqualsAndHashCode` annotation excludes `line` and `column` fields, but these fields are not marked as `final` and have no validation. If two `QualifiedName` instances have the same parts but different line/column values, they will be considered equal and have the same hashCode. This could cause issues in hash-based collections where position information matters for error reporting. Consider whether position should be part of equality or if these fields should be in a separate wrapper class. @EqualsAndHashCode( callSuper = false, exclude = {"line", "column"}) public class QualifiedName extends UnresolvedExpression { public static final String DELIMITER = "."; private final List<String> parts; private final Integer line; private final Integer column; Possible Issue The code collects parts into a list but doesn't use the collected list before the position check. The original code created the `QualifiedName` directly from the stream. Now it collects to a list, checks if `ctx` is empty (not if `parts` is empty), and creates a new `QualifiedName` with position. If the stream operations somehow produce an empty list from a non-empty `ctx`, the fallback `new QualifiedName(parts)` would throw an exception per the constructor's validation. The check should verify `parts.isEmpty()` instead of `ctx.isEmpty()` to match the actual data being used. List<String> parts = ctx.stream() .map(RuleContext::getText) .map(StringUtils::unquoteIdentifier) .collect(Collectors.toList()); // Capture source position from the first identifier for error reporting if (!ctx.isEmpty()) { ParserRuleContext first = ctx.get(0); int line = first.getStart().getLine(); int column = first.getStart().getCharPositionInLine(); return new QualifiedName(parts, line, column); } return new QualifiedName(parts);

github-actions · 2026-05-01T21:09:38Z

PR Code Suggestions ✨

Latest suggestions up to ff6328e

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Add null check for relation Calling `peek()` without checking if the builder has any relations could throw an exception if the relation stack is empty. This would mask the original "field not found" error with a more confusing stack trace. Add a null/empty check before accessing field names. core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334] -List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames(); +List<String> availableFields = + context.relBuilder.peek() != null + ? context.relBuilder.peek().getRowType().getFieldNames() + : List.of(); Suggestion importance[1-10]: 8 __ Why: This is a valid concern about potential `NullPointerException` or empty stack exceptions when calling `peek()`. Adding a null/empty check prevents masking the original "field not found" error with a confusing stack trace, improving error handling robustness.	Medium
General	Improve distance threshold calculation The threshold calculation may produce unexpected results for very short strings. For a 2-character string, `target.length() / 2` equals 1, which is less than 4, so the threshold becomes 4. This means a 2-character typo could match a completely different 2-character field. Consider using a more nuanced threshold that accounts for string length more appropriately. common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174-176] -if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) { +// Use a threshold that scales better with string length +// For short strings (<=3 chars), allow distance of 1 +// For longer strings, allow up to 40% of the length, capped at 5 +int threshold = target.length() <= 3 ? 1 : Math.min(5, (target.length() * 2) / 5); +if (bestMatch != null && bestDistance <= threshold) { return Optional.of(bestMatch); } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential issue with the threshold calculation for very short strings. The proposed threshold logic (`target.length() <= 3 ? 1 : Math.min(5, (target.length() * 2) / 5)`) provides better scaling for different string lengths, preventing inappropriate matches for short strings while maintaining reasonable tolerance for longer ones.	Medium
General	Use field-level exclude annotations Excluding `line` and `column` from `equals()` and `hashCode()` is correct for semantic equality, but these fields should be marked as `@EqualsAndHashCode.Exclude` individually for clarity. Additionally, consider making these fields `transient` if they shouldn't be serialized, as they're metadata for error reporting only. core/src/main/java/org/opensearch/sql/ast/expression/QualifiedName.java [23-32] -@EqualsAndHashCode( - callSuper = false, - exclude = {"line", "column"}) +@EqualsAndHashCode(callSuper = false) public class QualifiedName extends UnresolvedExpression { ... + @EqualsAndHashCode.Exclude private final Integer line; + + @EqualsAndHashCode.Exclude private final Integer column; Suggestion importance[1-10]: 4 __ Why: While using field-level `@EqualsAndHashCode.Exclude` annotations is more explicit and arguably clearer, the current class-level `exclude` parameter achieves the same functionality. This is a minor style improvement rather than a functional fix, and the suggestion about `transient` is speculative without knowing serialization requirements.	Low

Previous suggestions

Suggestions up to commit 1950fac

Category	Suggestion	Impact
Possible issue	Guard against empty builder stack exception Calling `context.relBuilder.peek()` can throw an `EmptyStackException` if the builder's stack is empty (e.g., in early resolution stages). This would produce a confusing error that masks the original "field not found" issue. Add a null/empty check or wrap in a try-catch before accessing the row type. core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334] -List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames(); +List<String> availableFields; +try { + availableFields = context.relBuilder.peek().getRowType().getFieldNames(); +} catch (Exception e) { + availableFields = List.of(); +} Suggestion importance[1-10]: 6 __ Why: Calling `context.relBuilder.peek()` on an empty stack would throw an `EmptyStackException`, masking the original field-not-found error. The try-catch guard is a reasonable defensive measure, though the likelihood depends on how this method is called.	Low
General	Tighten fuzzy match threshold for short strings The threshold `Math.max(4, target.length() / 2)` is very permissive for short strings. For example, a 2-character target like `"id"` would accept matches with distance up to 4, meaning almost any 2–6 character field would be suggested. Consider using a stricter threshold, such as `Math.max(2, target.length() / 3)`, to avoid misleading suggestions. common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174] -if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) { +if (bestMatch != null && bestDistance <= Math.max(2, target.length() / 3)) { Suggestion importance[1-10]: 5 __ Why: The threshold `Math.max(4, target.length() / 2)` is indeed very permissive for short strings like `"id"`, potentially producing misleading suggestions. However, the optimal threshold is subjective and the suggested `Math.max(2, target.length() / 3)` may be too strict for longer strings.	Low
General	Remove unused variable in test The variable `stack` is assigned but never used in the test, which is dead code and may indicate the intent was to verify the suggestion in the stack trace rather than the exception message. Either use `stack` in the assertions or remove the unused variable. integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java [107-111] -String stack = org.apache.commons.lang3.exception.ExceptionUtils.getStackTrace(e); verifyErrorMessageContains(e, "Field [nam] not found."); // Verify suggestion based on Levenshtein distance verifyErrorMessageContains(e, "Did you mean: name"); // Verify available fields are listed verifyErrorMessageContains(e, "available_fields"); Suggestion importance[1-10]: 4 __ Why: The variable `stack` is assigned but never used, which is dead code. Removing it improves code clarity, though it's a minor style issue with no functional impact.	Low

Suggestions up to commit 8d6f36d

Category	Suggestion	Impact
Possible issue	Guard against empty relBuilder stack `context.relBuilder.peek()` can throw an `EmptyStackException` if the builder's stack is empty, which could happen in certain edge cases. You should guard against this to avoid masking the original "field not found" error with an unrelated exception. core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334] -List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames(); +List<String> availableFields; +try { + availableFields = context.relBuilder.peek().getRowType().getFieldNames(); +} catch (Exception e) { + availableFields = List.of(); +} Suggestion importance[1-10]: 5 __ Why: If `context.relBuilder` stack is empty, `peek()` would throw an unrelated exception masking the original "field not found" error. The guard is a reasonable defensive measure, though this edge case may be unlikely in practice.	Low
General	Fix overly permissive suggestion threshold for short strings When `target` is a very short string (e.g., length 1 or 2), `target.length() / 2` evaluates to 0 or 1 due to integer division, but `Math.max(4, ...)` will always return 4, which may be too permissive for short strings. Consider using a relative threshold (e.g., a fraction of the target length) combined with a small absolute cap to avoid suggesting unrelated fields for short inputs. common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174] -if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) { +int threshold = Math.min(4, Math.max(1, target.length() / 2)); +if (bestMatch != null && bestDistance <= threshold) { Suggestion importance[1-10]: 5 __ Why: The current threshold `Math.max(4, target.length() / 2)` is always at least 4, which is too permissive for short strings (e.g., a 2-character target could match almost anything). The suggested `Math.min(4, Math.max(1, target.length() / 2))` is more conservative and avoids false suggestions for short inputs.	Low
General	Remove unused variable in test The variable `stack` is assigned but never used in the test, which is dead code. Either use it in an assertion or remove it to keep the test clean and avoid confusion. integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java [107] -String stack = org.apache.commons.lang3.exception.ExceptionUtils.getStackTrace(e); verifyErrorMessageContains(e, "Field [nam] not found."); // Verify suggestion based on Levenshtein distance verifyErrorMessageContains(e, "Did you mean: name"); // Verify available fields are listed verifyErrorMessageContains(e, "available_fields"); Suggestion importance[1-10]: 3 __ Why: The variable `stack` is assigned but never used, making it dead code. Removing it improves test clarity, though it has no functional impact.	Low

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

github-actions · 2026-05-01T21:20:42Z

Persistent review updated to latest commit 1950fac

github-actions · 2026-05-18T22:51:54Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ff6328e.

Path	Line	Severity	Description
core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java	344	medium	Error responses now include the full list of available field names from the query row type via 'available_fields'. If unauthenticated or low-privilege users can trigger field-not-found errors (e.g., through crafted PPL queries), this enables schema enumeration without direct read access to index mappings. Not malicious in intent — it's a legitimate 'did you mean?' UX improvement — but the information disclosure boundary should be reviewed against the project's authorization model.
docs/user/ppl/cmd/mvcombine.md	112	low	A bare '=======' merge-conflict marker was left in the documentation file. This is a code-quality issue (accidental merge artifact) with no security impact, but indicates the change was not carefully reviewed before submission.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Swiddis · 2026-05-18T22:52:04Z

+   * @param s2 second string
+   * @return the Levenshtein distance between s1 and s2
+   */
+  public static int levenshteinDistance(String s1, String s2) {


rare instance of using DSA in the real world

github-actions · 2026-05-18T22:52:52Z

Persistent review updated to latest commit ff6328e

Swiddis added 3 commits May 1, 2026 19:54

tweak: Available fields reporting in field not found errors

685b0fe

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

tweak: Cursor position and available fields

333549b

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

suggest nearby fields

8d6f36d

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

Swiddis requested review from LantaoJin, RyanL1997, acarbonetto, ahkcs, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners May 1, 2026 21:07

Swiddis changed the title ~~tweak: Available fields reporting in field not found errors~~ Available fields reporting in field not found errors May 1, 2026

Swiddis changed the title ~~Available fields reporting in field not found errors~~ Suggest fields for 'field not found' errors May 1, 2026

Swiddis added enhancement New feature or request error-experience Issues related to how we handle failure cases in the plugin. labels May 1, 2026

fix doctest

1950fac

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

Merge branch 'main' into feat/field-errors-ppl

ff6328e

Swiddis commented May 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggest fields for 'field not found' errors#5402

Suggest fields for 'field not found' errors#5402
Swiddis wants to merge 5 commits into
opensearch-project:mainfrom
Swiddis:feat/field-errors-ppl

Swiddis commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 1, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 1, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Swiddis May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Swiddis commented May 1, 2026

Description

Related Issues

Check List

Uh oh!

github-actions Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit ff6328e)

Uh oh!

github-actions Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented May 1, 2026

Uh oh!

github-actions Bot commented May 18, 2026

PR Code Analyzer ❗

Uh oh!

Swiddis May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 1, 2026 •

edited

Loading

(Review updated until commit `ff6328e`)

github-actions Bot commented May 1, 2026 •

edited

Loading