Initial implementation of report-builder interface#5266
Initial implementation of report-builder interface#5266Swiddis wants to merge 16 commits intoopensearch-project:mainfrom
Conversation
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
|
Failed to generate code suggestions for PR |
|
Failed to generate code suggestions for PR |
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
c1c2355 to
23c5fbb
Compare
|
Failed to generate code suggestions for PR |
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
|
Failed to generate code suggestions for PR |
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
|
Failed to generate code suggestions for PR |
NPE in the analytics path is a server bug (null schema field, missing row), not bad user input. Removed from client error list. Will sync this classification with RestPPLQueryAction updates in opensearch-project#5266. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 6f8d67a.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Failed to generate code suggestions for PR |
In context, this is pulling details out of a potentially very fuzzy Calcite error. The error logic without any of this ultimately returns the plan with a message resembling "Failed to prepare plan: [blob]." The plan is provided as context as the innermost error might be weird and only apply to one node, e.g. Any details about our internal implementation can be easily scrutinized by navigating to https://github.com/opensearch-project/sql in a recent version of Netscape Navigator. |
…5267) * [Mustang] Add query routing and execution handoff for Parquet-backed indices (#5247) Implements the query routing and AnalyticsExecutionEngine for Project Mustang's unified query pipeline. PPL queries targeting parquet_ prefixed indices are routed through UnifiedQueryPlanner and executed via a stub QueryPlanExecutor, with results formatted through the existing JDBC response pipeline. New files: - QueryPlanExecutor: @FunctionalInterface contract for analytics engine - AnalyticsExecutionEngine: converts Iterable<Object[]> to QueryResponse with type mapping and query size limit enforcement - RestUnifiedQueryAction: orchestrates schema building, planning, execution on sql-worker thread pool, with client/server error classification and metrics - StubQueryPlanExecutor: canned data for parquet_logs and parquet_metrics tables for development and testing Modified files: - RestPPLQueryAction: routing branch for parquet_ indices - SQLPlugin: passes ClusterService and NodeClient to RestPPLQueryAction - plugin/build.gradle: adds :api dependency for UnifiedQueryPlanner Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Align isClientError with RestPPLQueryAction classification Add missing exception types to isClientError(): IndexNotFoundException, ExpressionEvaluationException, QueryEngineException, DataSourceClientException, IllegalAccessException. Matches the full list in RestPPLQueryAction.isClientError(). Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Move stub code into analytics/stub sub-package Extract StubSchemaProvider, StubQueryPlanExecutor, and StubIndexDetector into plugin/.../rest/analytics/stub/ package to clearly separate temporary stub code from production code. RestUnifiedQueryAction now delegates to these stub classes. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Use ErrorMessageFactory for error responses Replace hand-crafted JSON error response with ErrorMessageFactory.createErrorMessage(), matching the standard error format used in RestPPLQueryAction.reportError(). Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Make metrics and response formatting query-type-aware Use QueryType to select the correct metrics (PPL_REQ_TOTAL vs REQ_TOTAL, PPL_FAILED_REQ_COUNT_* vs FAILED_REQ_COUNT_*) and LangSpec (PPL_SPEC vs SQL_SPEC) so this class can serve both PPL and SQL queries when unified SQL support is added. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Move analytics routing from REST layer to transport layer Move the analytics index routing check from RestPPLQueryAction into TransportPPLQueryAction.doExecute(). This ensures the analytics path gets the same PPL enabled check, metrics, request ID, and inter-plugin transport support as the existing Lucene path. RestPPLQueryAction and SQLPlugin are reverted to their original state. Added executeViaTransport() to RestUnifiedQueryAction which returns results via ActionListener<TransportPPLQueryResponse> instead of RestChannel, integrating properly with the transport action pattern. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Add query size limit to RelNode plan instead of post-processing Add LogicalSystemLimit to the RelNode plan before passing it to the analytics engine, consistent with PPL V3 (QueryService.convertToCalcitePlan). This ensures the analytics engine enforces the limit during execution rather than returning all rows for post-processing truncation. Remove post-processing querySizeLimit truncation from AnalyticsExecutionEngine -- the limit is now part of the plan the executor receives. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Propagate security context to sql-worker thread pool Wrap scheduled lambdas in both execute() and executeViaTransport() with withCurrentContext() to capture and restore ThreadContext (user identity, permissions, audit trail) on the worker thread. Follows the same pattern as OpenSearchQueryManager.withCurrentContext(). Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Move analytics routing after profiling setup Move the analytics index routing check after QueryContext.setProfile() and wrapWithProfilingClear(listener). Use clearingListener instead of raw listener so profiling thread-local state is properly cleaned up after analytics path execution. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Remove NPE from isClientError classification NPE in the analytics path is a server bug (null schema field, missing row), not bad user input. Removed from client error list. Will sync this classification with RestPPLQueryAction updates in #5266. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Remove duplicate REST execution path from RestUnifiedQueryAction Remove execute(query, queryType, channel), doExecute(), createQueryListener(channel), recordSuccessMetric(), recordFailureMetric(), reportError(), and related REST imports. Since routing now goes through TransportPPLQueryAction, the REST-specific path was unused. Renamed executeViaTransport() to execute() as the sole entry point. Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <kaihuang@amazon.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
|
Failed to generate code suggestions for PR |
|
Failed to generate code suggestions for PR |
…oint Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
|
Failed to generate code suggestions for PR |
090ece5 to
4b73f48
Compare
|
Failed to generate code suggestions for PR |
4b73f48 to
ca77eeb
Compare
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
ca77eeb to
6f8d67a
Compare
|
Failed to generate code suggestions for PR |
1 similar comment
|
Failed to generate code suggestions for PR |
| private static boolean isNonPushdownEnumerableAggregate(String message) { | ||
| return message.contains("Error while preparing plan") | ||
| && message.contains("CalciteEnumerableNestedAggregate"); | ||
| } | ||
|
|
||
| // Detect if error is due to window functions in unsupported context (bins on time fields) | ||
| private static boolean isWindowBinOnTimeField(SQLException e) { | ||
| String errorMsg = e.getMessage(); | ||
| return errorMsg != null | ||
| && errorMsg.contains("Error while preparing plan") | ||
| && errorMsg.contains("WIDTH_BUCKET"); | ||
| } |
There was a problem hiding this comment.
self-suggestion: These conditions can be simplified by using error codes at a lower level
The checks here just become report.getCode() == ErrorCode.ILLEGAL_AGGREGATE_PUSHDOWN and report.getCode() == ErrorCode.INVALID_WINDOW_BIN or something, and we can remove the helpers. Can add this if it comes up or is in the way of other changes.
I mostly bring it up to highlight why I'm introducing codes in the first place: They can let us specify much more specific scenarios than generic exceptions do. I just didn't fully integrate it here in the interest of time & keeping the PR as small as I could get it.
|
Not sure if understand the improvements: what is the error experience before and after this PR? |
|
You can see some concrete examples by looking at the diffs for doctests that check for errors https://github.com/opensearch-project/sql/pull/5266/changes#diff-601c336135ceed5ca58ff75bf3783afd2722610e3b5583160476fc9378b34cd6 |
|
In meta issue, we mention the "The core problem is that misbehaving queries are typically hard to debug". So the improvements like |
Description
Well, this is turning out quite long.
The fundamental idea of this PR is to add a lot more context to error messages. For example, take this bad query:
Now there's more context, provided at various layers:
{ "error": { "context": { "stage": "analyzing", "stage_description": "Parsing and validating the query" }, "details": "Field [nonex] not found.", "location": [ "while preparing and validating the query plan" ], "code": "FIELD_NOT_FOUND", "type": "IllegalArgumentException" }, "status": 400 }Or, the humble 'index not found':
{ "error": { "context": { "stage": "analyzing", "index_name": "big5_nonex", "stage_description": "Parsing and validating the query" }, "details": "no such index [big5_nonex]", "location": [ "while preparing and validating the query plan", "while fetching index mappings" ], "code": "INDEX_NOT_FOUND", "type": "IndexNotFoundException" }, "status": 404 }The doctest deltas have a few concrete cases where I went through and did light touch-ups. Improving any particular error takes O(2 minutes) to wrap in a report with context, and it'll integrate with any higher-up wrapping to give the completed report at the end.
A lot of this PR in terms of line count is just patching specific places where tests are asserting errors behave in specific ways. The core of the PR is:
ErrorReportandStageErrorHandler, which adds utilities for incrementally building detailed errorsQueryServicewhere we hook up the outer shell of the stage handlers, andRestPPLQueryActionwhere we handle and present the errors.Simple tests are in
error_reports.yml, as well as doctest and various other scattered tests. (There's also lots of tests that have no diff, but did catch bugs in earlier versions of this PR.)The enhancements aren't super complete in this PR as the emphasis is scaffolding (and the integration work + fixing tests is already quite significant), for the most part the changes are backwards compatible in normal cases. The
type,details, andstatusfields are in the same spots and by default pull from the underlying exception. I'll be adding more thorough probes for specific error cases in followups.Related Issues
Meta: #5261
Closes #4919
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.