Skip to content

Remove DatetimeOutputCastRule on the analytics-engine route (#5420)#5454

Open
mengweieric wants to merge 1 commit into
opensearch-project:mainfrom
mengweieric:experiment/option-a-no-cast-rule
Open

Remove DatetimeOutputCastRule on the analytics-engine route (#5420)#5454
mengweieric wants to merge 1 commit into
opensearch-project:mainfrom
mengweieric:experiment/option-a-no-cast-rule

Conversation

@mengweieric
Copy link
Copy Markdown
Collaborator

Problem

On the analytics-engine route, every datetime root column is wrapped in CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR) by DatetimeOutputCastRule. A matching DatetimeOutputCastRewriter on the engine side then translates that cast into DataFusion's to_char extension. Whenever the rewriter's format string and the PPL formatter disagree (e.g. trailing Z, T separator), users see wire-format divergence — issue #5420.

Solution

Drop the cast rule and let the engine return real datetime cells. The PPL response pipeline already handles datetime → string conversion natively at the formatter layer:

  • AnalyticsExecutionEngine.convertRows feeds engine cells (LocalDateTime / LocalDate / LocalTime) through ExprValueUtils.fromObjectValue, which produces ExprTimestampValue / ExprDateValue / ExprTimeValue.
  • Their value() methods render the documented PPL space-separated format (yyyy-MM-dd HH:mm:ss[.SSSSSSSSS] etc.).

The companion change in opensearch-project/OpenSearch removes the matching DatetimeOutputCastRewriter and its callsites.

Changes

  • Delete DatetimeOutputCastRule.
  • Drop the unused UdtMapping.isDatetimeType helper.
  • Rewrite the four cast-shape assertions in DatetimeExtensionTest to assert the post-removal RelNode (no CAST(... VARCHAR) wrapper, schema reports DATE / TIME / TIMESTAMP).

Test plan

  • :api:test --tests DatetimeExtensionTest — 8/8 pass
  • :api:spotlessJavaCheck
  • CI gates
  • :integ-test:integTestRemote against force-routed AE cluster

Fixes #5420

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit cf16199)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Missing Assertions

The test testDatetimeFieldsPreserveStandardTypes verifies the schema but no longer checks the actual data returned. The old test verified both schema (VARCHAR types) and data format (e.g., '2024-01-16', '12:00:00'). After removing the cast rule, the engine returns native datetime objects instead of strings. Without data assertions, the test cannot confirm that the formatter layer correctly converts these objects to the documented PPL format ('yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'). This leaves a gap in validating the end-to-end behavior described in the PR description.

public void testDatetimeFieldsPreserveStandardTypes() throws Exception {
  RelNode plan =
      planner.plan("source = catalog.events | fields hire_date, start_time, created_at");
  try (PreparedStatement statement = compiler.compile(plan)) {
    ResultSet resultSet = statement.executeQuery();
    verify(resultSet)
        .expectSchema(
            col("hire_date", java.sql.Types.DATE),
            col("start_time", java.sql.Types.TIME),
            col("created_at", java.sql.Types.TIMESTAMP));
  }
}

…ch-project#5420)

## Problem
On the analytics-engine route, every datetime root column is wrapped in
`CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR)` by `DatetimeOutputCastRule`.
A matching `DatetimeOutputCastRewriter` on the engine side then translates
that cast into DataFusion's `to_char` extension. Whenever the rewriter's
format string and the PPL formatter disagree (e.g. trailing `Z`, `T`
separator), users see wire-format divergence — issue opensearch-project#5420.

## Solution
Drop the cast rule and let the engine return real datetime cells.
The PPL response pipeline already handles datetime → string conversion
natively at the formatter layer:
- `AnalyticsExecutionEngine.convertRows` feeds engine cells
  (`LocalDateTime` / `LocalDate` / `LocalTime`) through
  `ExprValueUtils.fromObjectValue`, which produces
  `ExprTimestampValue` / `ExprDateValue` / `ExprTimeValue`.
- Their `value()` methods render the documented PPL space-separated
  format (`yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]` etc.).

The companion change in `opensearch-project/OpenSearch` removes the
matching `DatetimeOutputCastRewriter` and its callsites.

## Changes
- Delete `DatetimeOutputCastRule`.
- Drop the unused `UdtMapping.isDatetimeType` helper.
- Rewrite the four cast-shape assertions in `DatetimeExtensionTest`
  to assert the post-removal RelNode (no `CAST(... VARCHAR)` wrapper,
  schema reports `DATE` / `TIME` / `TIMESTAMP`).

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric force-pushed the experiment/option-a-no-cast-rule branch from bfc54ca to cf16199 Compare May 19, 2026 23:09
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit cf16199

Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Please check the CI failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Datetime output cast format diverges between Calcite and DataFusion engines

2 participants