
Fix Spark day time dimension truncation #141

Merged
nicosuave merged 4 commits into main from spark-date-trunc-day on May 13, 2026
Conversation

@nicosuave (Member) commented May 13, 2026

Fixes #140.

Spark SQL / HiveServer2 now uses DATE_TRUNC('DAY', column) for day-grain time dimensions instead of TRUNC(column, 'DAY'), which returns NULL in Spark. Databricks uses the same Spark-family path.
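As a minimal sketch of the dialect branching described above (the helper name and the non-Spark fallback form are assumptions, not the project's actual API):

```python
# Minimal sketch of the dialect-aware day truncation described above.
# The helper name and the non-Spark fallback form are assumptions,
# not the project's actual generator API.
def day_trunc_expr(dialect: str, column_expr: str) -> str:
    """Return a SQL expression truncating a timestamp column to day grain."""
    if dialect in {"spark", "databricks"}:
        # Spark-family: TRUNC(col, 'DAY') returns NULL, so emit DATE_TRUNC.
        return f"DATE_TRUNC('DAY', {column_expr})"
    # Fallback mirroring the grain-second argument order used elsewhere.
    return f"DATE_TRUNC({column_expr}, DAY)"

print(day_trunc_expr("spark", "event_ts"))       # DATE_TRUNC('DAY', event_ts)
print(day_trunc_expr("databricks", "event_ts"))  # DATE_TRUNC('DAY', event_ts)
```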

Added regression coverage for Spark-family SQL generation plus a gated Spark integration test that executes a day-grain timestamp dimension against the Docker Spark Thrift server. Also fixed the Spark integration test harness to omit a password by default for the no-auth local Spark container.
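A regression test along the lines described might look like this sketch (the stand-in generator function and test name are hypothetical; the real assertions run against the project's SQL generator in sidemantic/sql/generator.py):

```python
# Hedged sketch of the Spark-family regression coverage described above.
# day_trunc_sql is a stand-in mirroring the fixed branch, not the real API.
def day_trunc_sql(dialect: str, column: str) -> str:
    if dialect in {"spark", "databricks"}:
        return f"DATE_TRUNC('DAY', {column})"
    return f"DATE_TRUNC({column}, DAY)"

def test_spark_family_day_grain_uses_date_trunc():
    for dialect in ("spark", "databricks"):
        sql = day_trunc_sql(dialect, "created_at")
        # Spark's TRUNC(created_at, 'DAY') returns NULL, so it must not appear.
        assert sql == "DATE_TRUNC('DAY', created_at)"

test_spark_family_day_grain_uses_date_trunc()
```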

Local lint, format, full pytest, and the gated Spark integration/CLI e2e suites passed.

@nicosuave nicosuave marked this pull request as ready for review May 13, 2026 02:02

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a3fec2c93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread: sidemantic/sql/generator.py (Outdated)

    return f"DATE_TRUNC({column_expr}, {granularity.upper()})"

    if self.dialect in {"spark", "databricks"}:
        return f"DATE_TRUNC('{granularity.upper()}', {column_expr})"

P2: Preserve Spark date results for date-level grains

For Spark/Databricks this now emits DATE_TRUNC for every granularity, so existing month/year time dimensions that previously used Spark's TRUNC(col, 'MONTH') path will start returning timestamps instead of dates: Spark documents date_trunc as returning a timestamp, while trunc returns a date. In Spark CLI/query results this changes previously unchanged date-level dimension values like 2026-05-01 to 2026-05-01 00:00:00, just to fix the unsupported day case. Consider keeping the old TRUNC path for the granularities it supports, or casting the new expression back to DATE where date-level output is expected.
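The cast-back suggestion could be sketched like this (a hypothetical helper, not the project's code; the set of date-level grains is an assumption):

```python
# Hypothetical sketch of the reviewer's suggestion: emit DATE_TRUNC for
# Spark-family dialects, but cast date-level grains back to DATE so results
# stay date-typed (2026-05-01 rather than 2026-05-01 00:00:00).
DATE_LEVEL_GRAINS = {"DAY", "WEEK", "MONTH", "QUARTER", "YEAR"}  # assumed set

def spark_trunc_expr(column_expr: str, granularity: str) -> str:
    grain = granularity.upper()
    expr = f"DATE_TRUNC('{grain}', {column_expr})"
    if grain in DATE_LEVEL_GRAINS:
        # Spark's date_trunc returns a timestamp; restore the DATE type
        # that the old TRUNC path produced for these grains.
        expr = f"CAST({expr} AS DATE)"
    return expr

print(spark_trunc_expr("order_ts", "day"))   # CAST(DATE_TRUNC('DAY', order_ts) AS DATE)
print(spark_trunc_expr("order_ts", "hour"))  # DATE_TRUNC('HOUR', order_ts)
```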

Useful? React with 👍 / 👎.

@nicosuave nicosuave merged commit 4c2e808 into main May 13, 2026
15 checks passed
@nicosuave nicosuave deleted the spark-date-trunc-day branch May 13, 2026 02:46


Development

Successfully merging this pull request may close these issues.

Spark SQL time dimensions use TRUNC(..., 'DAY'), which returns NULL
