fix(postgres): transpile DAY/MONTH/YEAR to EXTRACT [CLAUDE] #7798

Open
vjpovlitz wants to merge 2 commits into tobymao:main from vjpovlitz:fix/postgres-day-month-year-extract
@vjpovlitz

Summary

PostgreSQL has no scalar DAY / MONTH / YEAR functions — date parts are read via
EXTRACT(field FROM source). When transpiling from a dialect that does have them (e.g.
T-SQL), SQLGlot emitted the functions unchanged, producing invalid Postgres SQL.

This adds Postgres generator rules so exp.Day / exp.Month / exp.Year render as
EXTRACT(<part> FROM ...).

Reproduction

import sqlglot

# Before
sqlglot.transpile("SELECT DAY(d) FROM t", read="tsql", write="postgres")[0]
# -> 'SELECT DAY(d) FROM t'            # invalid: no DAY() function in Postgres

# After
# -> 'SELECT EXTRACT(DAY FROM d) FROM t'

MONTH and YEAR behave the same. EXTRACT is the standard Postgres construct for these
parts: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

Approach

  • The fix is generator-only: T-SQL already parses these into the dedicated exp.Day /
    exp.Month / exp.Year nodes, so only the Postgres output was wrong. Parsing and other
    dialects are untouched (tsql -> tsql still yields DAY(d)).
  • Rendering reuses the existing exp.Extract node rather than hand-built SQL, and the three
    rules share a small _extract_sql(part) factory in the style of the neighboring
    _date_add_sql(kind) helper.
  • Dialects that inherit the Postgres generator (Redshift, Materialize, RisingWave) also benefit;
    they likewise lack scalar DAY/MONTH/YEAR.
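
The factory shape described above can be illustrated without SQLGlot. The sketch below is a plain-string stand-in, not the PR's code: `make_extract_renderer`, `render_call`, and the `TRANSFORMS` table keyed by function name are hypothetical, and the real helper works on expression nodes rather than strings.

```python
from typing import Callable, Dict


def make_extract_renderer(part: str) -> Callable[[str], str]:
    """Return a renderer that wraps an argument in EXTRACT(<part> FROM ...)."""

    def render(arg_sql: str) -> str:
        return f"EXTRACT({part} FROM {arg_sql})"

    return render


# One table entry per date-part function, all built by the same factory.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    name: make_extract_renderer(name) for name in ("DAY", "MONTH", "YEAR")
}


def render_call(func_name: str, arg_sql: str) -> str:
    """Render a scalar call, rewriting it only when a transform is registered."""
    renderer = TRANSFORMS.get(func_name.upper())
    return renderer(arg_sql) if renderer else f"{func_name}({arg_sql})"


print(render_call("DAY", "d"))    # EXTRACT(DAY FROM d)
print(render_call("UPPER", "d"))  # UPPER(d), no transform registered
```

Sharing one factory keeps the three rules identical apart from the part name, which is the point of the bullet above.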

Test

tests/dialects/test_postgres.py::TestPostgres::test_extract_date_parts asserts the
T-SQL -> Postgres transpilation and that the EXTRACT form round-trips in Postgres.

Fixes #6220

@geooo109 geooo109 self-assigned this Jun 26, 2026
@geooo109

geooo109 commented Jun 26, 2026

Collaborator

@vjpovlitz thanks for the PR. In order to transpile DAY, MONTH, and YEAR, we need further investigation/testing.

For example, the input (tsql) can be a string value, which we can't transpile to postgres as is (on the tsql parsing side, we need to wrap the function argument in a TsOrDs_ expression). Moreover, the input can be an integer value, which again we can't transpile to postgres as is. Check also the docs here

Let's first analyze all of these cases/inputs ^ so that we know the exact semantics of these functions; then we can come up with a robust implementation that covers these cases (or a part of them).

@vjpovlitz

vjpovlitz commented Jun 26, 2026

Author

Thank you very much for the kind and detailed feedback @geooo109. This is one of my first open source contributions, so I really appreciate the guidance. Please keep it coming, it genuinely helps me grow as a programmer.

I made assumptions, and thank you for pointing out that the naive form breaks as soon as the argument isn't already a date. I dug into the semantics and here's what I found.

T-SQL DAY/MONTH/YEAR accepts three kinds of input (per the
docs, confirmed by reproducing each case in SQLGlot):

  • date or datetime column: returns day-of-month. The naive EXTRACT(DAY FROM x) works only if x is genuinely date-typed.

  • string literal (e.g. DAY('2015-04-30 01:01:01.123') returns 30): T-SQL implicitly casts it to datetime. The naive Postgres form is not safe because it depends on the literal's inferred type.

  • integer (e.g. DAY(0) returns 1): T-SQL reads 0 as 1900-01-01, days since the base date. The naive Postgres form is invalid.
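
The integer case can be checked with plain date arithmetic. This is a minimal sketch in Python's standard library, assuming the semantics described in the bullet above (the integer is a day offset from 1900-01-01); `tsql_int_to_date` is a hypothetical helper name and not part of SQLGlot or T-SQL.

```python
from datetime import date, timedelta

# Base date that T-SQL uses when an integer is read as a datetime.
TSQL_BASE_DATE = date(1900, 1, 1)


def tsql_int_to_date(n: int) -> date:
    """Interpret n as a number of days after the base date."""
    return TSQL_BASE_DATE + timedelta(days=n)


for n in (0, 1, 31):
    d = tsql_int_to_date(n)
    # Prints the offset, the resulting date, and its day/month/year parts.
    print(n, d.isoformat(), d.day, d.month, d.year)
```

With n = 0 the day part is 1, which matches DAY(0) returning 1.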

Proposed approach: wrap the argument on the T-SQL parse side in exp.TsOrDsToDate, the same way
_build_eomonth already does. That pushes an explicit CAST(x AS DATE) into every target dialect:

  • DAY(d) becomes Postgres EXTRACT(DAY FROM CAST(d AS DATE)), round-tripping in T-SQL as DAY(CAST(d AS DATE)), the same shape EOMONTH produces today.

  • string literals become CAST('...' AS DATE), which is valid in Postgres.

The existing Day/Month/Year to EXTRACT transform composes with this unchanged.

One scope question before I push: the integer-as-days-since-1900 case (DAY(0) returns 1) has no clean Postgres equivalent, since CAST(0 AS DATE) is invalid there and the faithful form would be EXTRACT(DAY FROM DATE '1900-01-01' + n).

Would you prefer this PR to:

(a) cover date, datetime, and string inputs and treat the integer case as unsupported (smaller and
focused)

or

(b) also emit the 1900-01-01 + n arithmetic for integers?

I'd like to do (b) if you're open to it, since handling all three input types keeps the transpilation faithful to T-SQL and feels more useful for the project, and I'll make sure it's well tested. If you'd rather keep this PR small, I'm glad to land (a) first and do (b) as a follow-up.

@geooo109

geooo109 commented Jun 26, 2026

Collaborator

@vjpovlitz great work!! thanks for the analysis.

Let's focus on (a) for this PR, and we can do a follow-up PR for the integer values.

Keep in mind that the input can be a column, so the value of the argument isn't known (it's a non-literal) at parse time.

For example:

t-sql input:
WITH t AS (SELECT 1 AS col) SELECT YEAR(col) FROM t;

So, we should use the is_type function on the generation side (postgres), and throw an unsupported message.
Check this for example:

if datetime_expr.is_type(exp.DType.TIMESTAMPTZ, exp.DType.TIMESTAMPLTZ):

You can assume that we always run parsing, type annotation, and generation in that order; check for example:

self.assertEqual(
(the method that annotates is called annotate_types and it's part of sqlglot's optimizer).

Before doing this ^, can you verify that it works for the T-SQL formats?

For example the docs state the following (for the DAY) here:
If date contains only a time part, DAY will return 1 - the base day.
Which means:

1> SELECT DAY('12:12:12'); 
2> go
           
-----------
          1

(1 rows affected)

This ^ will break the transpilation.

@vjpovlitz

Author

@geooo109 Thanks, this was really helpful guidance. I implemented (a) and verified it against a local Postgres 15 on a MacBook and a Linux VM I have.

Changes:

  • The Postgres generator now checks is_type on the argument. Integer arguments call self.unsupported(...), so YEAR(int_col) and DAY(0) raise under ErrorLevel.RAISE. We can lift this in the integer follow-up PR.
  • Non-DATE arguments are wrapped in CAST(x AS DATE). I confirmed the explicit cast is required: a bare EXTRACT(DAY FROM '2024-01-15') errors in Postgres with extract(unknown, unknown) is not unique.
  • An argument already typed as DATE is not cast twice.
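
The three rules above amount to a small decision on the argument's annotated type. The sketch below models that decision on plain strings, as an illustration only: `render_date_part`, `UnsupportedArgument`, and the `INTEGER_TYPES` set are hypothetical names, and the actual generator works with SQLGlot expression nodes, is_type, and self.unsupported(...) instead.

```python
# Type names treated as integers in this toy model (an assumption, not SQLGlot's list).
INTEGER_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT"}


class UnsupportedArgument(ValueError):
    """Raised when the argument type has no faithful Postgres rendering here."""


def render_date_part(part: str, arg_sql: str, arg_type: str) -> str:
    """Render EXTRACT(<part> FROM ...) according to the argument's type name."""
    kind = arg_type.upper()
    if kind in INTEGER_TYPES:
        # Integer inputs mean "days since 1900-01-01" in T-SQL; out of scope here.
        raise UnsupportedArgument(f"{part}() on {kind} input is not supported")
    if kind != "DATE":
        # Strings, timestamps, and unknown types get an explicit cast.
        arg_sql = f"CAST({arg_sql} AS DATE)"
    # An argument already typed as DATE is passed through without a second cast.
    return f"EXTRACT({part} FROM {arg_sql})"


print(render_date_part("DAY", "d", "DATE"))
print(render_date_part("DAY", "'2024-01-15'", "VARCHAR"))
```

Raising for integers rather than emitting a cast keeps the output from silently changing meaning, which is the behavior described for ErrorLevel.RAISE.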

One note on the pipeline: annotate_types alone left the CTE column as UNKNOWN in your WITH t AS (SELECT 1 AS col) SELECT YEAR(col) example. It only resolves if qualify runs first, so the test for the integer case uses an annotated literal, and the column case works through qualify + annotate_types.

On the formats, I ran each through real Postgres:

  • date, datetime, smalldatetime, and date-like string values all work correctly via CAST(x AS DATE).
  • pure TIME values are the one exception. T-SQL returns 1 (the base day), but Postgres CAST('12:12:12' AS DATE) errors. A time-only string literal is annotated only as a string, so it cannot be singled out at generation time. I treated this as a documented limitation, since extracting day/month/year from a pure time value always yields the base date anyway. Happy to also reject typed TIME via is_type if you'd prefer.
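
The time-only mismatch has a loose analogy in Python's standard library, shown here only to illustrate the two behaviors; it is not a model of either database. `parse_time_only` and `is_iso_date` are hypothetical helper names.

```python
from datetime import date, datetime


def parse_time_only(text: str) -> datetime:
    """Parse HH:MM:SS; strptime fills the missing date part with 1900-01-01."""
    return datetime.strptime(text, "%H:%M:%S")


def is_iso_date(text: str) -> bool:
    """Report whether text parses as a calendar date."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


parsed = parse_time_only("12:12:12")
# Lenient path: a base date is assumed, so the day part is 1.
print(parsed.day, parsed.month, parsed.year)
# Strict path: a time-only string is rejected as a date.
print(is_iso_date("12:12:12"), is_iso_date("2024-01-15"))
```

The lenient path resembles T-SQL returning the base day, and the strict path resembles the Postgres cast error.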

Committing and pushing to the PR branch and going through the GitHub actions checks right now, pausing for your review. Ty!

Development

Successfully merging this pull request may close these issues.

No conversion on Date functions when going from SQL Server to PostgreSQL.
