fix(duckdb): default missing year to 1970 for BigQuery PARSE_DATE/PARSE_TIMESTAMP #7800

Open
greymoth-jp wants to merge 1 commit into tobymao:main from greymoth-jp:fix/duckdb-parse-date-timestamp-default-year
@greymoth-jp

BigQuery defaults an unspecified year to 1970; DuckDB's STRPTIME defaults to 1900. #7704 taught the DuckDB generator to compensate, but only for PARSE_DATETIME. The symmetric builders PARSE_DATE (StrToDate) and PARSE_TIMESTAMP (StrToTime) were left unmirrored, so a yearless PARSE_DATE('%m-%d', '12-25') transpiles to DuckDB and returns 1900-12-25 instead of 1970-12-25 — a silent 70-year error.

The fix reuses #7704's mechanism: a default_year flag set only by the BigQuery PARSE_* builders, applied via the same '1970 ' || ... prefix in DuckDB's strtodate/strtotime. Other dialects are untouched, and an explicit year still wins. Verified both ways on a real duckdb engine (1900 before, 1970 after); full suite 1106 passed.
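To illustrate why the prefix works, here is a minimal sketch that uses Python's `datetime.strptime` as a stand-in for DuckDB's `STRPTIME`. The function name `parse_with_default_year` is hypothetical and not part of sqlglot; the sketch only shows that prepending an explicit year to both the value and the format pins the result to that year. It assumes the format string carries no year directive of its own.

```python
from datetime import date, datetime


def parse_with_default_year(value: str, fmt: str, default_year: int = 1970) -> date:
    """Parse a yearless string by prepending an explicit year to value and format."""
    # Same idea as '1970 ' || value and '%Y ' || format in the generated SQL.
    # Assumption: fmt contains no year directive of its own.
    return datetime.strptime(f"{default_year} {value}", f"%Y {fmt}").date()


print(parse_with_default_year("12-25", "%m-%d"))  # 1970-12-25
```

Because the year is supplied explicitly, the parser never falls back to its own default, which is the behavior the PR relies on.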

BigQuery initializes any field left unspecified by the format string from
1970-01-01, so a yearless PARSE_DATE/PARSE_TIMESTAMP returns a 1970 date.
PR tobymao#7704 taught the DuckDB generator to compensate for this (DuckDB STRPTIME
defaults a missing year to 1900) for PARSE_DATETIME via a `default_year` flag,
but its siblings PARSE_DATE (-> exp.StrToDate) and PARSE_TIMESTAMP
(-> exp.StrToTime) were left unmirrored, so e.g.
`PARSE_DATE('%m-%d', '12-25')` transpiled to
`CAST(STRPTIME('12-25', '%m-%d') AS DATE)` -> 1900-12-25 instead of 1970-12-25.

Set `default_year` on the StrToDate/StrToTime built by the BigQuery parser and
apply the same 1970-year prefix in DuckDB's strtodate_sql/strtotime_sql,
factored into a shared _strptime_default_year helper. The flag is only set by
the BigQuery PARSE_* builders, so other dialects' StrToDate/StrToTime are
unaffected. A year already present in the input still wins (DuckDB binds the
last %Y match).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Comment on lines 452 to +453

     class StrToDate(Expression, Func):
    -    arg_types = {"this": True, "format": False, "safe": False}
    +    arg_types = {"this": True, "format": False, "safe": False, "default_year": False}

Collaborator

We need to add a strtodate_sql method to the base generator, similar to strtotime_sql. Otherwise, "default_year" will leak to other dialects.

For example, if we transpile from bigquery -> mysql, the default_year will leak.
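A toy model of the leak, for illustration only. None of these classes are sqlglot's; `ToyFunc`, `ToyBaseGenerator` and `ToyFixedGenerator` are hypothetical, and the sketch assumes a generic fallback that renders every argument that is set on the node. Under that assumption, a new internal flag shows up as an extra SQL argument unless a dedicated method decides which arguments to emit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFunc:
    """Hypothetical stand-in for a function expression node; not a sqlglot class."""
    name: str
    args: dict = field(default_factory=dict)


class ToyBaseGenerator:
    def func_fallback(self, node: ToyFunc) -> str:
        # Generic path (assumed): every arg that is set becomes a SQL argument.
        rendered = ", ".join(str(v) for v in node.args.values() if v is not None)
        return f"{node.name}({rendered})"

    def generate(self, node: ToyFunc) -> str:
        # Prefer a dedicated <name>_sql method when one exists.
        handler = getattr(self, f"{node.name.lower()}_sql", None)
        return handler(node) if handler else self.func_fallback(node)


class ToyFixedGenerator(ToyBaseGenerator):
    def str_to_date_sql(self, node: ToyFunc) -> str:
        # Dedicated method emits only the args the target dialect understands.
        return f"STR_TO_DATE({node.args['this']}, {node.args['format']})"


node = ToyFunc(
    "STR_TO_DATE",
    {"this": "'12-25'", "format": "'%m-%d'", "default_year": "1970"},
)
print(ToyBaseGenerator().generate(node))   # STR_TO_DATE('12-25', '%m-%d', 1970)
print(ToyFixedGenerator().generate(node))  # STR_TO_DATE('12-25', '%m-%d')
```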

Comment on lines +95 to +97

    +# BigQuery initializes any field left unspecified by the format string from
    +# 1970-01-01 00:00:00.0, so a missing year defaults to 1970 (DuckDB defaults to 1900).
    +# The default_year flag carries this so the relevant generators can compensate.

Collaborator

No need for a comment here; let's remove it.

        expression: exp.Expr,
        value: exp.ExpOrStr | None,
        formatted_time: exp.ExpOrStr | None,
    ) -> tuple[exp.ExpOrStr | None, exp.ExpOrStr | None]:

@geooo109 Jun 26, 2026

Collaborator

Let's refactor this to the following and pass just the expression:

    def _strptime_default_year(
        self, expression: exp.StrToTime | exp.StrToDate | exp.ParseDatetime
    ) -> tuple[exp.ExpOrStr, exp.ExpOrStr | None]:
        value: exp.ExpOrStr = expression.this
        formatted_time: exp.ExpOrStr | None = self.format_time(expression)

        if default_year := expression.args.get("default_year"):
            value = exp.DPipe(this=exp.Literal.string(f"{default_year.name} "), expression=value)
            formatted_time = exp.DPipe(this=exp.Literal.string("%Y "), expression=formatted_time)

        return value, formatted_time
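For readers without sqlglot at hand, the shape of this helper can be sketched with plain SQL strings instead of expression nodes. The names `strptime_args` and `strtodate_sql` below are illustrative only, and the exact SQL text that sqlglot emits for the fixed case is not quoted in this thread, so the prefixed output is an approximation based on the `'1970 ' || ...` prefix described in the PR.

```python
from typing import Optional, Tuple


def strptime_args(
    value_sql: str, format_sql: str, default_year: Optional[str] = None
) -> Tuple[str, str]:
    """Return (value, format) SQL fragments, prefixed with a year when requested."""
    if default_year:
        # Prepend the year to the value and a matching %Y to the format.
        value_sql = f"'{default_year} ' || {value_sql}"
        format_sql = f"'%Y ' || {format_sql}"
    return value_sql, format_sql


def strtodate_sql(
    value_sql: str, format_sql: str, default_year: Optional[str] = None
) -> str:
    """Build a DuckDB-style date parse from the (possibly prefixed) fragments."""
    value_sql, format_sql = strptime_args(value_sql, format_sql, default_year)
    return f"CAST(STRPTIME({value_sql}, {format_sql}) AS DATE)"


print(strtodate_sql("'12-25'", "'%m-%d'"))
# CAST(STRPTIME('12-25', '%m-%d') AS DATE)
print(strtodate_sql("'12-25'", "'%m-%d'", "1970"))
# CAST(STRPTIME('1970 ' || '12-25', '%Y ' || '%m-%d') AS DATE)
```

Keeping the prefix logic in one helper means each caller only decides how to wrap the result, which is the point of passing just the expression in the suggested refactor.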

Comment on lines +1813 to +1814

    +# BigQuery defaults a missing year to 1970 while DuckDB defaults to 1900, so a yearless
    +# PARSE_DATE / PARSE_TIMESTAMP must prepend 1970, matching PARSE_DATETIME.

Collaborator

Let's remove the comment here as well.
