fix(duckdb): default missing year to 1970 for BigQuery PARSE_DATE/PARSE_TIMESTAMP #7800
Open
greymoth-jp wants to merge 1 commit into
BigQuery initializes any field left unspecified by the format string from 1970-01-01, so a yearless PARSE_DATE/PARSE_TIMESTAMP returns a 1970 date. PR tobymao#7704 taught the DuckDB generator to compensate for this (DuckDB STRPTIME defaults a missing year to 1900) for PARSE_DATETIME via a `default_year` flag, but its siblings PARSE_DATE (-> exp.StrToDate) and PARSE_TIMESTAMP (-> exp.StrToTime) were left unmirrored. As a result, `PARSE_DATE('%m-%d', '12-25')` transpiled to `CAST(STRPTIME('12-25', '%m-%d') AS DATE)`, which yields 1900-12-25 instead of 1970-12-25.

Set `default_year` on the StrToDate/StrToTime built by the BigQuery parser and apply the same 1970-year prefix in DuckDB's strtodate_sql/strtotime_sql, factored into a shared _strptime_default_year helper. The flag is only set by the BigQuery PARSE_* builders, so other dialects' StrToDate/StrToTime are unaffected. A year already present in the input still wins (DuckDB binds the last %Y match).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
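The year-prefix trick can be illustrated outside DuckDB. The sketch below uses Python's `datetime.strptime` purely as a stand-in for DuckDB's STRPTIME, since it likewise fills a year absent from the format with 1900. The helper name `strptime_default_year` is hypothetical, and a time-only format is used, as a yearless PARSE_TIMESTAMP might.

```python
from datetime import datetime

# Stand-in for DuckDB's STRPTIME: Python's strptime also fills a year that the
# format does not mention with 1900.
naive = datetime.strptime("10:30", "%H:%M")


def strptime_default_year(value: str, fmt: str, default_year: int = 1970) -> datetime:
    """Hypothetical helper: prepend the default year to both the input and the
    format, the same shape as the '1970 ' || value / '%Y ' || format rewrite."""
    return datetime.strptime(f"{default_year} {value}", f"%Y {fmt}")


fixed = strptime_default_year("10:30", "%H:%M")

print(naive)  # 1900-01-01 10:30:00
print(fixed)  # 1970-01-01 10:30:00
```

Prefixing both the value and the format keeps them aligned, so the rest of the user's format string is parsed exactly as before.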
geooo109 reviewed Jun 26, 2026
Comment on lines 452 to +453
```diff
 class StrToDate(Expression, Func):
-    arg_types = {"this": True, "format": False, "safe": False}
+    arg_types = {"this": True, "format": False, "safe": False, "default_year": False}
```
Collaborator
We need to add a strtodate_sql method to the base generator, similar to strtotime_sql. Otherwise, `default_year` will leak to other dialects.
For example, if we transpile from BigQuery to MySQL, `default_year` will leak.
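A toy model of the leak the reviewer describes: this is a hypothetical sketch, not sqlglot's real generator. It assumes a generic fallback that renders every populated argument of a function node, contrasted with a dedicated method that renders only the arguments the target function understands. The names `ARG_TYPES`, `fallback_sql` and the string-valued args are illustrative only.

```python
# Toy model (hypothetical, not sqlglot's real generator) of why an extra
# expression arg can "leak": a generic fallback renders every populated arg.
ARG_TYPES = ("this", "format", "safe", "default_year")


def fallback_sql(name: str, args: dict) -> str:
    """Generic rendering: every non-None arg, in ARG_TYPES order."""
    rendered = [args[key] for key in ARG_TYPES if args.get(key) is not None]
    return f"{name}({', '.join(rendered)})"


def strtodate_sql(args: dict) -> str:
    """Dedicated rendering: only the value and format, so default_year is dropped."""
    return f"STR_TO_DATE({args['this']}, {args['format']})"


node_args = {"this": "'12-25'", "format": "'%m-%d'", "default_year": "1970"}
print(fallback_sql("STR_TO_DATE", node_args))  # STR_TO_DATE('12-25', '%m-%d', 1970)
print(strtodate_sql(node_args))                # STR_TO_DATE('12-25', '%m-%d')
```

A dedicated method in the base generator means a dialect-specific flag only affects output where a dialect explicitly reads it.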
Comment on lines +95 to +97
```python
# BigQuery initializes any field left unspecified by the format string from
# 1970-01-01 00:00:00.0, so a missing year defaults to 1970 (DuckDB defaults to 1900).
# The default_year flag carries this so the relevant generators can compensate.
```
Collaborator
No need for a comment here; let's remove it.
```python
    expression: exp.Expr,
    value: exp.ExpOrStr | None,
    formatted_time: exp.ExpOrStr | None,
) -> tuple[exp.ExpOrStr | None, exp.ExpOrStr | None]:
```
Collaborator
Let's refactor this as follows and pass just the expression:

```python
def _strptime_default_year(
    self, expression: exp.StrToTime | exp.StrToDate | exp.ParseDatetime
) -> tuple[exp.ExpOrStr, exp.ExpOrStr | None]:
    value: exp.ExpOrStr = expression.this
    formatted_time: exp.ExpOrStr | None = self.format_time(expression)
    if default_year := expression.args.get("default_year"):
        value = exp.DPipe(this=exp.Literal.string(f"{default_year.name} "), expression=value)
        formatted_time = exp.DPipe(this=exp.Literal.string("%Y "), expression=formatted_time)
    return value, formatted_time
```
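To show roughly what SQL shape the suggested helper produces, here is a hypothetical string-level mirror of it. It works on plain SQL strings instead of sqlglot expression nodes, so the exact text sqlglot emits may differ in spacing or parenthesization; it only illustrates the `'1970 ' || ...` / `'%Y ' || ...` rewrite and the no-op path when the flag is absent.

```python
from typing import Optional, Tuple


def strptime_default_year(
    value_sql: str, format_sql: str, default_year: Optional[str]
) -> Tuple[str, str]:
    # Hypothetical string-level mirror of the suggested helper: when a default
    # year is present, prepend it to the value and "%Y " to the format with ||.
    if default_year:
        value_sql = f"'{default_year} ' || {value_sql}"
        format_sql = f"'%Y ' || {format_sql}"
    return value_sql, format_sql


value, fmt = strptime_default_year("'12-25'", "'%m-%d'", "1970")
print(f"CAST(STRPTIME({value}, {fmt}) AS DATE)")
# CAST(STRPTIME('1970 ' || '12-25', '%Y ' || '%m-%d') AS DATE)

# Without the flag (e.g. a StrToDate coming from another dialect) nothing changes.
print(strptime_default_year("'12-25'", "'%m-%d'", None))
# ("'12-25'", "'%m-%d'")
```

Taking the whole expression, as the reviewer suggests, lets one helper serve StrToDate, StrToTime and ParseDatetime without each caller extracting the value and format first.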
Comment on lines +1813 to +1814
```python
# BigQuery defaults a missing year to 1970 while DuckDB defaults to 1900, so a yearless
# PARSE_DATE / PARSE_TIMESTAMP must prepend 1970, matching PARSE_DATETIME.
```
Collaborator
Let's remove the comment here too.
BigQuery defaults an unspecified year to 1970; DuckDB's STRPTIME defaults to 1900. #7704 taught the DuckDB generator to compensate, but only for PARSE_DATETIME. The symmetric builders PARSE_DATE (StrToDate) and PARSE_TIMESTAMP (StrToTime) were left unmirrored, so a yearless `PARSE_DATE('%m-%d', '12-25')` transpiled to DuckDB returns 1900-12-25 instead of 1970-12-25, a silent 70-year error.

The fix reuses #7704's mechanism: a `default_year` flag set only by the BigQuery PARSE_* builders, applied via the same `'1970 ' || ...` prefix in DuckDB's strtodate_sql/strtotime_sql. Other dialects are untouched, and an explicit year still wins. Verified both ways on a real DuckDB engine (1900 before, 1970 after); full suite: 1106 passed.