[SPARK-57555][FOLLOWUP][SQL] Add TimeType support to PostgresDialect#56904
[SPARK-57555][FOLLOWUP][SQL] Add TimeType support to PostgresDialect#56904shrirangmhalgi wants to merge 3 commits into
Conversation
|
@MaxGekk / @uros-b - could you please review? This is a followup to #56653. It adds Thank you. |
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 2 non-blocking, 1 nit.
TimeType support for PostgresDialect looks correct and value-preserving for Postgres. Two things worth a look — test coverage of the new array branch, and the write-side precision flattening — plus a small doc nit. Details inline.
Findings
Suggestions (2)
- The new
toCatalystTypearray-elementtimebranch (the core read-path change) is untested: it fires only fortime[]columns, but both new tests are scalar — thec36read goes through the genericJdbcUtilspath, nottoCatalystType. Consider atime[]read test. getJDBCTypewrites bareTIMEfor all precisions, dropping the declared precision vs the baseTIME(${precision}).TIME(min(p,6))would preserve it for p<=6 while avoiding the invalidTIME(7..9)for nanos TimeTypes — or a note if uniform bareTIMEis the intended idiom.
Nits (1)
- Worth a one-line comment on why the array path hardcodes
DEFAULT_PRECISION(driver doesn't report element typmod; 6 = Postgrestimemax) vs the scalar path'sTimeType(scale).
| case "timestamp" | "time" => Some(getTimestampType(md.build())) | ||
| case "timestamp" => Some(getTimestampType(md.build())) | ||
| case "time" => | ||
| if (conf.isTimeTypeEnabled) Some(TimeType(TimeType.DEFAULT_PRECISION)) |
There was a problem hiding this comment.
This case "time" branch is reached only for time[] array elements — toCatalystType is called solely from the Types.ARRAY arm of getCatalystType. Scalar time columns return None here and fall through to the generic JdbcUtils TIME path, so the new scalar c36 read test exercises that generic path rather than this branch. The new array mapping isn't covered by either added test; a time[] array read test (analogous to the SPARK-22291 array test) would close the gap.
Separately (nit): the hardcoded TimeType.DEFAULT_PRECISION here diverges from the scalar path's TimeType(scale) (JdbcUtils). It's defensible — Postgres doesn't reliably report element typmod for arrays, and DEFAULT_PRECISION (6) is Postgres time's max so nothing is lost — but a one-line comment saying so would keep it from reading as an accidental divergence.
| case ShortType | ByteType => Some(JdbcType("SMALLINT", Types.SMALLINT)) | ||
| case TimestampType if !conf.legacyPostgresDatetimeMappingEnabled => | ||
| Some(JdbcType("TIMESTAMP WITH TIME ZONE", Types.TIMESTAMP)) | ||
| case _: TimeType => Some(JdbcType("TIME", Types.TIME)) |
There was a problem hiding this comment.
This overrides the base JdbcUtils.getCommonJDBCType mapping TimeType -> TIME(${t.precision}) with bare TIME (= Postgres TIME(6)), so the declared precision is dropped for every TimeType: a TimeType(3) writes as TIME(6), and a write/read round-trip widens TimeType(3) -> TimeType(6) (values are preserved). Postgres accepts TIME(p) for p<=6, so TIME(${math.min(t.precision, 6)}) would preserve the declared precision for p<=6 while still avoiding the out-of-range TIME(7..9) the base mapping would emit for nanos TimeTypes. If uniform bare TIME is a deliberate idiom choice, a short comment noting that would be helpful.
|
Thank you @MaxGekk for the review. Addressed all the comments in the latest commit:
I'll also add the |
9ebb4cb to
8030092
Compare
- toCatalystType: map 'time' array elements to TimeType when timeType.enabled is true (was incorrectly mapped to TimestampType) - getJDBCType: add TimeType => TIME for PostgreSQL-idiomatic DDL - Add integration tests for TIME read and write round-trip
…guard - getJDBCType: use TIME(p) for p<=6, bare TIME for out-of-range precisions - Add legacyJdbcTimeMappingEnabled guard (now that apache#56884 is merged) - Add comment explaining DEFAULT_PRECISION for array elements (driver does not report element typmod; 6 = Postgres time max fractional seconds)
- time[] array read test (exercises toCatalystType array-element path) - Precision preservation round-trip test (TimeType(3) -> TIME(3)) - End-to-end test: raw JDBC create/insert -> Spark read -> Spark write -> read-back with mixed precisions (time, time(3)) and checkAnswer
8030092 to
386efbc
Compare
What changes were proposed in this pull request?
Add TimeType support to PostgresDialect:
toCatalystType: map time array elements to TimeType whenspark.sql.timeType.enabledis true (previously mapped toTimestampTypeviagetTimestampType)getJDBCType: addTimeType => TIMEfor PostgreSQL-idiomatic DDL on writeWhy are the changes needed?
This is a followup to #56653 which added core JDBC TIME support in
JdbcUtils. That PR handled non-array read/write correctly via the base implementation, but PostgresDialect has its owntoCatalystTypehelper for array element type resolution which still mappedtimetoTimestampTypeunconditionally. This caused PostgreSQLtime[]array columns to be read with incorrect types whenspark.sql.timeType.enabledis true. The write path also lacked a dialect-specific override, falling through toTIME(6)which works but is not idiomatic for PostgreSQL (which defaults to microsecond precision with bareTIME).Does this PR introduce any user-facing change?
Yes. When
spark.sql.timeType.enabledis true, PostgreSQL time array columns now correctly infer asArrayType(TimeType)instead ofArrayType(TimestampType). Non-array TIME columns were already handled by the base JdbcUtils.How was this patch tested?
Added two integration tests to
PostgresIntegrationSuite:Was this patch authored or co-authored using generative AI tooling?
CoAuthored using Claude Opus 4.6