Skip to content

[SPARK-57555][FOLLOWUP][SQL] Add TimeType support to PostgresDialect#56904

Open
shrirangmhalgi wants to merge 3 commits into
apache:masterfrom
shrirangmhalgi:SPARK-57555-postgres-time-dialect
Open

[SPARK-57555][FOLLOWUP][SQL] Add TimeType support to PostgresDialect#56904
shrirangmhalgi wants to merge 3 commits into
apache:masterfrom
shrirangmhalgi:SPARK-57555-postgres-time-dialect

Conversation

@shrirangmhalgi

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Add TimeType support to PostgresDialect:

  • toCatalystType: map time array elements to TimeType when spark.sql.timeType.enabled is true (previously mapped to TimestampType via getTimestampType)
  • getJDBCType: add TimeType => TIME for PostgreSQL-idiomatic DDL on write

Why are the changes needed?

This is a followup to #56653 which added core JDBC TIME support in JdbcUtils. That PR handled non-array read/write correctly via the base implementation, but PostgresDialect has its own toCatalystType helper for array element type resolution which still mapped time to TimestampType unconditionally. This caused PostgreSQL time[] array columns to be read with incorrect types when spark.sql.timeType.enabled is true. The write path also lacked a dialect-specific override, falling through to TIME(6) which works but is not idiomatic for PostgreSQL (which defaults to microsecond precision with bare TIME).

Does this PR introduce any user-facing change?

Yes. When spark.sql.timeType.enabled is true, PostgreSQL time array columns now correctly infer as ArrayType(TimeType) instead of ArrayType(TimestampType). Non-array TIME columns were already handled by the base JdbcUtils.

How was this patch tested?

Added two integration tests to PostgresIntegrationSuite:

  • Read test: verifies existing bar table TIME column (c36) is read as TimeType with correct value
  • Round-trip test: writes three LocalTime values (midnight, midday with sub-second, end-of-day) and reads them back, asserting type and value preservation

Was this patch authored or co-authored using generative AI tooling?

CoAuthored using Claude Opus 4.6

@shrirangmhalgi

Copy link
Copy Markdown
Contributor Author

@MaxGekk / @uros-b - could you please review? This is a followup to #56653. It adds TimeType support to PostgresDialect: the dialect's array-element type helper (toCatalystType) was still mapping time to TimestampType; this fixes it to return TimeType when timeType.enabled is true, and adds PostgreSQL-idiomatic bare TIME DDL on write.

Thank you.

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 2 non-blocking, 1 nit.
TimeType support for PostgresDialect looks correct and value-preserving for Postgres. Two things worth a look — test coverage of the new array branch, and the write-side precision flattening — plus a small doc nit. Details inline.

Findings

Suggestions (2)

  • The new toCatalystType array-element time branch (the core read-path change) is untested: it fires only for time[] columns, but both new tests are scalar — the c36 read goes through the generic JdbcUtils path, not toCatalystType. Consider a time[] read test.
  • getJDBCType writes bare TIME for all precisions, dropping the declared precision vs the base TIME(${precision}). TIME(min(p,6)) would preserve it for p<=6 while avoiding the invalid TIME(7..9) for nanos TimeTypes — or a note if uniform bare TIME is the intended idiom.

Nits (1)

  • Worth a one-line comment on why the array path hardcodes DEFAULT_PRECISION (driver doesn't report element typmod; 6 = Postgres time max) vs the scalar path's TimeType(scale).

case "timestamp" | "time" => Some(getTimestampType(md.build()))
case "timestamp" => Some(getTimestampType(md.build()))
case "time" =>
if (conf.isTimeTypeEnabled) Some(TimeType(TimeType.DEFAULT_PRECISION))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case "time" branch is reached only for time[] array elements — toCatalystType is called solely from the Types.ARRAY arm of getCatalystType. Scalar time columns return None here and fall through to the generic JdbcUtils TIME path, so the new scalar c36 read test exercises that generic path rather than this branch. The new array mapping isn't covered by either added test; a time[] array read test (analogous to the SPARK-22291 array test) would close the gap.

Separately (nit): the hardcoded TimeType.DEFAULT_PRECISION here diverges from the scalar path's TimeType(scale) (JdbcUtils). It's defensible — Postgres doesn't reliably report element typmod for arrays, and DEFAULT_PRECISION (6) is Postgres time's max so nothing is lost — but a one-line comment saying so would keep it from reading as an accidental divergence.

case ShortType | ByteType => Some(JdbcType("SMALLINT", Types.SMALLINT))
case TimestampType if !conf.legacyPostgresDatetimeMappingEnabled =>
Some(JdbcType("TIMESTAMP WITH TIME ZONE", Types.TIMESTAMP))
case _: TimeType => Some(JdbcType("TIME", Types.TIME))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overrides the base JdbcUtils.getCommonJDBCType mapping TimeType -> TIME(${t.precision}) with bare TIME (= Postgres TIME(6)), so the declared precision is dropped for every TimeType: a TimeType(3) writes as TIME(6), and a write/read round-trip widens TimeType(3) -> TimeType(6) (values are preserved). Postgres accepts TIME(p) for p<=6, so TIME(${math.min(t.precision, 6)}) would preserve the declared precision for p<=6 while still avoiding the out-of-range TIME(7..9) the base mapping would emit for nanos TimeTypes. If uniform bare TIME is a deliberate idiom choice, a short comment noting that would be helpful.

@shrirangmhalgi

Copy link
Copy Markdown
Contributor Author

Thank you @MaxGekk for the review. Addressed all the comments in the latest commit:

  • Added a time[] array read test that exercises the toCatalystType array-element path directly
  • getJDBCType now emits TIME(p) for p∈[0,6] and bare TIME for out-of-range precisions, preserving the declared precision on round-trip
  • Added a precision preservation round-trip test (TimeType(3) → TIME(3) → read back as TimeType(3))
  • Added comment explaining why DEFAULT_PRECISION is used for array elements (driver does not report element typmod; 6 = Postgres max)

I'll also add the legacyJdbcTimeMappingEnabled guard once #56884 merges - the per-dialect toCatalystType runs before the default mapper so it needs its own check. Left a TODO in the code for now.

@shrirangmhalgi shrirangmhalgi force-pushed the SPARK-57555-postgres-time-dialect branch from 9ebb4cb to 8030092 Compare July 1, 2026 06:41
- toCatalystType: map 'time' array elements to TimeType when
  timeType.enabled is true (was incorrectly mapped to TimestampType)
- getJDBCType: add TimeType => TIME for PostgreSQL-idiomatic DDL
- Add integration tests for TIME read and write round-trip
…guard

- getJDBCType: use TIME(p) for p<=6, bare TIME for out-of-range precisions
- Add legacyJdbcTimeMappingEnabled guard (now that apache#56884 is merged)
- Add comment explaining DEFAULT_PRECISION for array elements (driver does
  not report element typmod; 6 = Postgres time max fractional seconds)
- time[] array read test (exercises toCatalystType array-element path)
- Precision preservation round-trip test (TimeType(3) -> TIME(3))
- End-to-end test: raw JDBC create/insert -> Spark read -> Spark write
  -> read-back with mixed precisions (time, time(3)) and checkAnswer
@shrirangmhalgi shrirangmhalgi force-pushed the SPARK-57555-postgres-time-dialect branch from 8030092 to 386efbc Compare July 1, 2026 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants