Skip to content

[SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC#55893

Open
LuciferYang wants to merge 3 commits into
apache:masterfrom
LuciferYang:SPARK-daystomicros-utc-fastpath
Open

[SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC#55893
LuciferYang wants to merge 3 commits into
apache:masterfrom
LuciferYang:SPARK-daystomicros-utc-fastpath

Conversation

@LuciferYang
Copy link
Copy Markdown
Contributor

@LuciferYang LuciferYang commented May 15, 2026

What changes were proposed in this pull request?

Add a ZoneOffset.UTC fast path to DateTimeUtils.daysToMicros(days, zoneId):

if (zoneId eq ZoneOffset.UTC) {
  Math.multiplyExact(days.toLong, MICROS_PER_DAY)
} else {
  // existing LocalDate -> ZonedDateTime -> Instant path
}

For UTC the answer is simply days * MICROS_PER_DAY, so the slow path's three heap allocations (LocalDate, ZonedDateTime, Instant) are wasted.

Why are the changes needed?

daysToMicros(days, ZoneOffset.UTC) is on the per-row hot path of the vectorized parquet reader (DateToTimestampNTZUpdater and the rebase variants), the row-based parquet converter (ParquetRowConverter), the Avro reader (AvroDeserializer), and DATE -> TIMESTAMP Cast (interpreted + codegen). All of them pass the ZoneOffset.UTC singleton, so the reference-equality fast path triggers everywhere it matters.

ParquetVectorUpdaterBenchmark results will be regenerated via GitHub Actions.

Does this PR introduce any user-facing change?

No, pure optimization. Behavior is preserved for every input in Spark's valid DateType range. Both paths use Math.multiplyExact internally and overflow at the same |days| ~= 107M boundary with the same ArithmeticException, far outside any reachable input.

How was this patch tested?

New DateTimeUtilsSuite contract test pins down the UTC fast path:

  • Asserts it agrees with a fixed-offset zone (Etc/GMT) path for a representative set of days (zero, positive, negative, +/-maxSafeDays).
  • Asserts it equals days * MICROS_PER_DAY directly, so divergence in some future JDK is caught.
  • Asserts ArithmeticException on overflow.

Was this patch authored or co-authored using generative AI tooling?

No

…t.UTC

`daysToMicros(days, zoneId)` allocates `LocalDate`, `ZonedDateTime`, and
`Instant` per call. For `ZoneOffset.UTC` the answer is simply
`days * MICROS_PER_DAY`, so add a reference-equality fast path that returns
`Math.multiplyExact(days.toLong, MICROS_PER_DAY)`.

Every real Spark caller passing UTC -- the vectorized parquet
`DateToTimestampNTZUpdater` and its rebase variants, the row-based
`ParquetRowConverter`, the Avro reader, and `Cast` (interpreted + codegen)
-- uses the `ZoneOffset.UTC` singleton, so the fast path triggers
everywhere it matters.
@LuciferYang LuciferYang changed the title [SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC [SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC May 15, 2026
@LuciferYang
Copy link
Copy Markdown
Contributor Author

test first, will update benchmark results later

def daysToMicros(days: Int, zoneId: ZoneId): Long = {
val instant = daysToLocalDate(days).atStartOfDay(zoneId).toInstant
instantToMicros(instant)
if (zoneId eq ZoneOffset.UTC) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, is there a version of this that takes an arbitrary offset (e.g. in seconds) vs. a nominal zone ID? The distinction hardly ever matters, except when various jurisdictions change their timezone rules. For example, here in British Columbia, we have officially stopped doing DST changes! 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants