@@ -228,11 +228,16 @@ trait SparkDateTimeUtils {

   /**
    * Converts days since 1970-01-01 at the given zone ID to microseconds since 1970-01-01
-   * 00:00:00Z.
+   * 00:00:00Z. When `zoneId eq ZoneOffset.UTC`, takes a direct-multiply fast path that skips the
+   * `LocalDate`/`ZonedDateTime`/`Instant` chain.
    */
   def daysToMicros(days: Int, zoneId: ZoneId): Long = {
-    val instant = daysToLocalDate(days).atStartOfDay(zoneId).toInstant
-    instantToMicros(instant)
+    if (zoneId eq ZoneOffset.UTC) {

Just out of curiosity, is there a version of this that takes an arbitrary offset (e.g. in seconds) vs. a nominal zone ID? The distinction hardly ever matters, except when various jurisdictions change their timezone rules. For example, here in British Columbia, we have officially stopped doing DST changes! 😅

+      Math.multiplyExact(days.toLong, MICROS_PER_DAY)
+    } else {
+      val instant = daysToLocalDate(days).atStartOfDay(zoneId).toInstant
+      instantToMicros(instant)
+    }
   }

   /**
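The fast path relies on UTC midnight of epoch day `d` falling exactly `d * MICROS_PER_DAY` microseconds after the epoch. A minimal, self-contained sketch of that equivalence follows. It also includes a hypothetical `daysToMicrosAtOffset` variant for a fixed `ZoneOffset`, in the spirit of the review question above. The object name, the helper names, and the simplified `genericDaysToMicros` (a stand-in for `daysToLocalDate` plus `instantToMicros`) are assumptions for illustration, not code from this PR.

```scala
import java.time.{LocalDate, ZoneId, ZoneOffset}

// Illustrative sketch only: every name in this object is an assumption, not Spark's API.
object DaysToMicrosSketch {
  val MICROS_PER_SECOND: Long = 1000000L
  val MICROS_PER_DAY: Long = 86400L * MICROS_PER_SECOND

  // Simplified stand-in for the generic LocalDate -> ZonedDateTime -> Instant path.
  def genericDaysToMicros(days: Int, zoneId: ZoneId): Long = {
    val instant = LocalDate.ofEpochDay(days.toLong).atStartOfDay(zoneId).toInstant
    Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, MICROS_PER_SECOND),
      instant.getNano / 1000L)
  }

  // Same shape as the change in the diff: reference-equality check, then a checked multiply.
  def daysToMicros(days: Int, zoneId: ZoneId): Long = {
    if (zoneId eq ZoneOffset.UTC) Math.multiplyExact(days.toLong, MICROS_PER_DAY)
    else genericDaysToMicros(days, zoneId)
  }

  // Hypothetical fixed-offset variant: local midnight at `offset` is UTC midnight
  // shifted back by the offset, so no zone-rule lookup is needed.
  def daysToMicrosAtOffset(days: Int, offset: ZoneOffset): Long = {
    Math.subtractExact(
      Math.multiplyExact(days.toLong, MICROS_PER_DAY),
      Math.multiplyExact(offset.getTotalSeconds.toLong, MICROS_PER_SECOND))
  }
}
```

Because `eq` is a reference comparison, a region-based ID such as `Etc/GMT` still takes the generic path even though its offset is zero. A fixed-offset variant like the one sketched here would sidestep zone rules entirely, which is the distinction the review question raises.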
@@ -971,6 +971,32 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
     assert(DateTimeUtils.microsToMillis(-157700927876544L) === -157700927877L)
   }

+  test("daysToMicros: ZoneOffset.UTC fast path matches the generic zone path") {
+    // The UTC fast path returns `days * MICROS_PER_DAY` directly; assert it agrees with the
+    // `LocalDate -> ZonedDateTime -> Instant` path used for any other zone whose offset is 0
+    // (e.g. `Etc/GMT`). Covers zero, positive, negative, and values bounded by the largest
+    // `days` for which `days * MICROS_PER_DAY` does not overflow `Long`.
+    val maxSafeDays = (Long.MaxValue / MICROS_PER_DAY).toInt
+    val cases = Seq(0, 1, -1, 365, -365, 16800, -16800, 1_000_000, -1_000_000,
+      maxSafeDays, -maxSafeDays)
+    val gmt = ZoneId.of("Etc/GMT")
+    cases.foreach { d =>
+      assert(daysToMicros(d, ZoneOffset.UTC) === daysToMicros(d, gmt),
+        s"UTC fast path diverged from Etc/GMT path at days=$d")
+      assert(daysToMicros(d, ZoneOffset.UTC) === d.toLong * MICROS_PER_DAY,
+        s"UTC fast path != days * MICROS_PER_DAY at days=$d")
+    }
+
+    // Overflow: any `days` past `maxSafeDays` overflows `Long` and must throw rather than
+    // silently wrap.
+    intercept[ArithmeticException] {
+      daysToMicros(maxSafeDays + 1, ZoneOffset.UTC)
+    }
+    intercept[ArithmeticException] {
+      daysToMicros(-maxSafeDays - 1, ZoneOffset.UTC)
+    }
+  }
+
   test("SPARK-29012: special timestamp values") {
     testSpecialDatetimeValues { zoneId =>
       val tolerance = TimeUnit.SECONDS.toMicros(30)
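The `maxSafeDays` bound used in the new test can be checked by hand. The sketch below assumes the standard value `MICROS_PER_DAY = 86400000000L` and uses a hypothetical object name. It contrasts `Math.multiplyExact` with an unchecked `*` one day past the bound, which is the behaviour the `intercept[ArithmeticException]` blocks guard against.

```scala
import scala.util.Try

// Illustrative sketch only: the object and method names are assumptions.
object OverflowBoundarySketch {
  val MICROS_PER_DAY: Long = 86400000000L

  // Largest day count whose microsecond value still fits in a Long.
  val maxSafeDays: Int = (Long.MaxValue / MICROS_PER_DAY).toInt

  // Checked multiply: Some(result), or None when the product would overflow Long.
  def checked(days: Int): Option[Long] =
    Try(Math.multiplyExact(days.toLong, MICROS_PER_DAY)).toOption

  // Unchecked multiply: silently wraps around on overflow.
  def unchecked(days: Int): Long = days.toLong * MICROS_PER_DAY
}
```

With these definitions, `checked(maxSafeDays + 1)` is empty while `unchecked(maxSafeDays + 1)` wraps to a negative number, which is why the fast path uses the checked form.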
56 changes: 28 additions & 28 deletions sql/core/benchmarks/ParquetVectorUpdaterBenchmark-results.txt
@@ -2,83 +2,83 @@
Identity Updaters
================================================================================================

-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Identity Updaters:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-BooleanUpdater                                        0              0           0      14526.2           0.1       1.0X
-ByteUpdater (INT32 -> Byte)                           0              0           0       3679.3           0.3       0.3X
-ShortUpdater (INT32 -> Short)                         1              1           0       2054.1           0.5       0.1X
-IntegerUpdater                                        0              0           0      10178.0           0.1       0.7X
-LongUpdater                                           0              0           0       5054.4           0.2       0.3X
-FloatUpdater                                          0              0           0      10212.8           0.1       0.7X
-DoubleUpdater                                         0              0           0       5051.2           0.2       0.3X
-BinaryUpdater                                        15             15           0         68.4          14.6       0.0X
+BooleanUpdater                                        0              0           0      20542.2           0.0       1.0X
+ByteUpdater (INT32 -> Byte)                           0              0           0       3675.0           0.3       0.2X
+ShortUpdater (INT32 -> Short)                         1              1           0       2053.8           0.5       0.1X
+IntegerUpdater                                        0              0           0      10229.8           0.1       0.5X
+LongUpdater                                           0              0           0       5101.7           0.2       0.2X
+FloatUpdater                                          0              0           0      10214.9           0.1       0.5X
+DoubleUpdater                                         0              0           0       5106.4           0.2       0.2X
+BinaryUpdater                                        15             15           0         68.6          14.6       0.0X


================================================================================================
Type-converting Updaters
================================================================================================

-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Type-converting Updaters:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-IntegerToLongUpdater                                     1              1           0       1280.6           0.8       1.0X
-IntegerToDoubleUpdater                                   1              1           0       1537.9           0.7       1.2X
-FloatToDoubleUpdater                                     1              1           0       1418.8           0.7       1.1X
-DateToTimestampNTZUpdater                               36             36           0         29.5          33.9       0.0X
-DowncastLongUpdater (INT64 -> Decimal(9,2))              2              2           0        455.3           2.2       0.4X
+IntegerToLongUpdater                                     1              1           0       1281.5           0.8       1.0X
+IntegerToDoubleUpdater                                   1              1           0       1532.5           0.7       1.2X
+FloatToDoubleUpdater                                     1              1           0       1419.3           0.7       1.1X
+DateToTimestampNTZUpdater                                3              3           0        402.2           2.5       0.3X
+DowncastLongUpdater (INT64 -> Decimal(9,2))              2              2           0        455.2           2.2       0.4X


================================================================================================
Rebase Updaters
================================================================================================

-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Rebase Updaters:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -------------------------------------------------------------------------------------------------------------------------------
-IntegerWithRebaseUpdater (DATE legacy)                       0              0           0       2407.3           0.4       1.0X
-LongWithRebaseUpdater (TIMESTAMP_MICROS legacy)              1              1           0       2030.8           0.5       0.8X
-LongAsMicrosUpdater (TIMESTAMP_MILLIS)                       2              2           0        454.4           2.2       0.2X
+IntegerWithRebaseUpdater (DATE legacy)                       0              0           0       2602.0           0.4       1.0X
+LongWithRebaseUpdater (TIMESTAMP_MICROS legacy)              1              1           0       2077.4           0.5       0.8X
+LongAsMicrosUpdater (TIMESTAMP_MILLIS)                       2              2           0        454.2           2.2       0.2X


================================================================================================
Unsigned Updaters
================================================================================================

-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Unsigned Updaters:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -----------------------------------------------------------------------------------------------------------------------------
-UnsignedIntegerUpdater (UINT32 -> Long)                    1              1           0       1093.1           0.9       1.0X
+UnsignedIntegerUpdater (UINT32 -> Long)                    1              1           0       1093.5           0.9       1.0X
 UnsignedLongUpdater (UINT64 -> Decimal(20,0))             18             18           0         59.1          16.9       0.1X


================================================================================================
Decimal Updaters
================================================================================================

-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Decimal Updaters:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-IntegerToDecimalUpdater                               0              0           0      10195.9           0.1       1.0X
-LongToDecimalUpdater                                  0              0           0       5049.2           0.2       0.5X
-FixedLenByteArrayToDecimalUpdater                    21             21           0         51.0          19.6       0.0X
+IntegerToDecimalUpdater                               0              0           0      10208.9           0.1       1.0X
+LongToDecimalUpdater                                  0              0           0       5104.2           0.2       0.5X
+FixedLenByteArrayToDecimalUpdater                    21             21           0         51.1          19.6       0.0X


================================================================================================
FixedLenByteArray Updaters
================================================================================================

-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 FixedLenByteArray Updaters:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------------------
-FixedLenByteArrayUpdater (len=16 -> Binary)                         19             19           0         54.9          18.2       1.0X
+FixedLenByteArrayUpdater (len=16 -> Binary)                         19             19           1         55.3          18.1       1.0X
 FixedLenByteArrayAsIntUpdater (len=4 -> Decimal(9,2))                7              7           0        160.1           6.2       2.9X
-FixedLenByteArrayAsLongUpdater (len=8 -> Decimal(18,4))              9              9           0        123.0           8.1       2.2X
+FixedLenByteArrayAsLongUpdater (len=8 -> Decimal(18,4))              9              9           0        123.2           8.1       2.2X
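The only row that moves materially between the two runs is `DateToTimestampNTZUpdater`, presumably because it is the benchmark that reaches the new UTC fast path; the other rows differ by amounts that look like run-to-run noise. As a rough sanity check, the sketch below computes the ratio from the per-row figures listed above. The object and value names are hypothetical, and the inputs are simply the before and after `Per Row(ns)` values from the results file.

```scala
// Illustrative arithmetic only: names are assumptions, inputs are copied from the results above.
object SpeedupSketch {
  // Per-row cost in nanoseconds for DateToTimestampNTZUpdater, before and after the change.
  val beforeNsPerRow: Double = 33.9
  val afterNsPerRow: Double = 2.5

  // Ratio of old to new per-row cost; larger means a bigger improvement.
  val speedup: Double = beforeNsPerRow / afterNsPerRow

  def main(args: Array[String]): Unit =
    println(f"DateToTimestampNTZUpdater speedup: $speedup%.1fx")
}
```

The `Rate(M/s)` column (29.5 before, 402.2 after) gives a similar ratio, so the two columns agree to within rounding.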