from_utc_timestamp
Converts a timestamp that is timezone-agnostic (interpreted as a UTC timestamp) to a timestamp in the given time zone. This is a common function in databases that support TIMESTAMP WITHOUT TIME ZONE.
However, a timestamp in Spark represents the number of microseconds since the Unix epoch, which is not timezone-agnostic. So in Spark this function simply shifts the timestamp value from the UTC time zone to the given time zone.
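The shift described above can be sketched in plain Python with the standard-library zoneinfo module (a rough model of the semantics, not Spark itself; the helper name is chosen here for illustration):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def shift_from_utc(ts: datetime, tz: str) -> datetime:
    """Interpret a naive timestamp as UTC and re-express it in tz,
    returning a naive (timezone-agnostic) result -- the same kind of
    shift that from_utc_timestamp performs on timestamp values."""
    return ts.replace(tzinfo=ZoneInfo("UTC")).astimezone(ZoneInfo(tz)).replace(tzinfo=None)

print(shift_from_utc(datetime(1997, 2, 28, 10, 30), "Asia/Tokyo"))
# 1997-02-28 19:30:00 (JST is UTC+9)
```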
This function may return an unexpected result if the input is a string with a time zone, e.g. 2018-03-13T06:18:23+00:00, because Spark first casts the string to a timestamp according to the time zone in the string, and then displays the result by converting the timestamp to a string according to the session-local time zone.
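A rough plain-Python model of that pitfall, assuming (for illustration only) a session time zone of America/Los_Angeles and a target zone of Asia/Tokyo:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SESSION_TZ = ZoneInfo("America/Los_Angeles")  # assumed session time zone

s = "2018-03-13T06:18:23+00:00"

# Casting the string honours its embedded offset, so the wall-clock value
# seen in the session time zone is already shifted away from UTC:
casted = datetime.fromisoformat(s).astimezone(SESSION_TZ).replace(tzinfo=None)
print(casted)  # 2018-03-12 23:18:23

# Treating that already-shifted value as UTC and shifting it again to the
# target zone produces a double shift -- not the 15:18:23 a user might
# expect from adding +9 hours to the original 06:18:23 UTC:
shifted = casted.replace(tzinfo=ZoneInfo("UTC")).astimezone(ZoneInfo("Asia/Tokyo")).replace(tzinfo=None)
print(shifted)  # 2018-03-13 08:18:23
```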
For the corresponding Databricks SQL function, see from_utc_timestamp function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.from_utc_timestamp(timestamp=<timestamp>, tz=<tz>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `timestamp` | `Column` or `str` | The column that contains timestamps. |
| `tz` | `Column` or `str` | A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. 'UTC' and 'Z' are also supported as aliases of '+00:00'. Other short names are not recommended because they can be ambiguous. |
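The two ID formats are not interchangeable across daylight-saving transitions: a region-based ID follows DST rules, while a fixed zone offset never changes. A small sketch of the distinction using Python's standard-library zoneinfo (not Spark):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

instant = datetime(2018, 7, 1, 12, 0, tzinfo=timezone.utc)

# Region-based ID: in July, America/Los_Angeles observes DST and is UTC-7.
region = instant.astimezone(ZoneInfo("America/Los_Angeles"))

# Fixed zone offset: '-08:00' stays eight hours behind UTC all year.
offset = instant.astimezone(timezone(timedelta(hours=-8)))

print(region.hour, offset.hour)  # 5 4
```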
Returns
pyspark.sql.Column: the timestamp value represented in the given time zone.
Examples
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('1997-02-28 10:30:00', 'JST')], ['ts', 'tz'])
df.select('*', dbf.from_utc_timestamp('ts', 'PST')).show()
# 1997-02-28 10:30:00 interpreted as UTC becomes 1997-02-28 02:30:00 in PST (UTC-8)
df.select('*', dbf.from_utc_timestamp(df.ts, df.tz)).show()
# 1997-02-28 10:30:00 interpreted as UTC becomes 1997-02-28 19:30:00 in JST (UTC+9)