Spark >= 2.2
You can use to_date:
import org.apache.spark.sql.functions.{to_date, to_timestamp}
df.select(to_date($"ts", "dd-MMM-yyyy").alias("date"))
or to_timestamp:
df.select(to_timestamp($"ts", "dd-MMM-yyyy").alias("timestamp"))
with no intermediate unix_timestamp call required.
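Putting the Spark >= 2.2 variants together, here is a minimal self-contained sketch; the SparkSession name spark, the app name, and the sample row are assumptions (the data mirrors the Spark < 2.2 example below):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_date, to_timestamp}

val spark = SparkSession.builder().appName("dates").getOrCreate()
import spark.implicits._

val df = Seq((1L, "01-APR-2015")).toDF("id", "ts")

df.select(
  to_date($"ts", "dd-MMM-yyyy").alias("date"),           // DateType
  to_timestamp($"ts", "dd-MMM-yyyy").alias("timestamp")  // TimestampType
).show()
// +----------+-------------------+
// |      date|          timestamp|
// +----------+-------------------+
// |2015-04-01|2015-04-01 00:00:00|
// +----------+-------------------+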
Spark < 2.2
Since Spark 1.5 you can use the unix_timestamp function to parse the string to a long, cast it to a timestamp, and truncate it with to_date:
import org.apache.spark.sql.functions.{unix_timestamp, to_date}

// In a standalone application you also need: import spark.implicits._
// (spark-shell provides it automatically)
val df = Seq((1L, "01-APR-2015")).toDF("id", "ts")

df.select(to_date(unix_timestamp(
  $"ts", "dd-MMM-yyyy"
).cast("timestamp")).alias("date"))
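To make the parse → cast → truncate chain explicit, here is a sketch of the intermediate column types; the column names secs and parsed are illustrative only:

df.select(
  $"ts",
  unix_timestamp($"ts", "dd-MMM-yyyy").alias("secs"),                            // LongType: seconds since epoch
  unix_timestamp($"ts", "dd-MMM-yyyy").cast("timestamp").alias("parsed"),        // TimestampType
  to_date(unix_timestamp($"ts", "dd-MMM-yyyy").cast("timestamp")).alias("date")  // DateType
).printSchema()
// root
//  |-- ts: string (nullable = true)
//  |-- secs: long (nullable = true)
//  |-- parsed: timestamp (nullable = true)
//  |-- date: date (nullable = true)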
Note:
Depending on your Spark version, this may require some adjustments due to SPARK-11724:
Casting from integer types to timestamp treats the source int as being in millis. Casting from timestamp to integer types creates the result in seconds.
If you use an unpatched version, the unix_timestamp output requires multiplication by 1000.
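On an affected build, the adjustment would look like the sketch below; the * 1000 factor compensates for the seconds-vs-millis mismatch and must be dropped on patched releases:

df.select(to_date(
  (unix_timestamp($"ts", "dd-MMM-yyyy") * 1000).cast("timestamp")
).alias("date"))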