I have a CSV file with a datetime column whose values look like "2011-05-02T04:52:09+00:00".
I am using Scala. The file is loaded into a Spark DataFrame, and I can use Joda-Time to parse the date:
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true"))
// MM = month of year, HH = hour of day (0-23), ZZ = zone offset with a colon, e.g. +00:00
val d = org.joda.time.format.DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ssZZ")
I would like to create new columns based on the datetime field for time series analysis.
In a DataFrame, how do I create a column based on the value of another column?
I noticed that DataFrame has the function df.withColumn("dt", column). Is there a way to use it to derive a new column from the values of an existing column?
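To make the goal concrete, here is a minimal sketch of what I am after, not a confirmed solution. It assumes that org.apache.spark.sql.functions.udf can wrap a plain Scala function and that the resulting Column can be passed to withColumn. The object DateColumns, the helpers toTimestamp and yearOf, and the column names "datetime", "dt" and "year" are hypothetical, since the real header name is not shown above.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf
import org.joda.time.format.DateTimeFormat

// Hypothetical helper object; all names here are illustrative only.
object DateColumns {
  // Matches values such as "2011-05-02T04:52:09+00:00".
  private val pattern = "yyyy-MM-dd'T'HH:mm:ssZZ"

  // Parse the string into a java.sql.Timestamp, a type Spark SQL can store
  // (a Joda DateTime is not a supported column type). The formatter is built
  // per call so that nothing non-serializable is captured by the UDF.
  def toTimestamp(s: String): Timestamp =
    new Timestamp(DateTimeFormat.forPattern(pattern).parseDateTime(s).getMillis)

  // Extract the year, interpreted in the offset written in the string itself.
  def yearOf(s: String): Int =
    DateTimeFormat.forPattern(pattern).withOffsetParsed().parseDateTime(s).getYear

  // Add two derived columns, "dt" and "year", computed from an existing
  // string column (assumed here to be named "datetime").
  def withDateColumns(df: DataFrame, source: String = "datetime"): DataFrame = {
    val toTs = udf((s: String) => toTimestamp(s))
    val toYear = udf((s: String) => yearOf(s))
    df.withColumn("dt", toTs(df(source)))
      .withColumn("year", toYear(df(source)))
  }
}
```

With the DataFrame loaded above, this would be called as DateColumns.withDateColumns(df), or with the actual header name as the second argument.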
Thanks