0 votes
1.1k views
in Technique by (71.8m points)

scala - Create array of literals and columns from List of Strings in Spark SQL

I am trying to define functions in Scala that take a list of strings as input and convert them into the columns passed to the DataFrame array arguments used in the code below.

import org.apache.spark.sql.functions.{array, lit}

val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val df2 = df
        .withColumn("columnArray",array(df("foo").cast("String"),df("bar").cast("String")))
        .withColumn("litArray",array(lit("foo"),lit("bar")))

More specifically, I would like to create functions colFunction and litFunction (or just one function if possible) that take a list of strings as an input parameter and can be used as follows:

val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val colString = List("foo","bar")
val df2 = df
         .withColumn("columnArray",array(colFunction(colString))
         .withColumn("litArray",array(litFunction(colString)))

I have tried mapping colString to an Array of columns with all the transformations applied, but this doesn't work. Any ideas on how this can be achieved? Many thanks for reading the question, and for any suggestions/solutions.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Spark 2.2+:

Support for Seq, Map and Tuple (struct) literals was added in SPARK-19254 and is exposed through typedLit:

import org.apache.spark.sql.functions.typedLit

typedLit(Seq("foo", "bar"))
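
For example, continuing the snippet above (same typedLit import), the literal array can be attached as a new column; a minimal sketch reusing df and the "litArray" column name from the question:

// Spark 2.2+: the whole Seq becomes a single array<string> literal column.
val dfWithLit = df.withColumn("litArray", typedLit(Seq("foo", "bar")))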

Spark < 2.2

Just map with lit and wrap with array:

import org.apache.spark.sql.functions.{array, lit}

def asLitArray[T](xs: Seq[T]) = array(xs map lit: _*)

df.withColumn("an_array", asLitArray(colString)).show
// +---+---+----------+
// |foo|bar|  an_array|
// +---+---+----------+
// |  1|  1|[foo, bar]|
// |  2|  2|[foo, bar]|
// |  3|  3|[foo, bar]|
// +---+---+----------+

As for the transformation from a Seq[String] to a Column of array type, this functionality is already provided by:

def array(colName: String, colNames: String*): Column 

or

def array(cols: Column*): Column

Example:

import org.apache.spark.sql.functions.col

val cols = Seq("bar", "foo")

cols match { case x :: xs => df.select(array(x, xs: _*)) }
// or
df.select(array(cols map col: _*))

Of course all columns have to be of the same type.
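
Putting the pieces together, the two helpers the question asks for might look like the sketch below (the names colFunction, litFunction and colString, and the cast to string, are taken from the question). Note that both helpers already return an array column, so there is no need to wrap their result in another array call:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, col, lit}

// Array of the named columns' values, cast to string (as in the question's original code).
def colFunction(names: Seq[String]): Column =
  array(names.map(name => col(name).cast("string")): _*)

// Array of string literals, one element per input string.
def litFunction(names: Seq[String]): Column =
  array(names.map(lit): _*)

val df2 = df
  .withColumn("columnArray", colFunction(colString))
  .withColumn("litArray", litFunction(colString))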

