scala - How to transpose dataframe in Spark 1.5 (no pivot operator available)?

Question

Welcome To Ask or Share your Answers For Others

scala - How to transpose dataframe in Spark 1.5 (no pivot operator available)?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

scala - How to transpose dataframe in Spark 1.5 (no pivot operator available)?

I want to transpose following table using spark scala without Pivot function

I am using Spark 1.5.1 and Pivot function does not support in 1.5.1. Please suggest suitable method to transpose following table:

Customer Day   Sales
1        Mon    12
1        Tue    10
1        Thu    15
1        Fri     2
2        Sun    10
2        Wed     5
2        Thu     4
2        Fri     3

Output table :

Customer Sun Mon Tue Wed Thu Fri
   1      0   12  10   0  15  2
   2     10    0   0   5  4   3

Following code is not working as I am using Spark 1.5.1 and pivot function is available from Spark 1.6:

    var Trans = Cust_Sales.groupBy("Customer").Pivot("Day").sum("Sales")

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:57:41+0000

Not sure how efficient that is, but you can use collect to get all the distinct days, and then add these columns, then use groupBy and sum:

// get distinct days from data (this assumes there are not too many of them):
val days: Array[String] = df.select("Day")
    .distinct()
    .collect()
    .map(_.getAs[String]("Day"))

// add column for each day with the Sale value if days match:
val withDayColumns = days.foldLeft(df) { 
    case (data, day) => data.selectExpr("*", s"IF(Day = '$day', Sales, 0) AS $day")
}

// wrap it up 
val result = withDayColumns
   .drop("Day")
   .drop("Sales")
   .groupBy("Customer")
   .sum(days: _*)

result.show()

Which prints (almost) what you wanted:

+--------+--------+--------+--------+--------+--------+--------+
|Customer|sum(Tue)|sum(Thu)|sum(Sun)|sum(Fri)|sum(Mon)|sum(Wed)|
+--------+--------+--------+--------+--------+--------+--------+
|       1|      10|      15|       0|       2|      12|       0|
|       2|       0|       4|      10|       3|       0|       5|
+--------+--------+--------+--------+--------+--------+--------+

I'll leave it to you to rename / reorder the columns if needed.

Categories

scala - How to transpose dataframe in Spark 1.5 (no pivot operator available)?

scala - How to transpose dataframe in Spark 1.5 (no pivot operator available)?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags