apache spark - How to do opposite of explode in PySpark?

Question

Welcome To Ask or Share your Answers For Others

apache spark - How to do opposite of explode in PySpark?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

apache spark - How to do opposite of explode in PySpark?

Let's say I have a DataFrame with a column for users and another column for words they've written:

Row(user='Bob', word='hello')
Row(user='Bob', word='world')
Row(user='Mary', word='Have')
Row(user='Mary', word='a')
Row(user='Mary', word='nice')
Row(user='Mary', word='day')

I would like to aggregate the word column into a vector:

Row(user='Bob', words=['hello','world'])
Row(user='Mary', words=['Have','a','nice','day'])

It seems I can't use any of Sparks grouping functions because they expect a subsequent aggregation step. My use case is that I want to feed these data into Word2Vec not use other Spark aggregations.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:19:16+0000

Thanks to @titipat for giving the RDD solution. I did realize shortly after my post that there is actually a DataFrame solution using collect_set (or collect_list):

from pyspark.sql import Row
from pyspark.sql.functions import collect_set
rdd = spark.sparkContext.parallelize([Row(user='Bob', word='hello'),
                                      Row(user='Bob', word='world'),
                                      Row(user='Mary', word='Have'),
                                      Row(user='Mary', word='a'),
                                      Row(user='Mary', word='nice'),
                                      Row(user='Mary', word='day')])
df = spark.createDataFrame(rdd)
group_user = df.groupBy('user').agg(collect_set('word').alias('words'))
print(group_user.collect())

>[Row(user='Mary', words=['Have', 'nice', 'day', 'a']), Row(user='Bob', words=['world', 'hello'])]

Categories

apache spark - How to do opposite of explode in PySpark?

apache spark - How to do opposite of explode in PySpark?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags