Let's say I have a DataFrame
with a column for users and another column for words they've written:
Row(user='Bob', word='hello')
Row(user='Bob', word='world')
Row(user='Mary', word='Have')
Row(user='Mary', word='a')
Row(user='Mary', word='nice')
Row(user='Mary', word='day')
I would like to aggregate the word
column into a vector:
Row(user='Bob', words=['hello','world'])
Row(user='Mary', words=['Have','a','nice','day'])
It seems I can't use any of Sparks grouping functions because they expect a subsequent aggregation step. My use case is that I want to feed these data into Word2Vec
not use other Spark aggregations.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…