I have a dataframe in Spark in which one of the columns contains an array.Now,I have written a separate UDF which converts the array to another array with distinct values in it only. See example below:
Ex: [24,23,27,23] should get converted to [24, 23, 27]
Code:
def uniq_array(col_array):
x = np.unique(col_array)
return x
uniq_array_udf = udf(uniq_array,ArrayType(IntegerType()))
Df3 = Df2.withColumn("age_array_unique",uniq_array_udf(Df2.age_array))
In the above code, Df2.age_array
is the array on which I am applying the UDF to get a different column "age_array_unique"
which should contain only unique values in the array.
However, as soon as I run the command Df3.show()
, I get the error:
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Can anyone please let me know why this is happening?
Thanks!
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…