It's because you've overridden the max definition provided by Apache Spark. It was easy to spot because the max that ends up being called expects an iterable, i.e. it is Python's built-in max rather than pyspark.sql.functions.max, which takes a column.
To fix this, you can use a different syntax that avoids the name clash:
linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg({"cycle": "max"})
or alternatively
from pyspark.sql.functions import max as sparkMax
linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(sparkMax(col("cycle")))