I have a large RDD called "edges":
org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(String, Int)]] = MappedRDD[27] at map at <console>:52
When I was working in standalone mode, I was able to collect, count and save this RDD. Now, on a cluster, I'm getting this error:
edges.count
...
Serialized task 28:0 was 12519797 bytes which exceeds spark.akka.frameSize
(10485760 bytes). Consider using broadcast variables for large values.
The same happens with .saveAsTextFile("edges").
This is from the spark-shell. I have tried using the option:
--driver-java-options "-Dspark.akka.frameSize=15"
But when I do that, it just hangs indefinitely. Any help would be appreciated.
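For reference, here is a quick sanity check of the sizes involved. It is only a sketch and assumes that spark.akka.frameSize is given in MB (the 10485760-byte limit in the error matches a value of 10); frameBytes is a hypothetical helper, not a Spark API. If that assumption holds, a value of 15 should be large enough for this task, so the hang does not look like a simple size shortfall:

```scala
// Serialized task size reported in the error message above.
val taskBytes = 12519797L

// Hypothetical helper. Assumption: spark.akka.frameSize is in MB,
// so a setting of 10 corresponds to the 10485760 bytes in the error.
def frameBytes(mb: Int): Long = mb.toLong * 1024 * 1024

println(frameBytes(10))              // 10485760
println(taskBytes <= frameBytes(10)) // false: the task does not fit in 10 MB
println(taskBytes <= frameBytes(15)) // true: the task would fit in 15 MB
```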
** EDIT **
My standalone mode was on Spark 1.1.0 and my cluster is Spark 1.0.1.
Also, the hang occurs when I count, collect or saveAs* the RDD, but defining it or filtering it works just fine.