I'm trying to perform a k-nearest-neighbor search using Spark.
I have an RDD[Seq[Double]] and I'm planning to return an
RDD[(Seq[Double], Seq[Seq[Double]])]
containing each row together with a list of its neighbors:
    val out = data.map(row => {
      val neighbours = data.top(num = 3)(new Ordering[Seq[Double]] {
        override def compare(a: Seq[Double], b: Seq[Double]) = {
          euclideanDistance(a, row).compare(euclideanDistance(b, row)) * (-1)
        }
      })
      (row, neighbours.toSeq)
    })
It gives the following error on spark-submit:
15/04/29 21:15:39 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, 192.168.1.7): org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
I understand that nesting RDDs is not possible, but how do I perform an operation like this, where every element in the RDD has to be compared with every other element in the RDD?
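
One possible way around the nesting restriction, offered as a minimal sketch rather than a definitive answer, is to pair every row with every other row using the RDD's own `cartesian` method, so that all the work stays in driver-level transformations. The names `KnnSketch` and `nearestNeighbours` are hypothetical, the body of `euclideanDistance` is an assumed definition (the question does not show it), and excluding a row from its own neighbour list is an assumption too, since the snippet above would include it at distance zero.

```scala
import org.apache.spark.rdd.RDD

object KnnSketch {
  // Assumed definition: the question calls euclideanDistance but does not show it.
  def euclideanDistance(a: Seq[Double], b: Seq[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Hypothetical helper: returns each row with its k closest rows.
  def nearestNeighbours(data: RDD[Seq[Double]], k: Int): RDD[(Seq[Double], Seq[Seq[Double]])] = {
    // Attach a unique id to every row so a row can be told apart from its duplicates
    // and excluded from its own neighbour list.
    val indexed = data.zipWithIndex().map { case (row, id) => (id, row) }

    indexed.cartesian(indexed)                                 // every (row, other row) pair
      .filter { case ((idA, _), (idB, _)) => idA != idB }      // drop the pairing of a row with itself
      .map { case ((idA, a), (_, b)) => (idA, (a, b, euclideanDistance(a, b))) }
      .groupByKey()                                            // gather all candidates per row
      .values
      .map { candidates =>
        val row = candidates.head._1
        // Sort candidates by distance and keep the k closest ones.
        (row, candidates.toSeq.sortBy(_._3).take(k).map(_._2))
      }
  }
}
```

With this sketch, `KnnSketch.nearestNeighbours(data, 3)` would take the place of the nested `data.map` call. The cartesian product produces n² pairs and `groupByKey` holds all candidates for a row in memory at once, so it is only reasonable for modest data sizes.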