When you write Spark jobs that uses either mapPartition or foreachPartition you can just modify the partition data itself or just iterate through partition data respectively. The anonymous function passed as parameter will be executed on the executors thus there is not a viable way to execute a code which invokes all the nodes e.g: df.reduceByKey from one particular executor. This code should be executed only from the driver node. Thus only from the driver code you can access dataframes, datasets and spark session.
Please find here a detailed discussion over this issue and possible solutions
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…