In my Spark application, I am trying to read multiple tables from RDBMS, doing some data processing, then write multiple tables to another RDBMS as follows (in Scala):
val reading1 = sqlContext.load("jdbc", Map("url" -> myurl1, "dbtable" -> mytable1))
val reading2 = sqlContext.load("jdbc", Map("url" -> myurl1, "dbtable" -> mytable2))
val reading3 = sqlContext.load("jdbc", Map("url" -> myurl1, "dbtable" -> mytable3))
// data processing
// ..............
myDF1.write.mode("append").jdbc(myurl2, outtable1, new java.util.Properties)
myDF2.write.mode("append").jdbc(myurl2, outtable2, new java.util.Properties)
myDF3.write.mode("append").jdbc(myurl2, outtable3, new java.util.Properties)
I understand that reading from one table can be paralleled using partitions. However, the read operations of reading1, reading2, reading3 seem sequential, so do the write operations of myDF1, myDF2, myDF3.
How can I read from the multiple tables (mytable1, mytable2, mytable3) in parallel? and also write to multiple tables in parallel (I think same logic)?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…