No. When deploy-mode is client, the Driver Program does not necessarily run on the master node; it runs on the machine where you call spark-submit. You could run spark-submit on your laptop, and the Driver Program would run on your laptop.
On the contrary, when deploy-mode is cluster, the cluster manager (master node) is used to find a slave (worker) node with enough available resources to execute the Driver Program. As a result, the Driver Program runs on one of the slave nodes. Because its execution is delegated, you cannot get the result back from the Driver Program directly; it must store its results in a file, a database, etc.
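To make the difference concrete, here is a minimal sketch that builds the two spark-submit invocations. The flags `--master`, `--deploy-mode` and `--class` are real spark-submit options; the master URL, class name and JAR name are hypothetical placeholders, and the helper `build_submit_command` is only an illustration, not part of Spark.

```python
import shlex


def build_submit_command(deploy_mode, master_url, main_class, app_jar):
    """Return the spark-submit argument list for the given deploy mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
    ]


# Hypothetical values, replace with your own cluster and application.
MASTER = "spark://master-host:7077"
MAIN_CLASS = "com.example.MyJob"
APP_JAR = "my-job.jar"

# client: the driver runs on the machine that launches this command.
client_cmd = build_submit_command("client", MASTER, MAIN_CLASS, APP_JAR)

# cluster: the master picks a worker node to host the driver.
cluster_cmd = build_submit_command("cluster", MASTER, MAIN_CLASS, APP_JAR)

print(shlex.join(client_cmd))
print(shlex.join(cluster_cmd))
```

The only difference between the two commands is the value passed to `--deploy-mode`. To actually launch a job you could pass either list to `subprocess.run`, assuming spark-submit is on your PATH.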
- Client mode
  - You want to get a job result back (dynamic analysis)
  - Easier for developing/debugging
  - You control where your Driver Program is running
  - Always-up application: expose your Spark job launcher as a REST service or a Web UI
- Cluster mode
  - Easier resource allocation (let the master decide): fire and forget
  - Monitor your Driver Program from the Master Web UI, like other workers
  - Stops at the end: once the job is finished, allocated resources are freed