If you are running an interactive shell, e.g. pyspark (CLI or via an IPython notebook), you are running in client mode by default. You can easily verify that you cannot run pyspark or any other interactive shell in cluster mode:
$ pyspark --master yarn --deploy-mode cluster
Python 2.7.11 (default, Mar 22 2016, 01:42:54)
[GCC Intel(R) C++ gcc 4.8 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
$ spark-shell --master yarn --deploy-mode cluster
Error: Cluster deploy mode is not applicable to Spark shells.
Examining the contents of the bin/pyspark file may also be instructive. Here is its final line, which is the command that actually gets executed:
$ pwd
/home/ctsats/spark-1.6.1-bin-hadoop2.6
$ cat bin/pyspark
[...]
exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"
In other words, pyspark is actually a script run by spark-submit under the name PySparkShell, which is how you can find it in the Spark History Server UI. Since it is launched this way, it uses whatever arguments (or defaults) are passed to its spark-submit command.
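To illustrate why the shell inherits its settings from spark-submit, here is a minimal sketch of the argument-forwarding pattern used in the final line of bin/pyspark. The functions fake_spark_submit and fake_pyspark are hypothetical stand-ins, not part of Spark: the first just prints each argument it receives, and the second prepends the same fixed arguments as bin/pyspark and then forwards everything the user typed via "$@". It mirrors only that last line, not the rest of the real script.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for spark-submit: prints each argument it receives, one per line.
fake_spark_submit() {
  printf '%s\n' "$@"
}

# Hypothetical stand-in for bin/pyspark: like the real script's last line, it passes
# fixed arguments first, then forwards all user-supplied arguments verbatim with "$@".
fake_pyspark() {
  fake_spark_submit pyspark-shell-main --name "PySparkShell" "$@"
}

fake_pyspark --master yarn --deploy-mode client
```

Running it prints pyspark-shell-main, --name, PySparkShell, --master, yarn, --deploy-mode and client, one per line. Whatever you type after pyspark ends up on the spark-submit command line unchanged, which is why `--deploy-mode cluster` reaches spark-submit and triggers the "not applicable to Spark shells" error.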