I'm assuming you already have Spark and Jupyter Notebook installed and that they work flawlessly independently of each other.
If that is the case, follow the steps below and you should be able to fire up a Jupyter notebook with a (Py)Spark backend.
Go to your Spark installation folder; there should be a bin directory in it:
/path/to/spark/bin
Create a file there, let's call it start_pyspark.sh.
Open start_pyspark.sh
and write something like:
#!/bin/bash
# Python interpreter the Spark workers should use
export PYSPARK_PYTHON=/path/to/anaconda3/bin/python
# Use Jupyter as the driver front end instead of the plain pyspark shell
export PYSPARK_DRIVER_PYTHON=/path/to/anaconda3/bin/jupyter
# Start a notebook server on port 8880, reachable from any interface, without opening a browser
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
# Forward any command-line arguments straight to pyspark
pyspark "$@"
Replace the /path/to ...
parts with the paths where your python and jupyter binaries are actually installed.
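If you are not sure where those binaries live, or the script is not yet executable, something like the following should help (the chmod path just assumes the script sits in the Spark bin directory):
which python        # locate your python binary
which jupyter       # locate your jupyter binary
chmod +x /path/to/spark/bin/start_pyspark.sh   # make the launcher executable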
Most probably this step is already done, but just in case: modify your ~/.bashrc
file by adding the following lines
# Spark
export PATH="/path/to/spark/bin:/path/to/spark/sbin:$PATH"
export SPARK_HOME="/path/to/spark"
export SPARK_CONF_DIR="/path/to/spark/conf"
Run source ~/.bashrc
and you are set.
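To sanity-check that the new environment is picked up, you can run something like this (the expected outputs are just what I would expect; adjust for your own paths):
echo "$SPARK_HOME"       # should print /path/to/spark
which pyspark            # should resolve to /path/to/spark/bin/pyspark
spark-submit --version   # should print the Spark version banner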
Go ahead and try start_pyspark.sh.
You could also give arguments to the script, something like:
start_pyspark.sh --packages dibbhatt:kafka-spark-consumer:1.0.14
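Whatever you pass is forwarded to pyspark unchanged through the "$@" in the script, so the usual pyspark/spark-submit options work as well; for example (the values here are purely illustrative):
start_pyspark.sh --master local[4] --driver-memory 4g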
Hope it works out for you.