I launch PySpark applications from PyCharm on my own workstation against an 8-node cluster. The cluster also has settings defined in spark-defaults.conf and spark-env.sh.
This is how I obtain my Spark context variable:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("spark://stcpgrnlp06p.options-it.com:7087") \
    .appName(__SPARK_APP_NAME__) \
    .config("spark.executor.memory", "50g") \
    .config("spark.eventLog.enabled", "true") \
    .config("spark.eventLog.dir", r"/net/share/grid/bin/spark/UAT/SparkLogs/") \
    .config("spark.cores.max", 128) \
    .config("spark.sql.crossJoin.enabled", "true") \
    .config("spark.executor.extraLibraryPath", "/net/share/grid/bin/spark/UAT/bin/vertica-jdbc-8.0.0-0.jar") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.logConf", "true") \
    .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel("INFO")
I want to see the effective config that is being used, in my log. This line:

.config("spark.logConf", "true")

should cause the Spark API to log its effective config as INFO, but the default log level is set to WARN, so I don't see any of those messages.
Setting this line:

sc.setLogLevel("INFO")

shows INFO messages from that point onward, but it's too late by then.
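For reference, the merged configuration can also be dumped directly from the driver once the session exists. The minimal sketch below (using the sc variable defined above) shows what the driver ended up with, but it is not a substitute for the startup INFO output that spark.logConf would produce.

# Sketch: print the effective SparkConf entries known to the driver.
# getAll() returns a list of (key, value) pairs merged from
# spark-defaults.conf and the .config() calls above.
for key, value in sorted(sc.getConf().getAll()):
    print(key, "=", value)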
How can I set the default logging level that Spark starts with?
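For context, one commonly suggested approach (not part of the original post) is to set the root logger level in the driver's log4j configuration, since that is what Spark reads at startup, before any setLogLevel() call can run. A minimal sketch for log4j 1.x based Spark builds, assuming a conf/log4j.properties copied from log4j.properties.template in the directory the driver uses (e.g. SPARK_CONF_DIR):

# conf/log4j.properties (driver side) -- sketch for log4j 1.x based Spark versions
# Setting the root category to INFO makes the spark.logConf output visible at startup.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n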