I'm trying to read some Excel data into a PySpark DataFrame.
I'm using the library: 'com.crealytics:spark-excel_2.11:0.11.1'.
I don't have a header in my data.
I can read the data successfully when the range starts at column A, but when I try to read a range that starts further to the right, such as columns [N, O], I get a DataFrame with all nulls.
My data is as below:
For example, when reading from A2:B4, I get the correct DataFrame:
| _c0| _c1|
But using the same code and changing only 'dataAddress' to N2:O4, I get a DataFrame with nulls:
| _c0| _c1|
My code:
from pyspark.sql import SparkSession
from com.crealytics.spark.excel import *
spark = SparkSession.builder.appName("excel_try").enableHiveSupport().getOrCreate()
exldf = spark.read.format("com.crealytics.spark.excel")
Run using:
spark-submit --master yarn --packages com.crealytics:spark-excel_2.11:0.11.1 excel_false.py
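To make the two cases easier to compare, here is a minimal sketch of the read with only the cell range varying. It is not the exact script above: the helper names `excel_options` and `read_range` are hypothetical, the file path is a placeholder, and the option values (`useHeader` set to "false", `inferSchema` set to "true") are assumptions based on the option names documented for spark-excel 0.11.x.

```python
def excel_options(data_address):
    """Build reader options for a header-less read of one cell range.

    The option values here are assumptions, not taken from the original script.
    """
    return {
        "dataAddress": data_address,  # cell range, e.g. "A2:B4" or "N2:O4"
        "useHeader": "false",         # the data has no header row
        "inferSchema": "true",        # assumed; let the reader infer column types
    }


def read_range(spark, path, data_address):
    """Read one cell range from an Excel file through the spark-excel data source.

    `spark` is an existing SparkSession; `path` is a placeholder for the file location.
    """
    reader = spark.read.format("com.crealytics.spark.excel")
    for key, value in excel_options(data_address).items():
        reader = reader.option(key, value)
    return reader.load(path)
```

Calling `read_range(spark, path, "A2:B4")` and `read_range(spark, path, "N2:O4")` against the same file keeps everything identical except 'dataAddress', which is the only difference between the working and the failing case.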
Can someone please help with a solution?