I have a CSV file with 10 columns. Half are Strings and half are Integers.
What is the Scala code to:
- Create (infer) the schema
- Save that schema to a file
I have this so far:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use first line of all files as header
  .option("inferSchema", "true") // Automatically infer data types
  .load("cars.csv")
And what is the best file format for saving that schema? Is it JSON?
The goal is to create the schema only once and, next time, load it from a file instead of re-creating it on the fly.
Thanks.
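For reference, a minimal sketch of one way this could work, not a definitive implementation. Spark's StructType exposes a json method and DataType.fromJson can parse that string back, so JSON is a natural format for the schema. The sketch assumes it runs in the same spark-shell session as the snippet above (so df and sqlContext already exist), and the file name cars.schema.json is a hypothetical path on the driver's local filesystem.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical location for the saved schema (local to the driver).
val schemaPath = Paths.get("cars.schema.json")

// First run: the schema was inferred when df was loaded; persist it as JSON.
Files.write(schemaPath, df.schema.json.getBytes(StandardCharsets.UTF_8))

// Later runs: read the JSON back and rebuild the StructType.
val schemaJson = new String(Files.readAllBytes(schemaPath), StandardCharsets.UTF_8)
val savedSchema = DataType.fromJson(schemaJson).asInstanceOf[StructType]

// Load the CSV with the saved schema instead of inferSchema,
// which avoids the extra pass over the data that inference needs.
val dfFromSavedSchema = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .schema(savedSchema)
  .load("cars.csv")
```

If the schema file should live next to the data on HDFS or similar, the same JSON string could be written with whatever file API is used there; only the read and write calls would change.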