I have as input a set of files formatted as a single JSON object per line. The problem, however, is that one field on these JSON objects is a JSON-escaped string. Example:
{
  "id": 1,
  "name": "some name",
  "problem_field": "{\"height\":180,\"weight\":80}"
}
As expected, reading these files with sqlContext.read.json creates a DataFrame with the three columns id, name, and problem_field, where problem_field comes back as a String.
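
To make that concrete, here is roughly what I see (a minimal sketch assuming PySpark; input.jsonl is a hypothetical stand-in for my actual files):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # assuming an existing SparkContext sc, e.g. from the pyspark shell

# Read the line-delimited JSON; problem_field comes back as a plain string.
df = sqlContext.read.json("input.jsonl")
df.printSchema()
# root
#  |-- id: long (nullable = true)
#  |-- name: string (nullable = true)
#  |-- problem_field: string (nullable = true)
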
I have no control over the input files, and I'd prefer to solve this within Spark. Is there any way to get Spark to read that String field as JSON and infer its schema properly?
Note: the JSON above is just a toy example; the problem_field in my case contains a variable set of fields, and it would be great for Spark to infer those fields so that I don't have to make any assumptions about which fields exist.
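
To make the goal concrete, the result I'm after would look something like the sketch below (again assuming PySpark; I don't know whether round-tripping the column through an RDD of strings like this is the right mechanism, which is essentially what I'm asking):

# Feed the escaped strings back through the JSON reader so Spark
# infers their schema instead of me declaring it up front.
nested = sqlContext.read.json(df.rdd.map(lambda row: row.problem_field))
nested.printSchema()
# root
#  |-- height: long (nullable = true)
#  |-- weight: long (nullable = true)

Even if that works, it leaves the inferred columns detached from id and name, so a way to do the same inference in place would be ideal.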