I have as input a set of files formatted as a single JSON object per line. The problem, however, is that one field on these JSON objects is a JSON-escaped string. Example:
{
  "id": 1,
  "name": "some name",
  "problem_field": "{\"height\":180,\"weight\":80}"
}
As expected, reading these files with sqlContext.read.json creates a DataFrame with the three columns id, name, and problem_field, where problem_field comes back as a String.
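
To make that concrete, here is roughly what I see (a minimal sketch assuming PySpark; input.jsonl is a hypothetical stand-in for my actual files):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # assuming an existing SparkContext sc, e.g. from the pyspark shell

# Read the line-delimited JSON; problem_field comes back as a plain string.
df = sqlContext.read.json("input.jsonl")
df.printSchema()
# root
#  |-- id: long (nullable = true)
#  |-- name: string (nullable = true)
#  |-- problem_field: string (nullable = true)
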
I have no control over the input files, and I'd prefer to solve this within Spark. Is there any way to get Spark to read that String field as JSON and infer its schema properly?
Note: the JSON above is just a toy example; the problem_field in my case contains a variable set of fields, and it would be great for Spark to infer those fields so that I don't have to make any assumptions about which fields exist.
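
To make the goal concrete, the result I'm after would look something like the sketch below (again assuming PySpark; I don't know whether round-tripping the column through an RDD of strings like this is the right mechanism, which is essentially what I'm asking):

# Feed the escaped strings back through the JSON reader so Spark
# infers their schema instead of me declaring it up front.
nested = sqlContext.read.json(df.rdd.map(lambda row: row.problem_field))
nested.printSchema()
# root
#  |-- height: long (nullable = true)
#  |-- weight: long (nullable = true)

Even if that works, it leaves the inferred columns detached from id and name, so a way to do the same inference in place would be ideal.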