Define a schema for your JSON messages:
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("Id", DataTypes.IntegerType, false),
    DataTypes.createStructField("Name", DataTypes.StringType, false),
    DataTypes.createStructField("DOB", DataTypes.DateType, false) });
Now read the messages as shown below. MessageData is a JavaBean representing your JSON message.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.functions;

// spark is an existing SparkSession; kafkaUri points to your Kafka brokers
Dataset<MessageData> df = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", kafkaUri)
    .option("subscribe", "Statistics")
    .option("startingOffsets", "earliest")
    .load()
    // Kafka values arrive as binary; cast them to String before parsing
    .selectExpr("CAST(value AS STRING) as message")
    // Parse the JSON string into a struct column using the schema defined above
    .select(functions.from_json(functions.col("message"), schema).as("json"))
    // Flatten the struct into top-level columns, then map each row to the JavaBean
    .select("json.*")
    .as(Encoders.bean(MessageData.class));
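The MessageData bean itself is not shown above; a minimal sketch of what it might look like, assuming its property names line up with the schema columns (column-to-property matching is case-insensitive with Spark's default settings) and that DOB maps to java.sql.Date:

import java.io.Serializable;
import java.sql.Date;

// Hypothetical JavaBean matching the schema above (Id, Name, DOB)
public class MessageData implements Serializable {
    private Integer id;
    private String name;
    private Date dob;

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Date getDob() { return dob; }
    public void setDob(Date dob) { this.dob = dob; }
}

Note that df is a streaming Dataset, so it still needs a sink before anything runs; for a quick check you could start a console sink, e.g. df.writeStream().format("console").start().awaitTermination(); (the console sink here is just one possible choice).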