I am using a Spark Structured Streaming query to write Parquet files to S3 with the following code:
```java
ds.writeStream()
  .format("parquet")
  .outputMode(OutputMode.Append())
  .option("queryName", "myStreamingQuery")
  .option("checkpointLocation", "s3a://my-kafka-offset-bucket-name/")
  .option("path", "s3a://my-data-output-bucket-name/")
  .partitionBy("createdat")
  .start();
```
I get the desired output in the S3 bucket my-data-output-bucket-name, but along with the output I also get a _spark_metadata folder in it. How can I get rid of it? If I can't get rid of it, how can I change its location to a different S3 bucket?
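For context, one workaround I have seen suggested (a sketch only, reusing the same `ds` and bucket names as above, and not something I have verified) is to replace the file sink with `foreachBatch`: each micro-batch is then written with the plain batch Parquet writer, which does not create a `_spark_metadata` folder. The trade-off is that the file sink's exactly-once guarantee is lost, so retried batches can produce duplicate files.

```java
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Sketch: write each micro-batch with the batch DataFrameWriter instead of the
// streaming file sink, so no _spark_metadata folder is created in the output
// bucket. Checkpointing (offsets) still goes to the checkpoint location.
// Caveat: without the file sink's commit log, exactly-once output becomes
// at-least-once; retries may leave duplicate files.
ds.writeStream()
  .queryName("myStreamingQuery")
  .option("checkpointLocation", "s3a://my-kafka-offset-bucket-name/")
  .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batchDf, batchId) -> {
      batchDf.write()
          .mode("append")
          .partitionBy("createdat")
          .parquet("s3a://my-data-output-bucket-name/");
  })
  .start();
```

Is this the recommended approach, or is there a supported way to relocate the `_spark_metadata` folder itself?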