Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to save a file on the cluster

I'm connected to the cluster over ssh, and I submit the program to the cluster using:

spark-submit --master yarn myProgram.py

I want to save the result in a text file and I tried using the following lines:

counts.write.json("hdfs://home/myDir/text_file.txt")
counts.write.csv("hdfs://home/myDir/text_file.csv")

However, neither of them works. The program finishes, but I cannot find the text file in myDir. Do you have any idea how I can do this?

Also, is there a way to write directly to my local machine?

EDIT: I found out that the home directory doesn't exist, so now I save the result as:

counts.write.json("hdfs:///user/username/text_file.txt")

But this creates a directory named text_file.txt, and inside it there are many files with partial results. I want one file with the final result inside. Any ideas how I can do this?


1 Answer

Spark will save the results in multiple files since the computation is distributed. Therefore writing:

counts.write.csv("hdfs:///user/username/text_file.csv")

means that the data in each partition is saved as a separate file inside the directory text_file.csv. If you want the data saved as a single file, use coalesce(1) first:

counts.coalesce(1).write.csv("hdfs:///user/username/text_file.csv")

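This puts all the data into a single partition, so only one data file is written. Note that text_file.csv is still a directory; it contains a single part-* file alongside Spark's _SUCCESS marker. However, coalesce(1) can be a bad idea if you have a lot of data, because the whole result has to pass through a single task. If the data is very small, using collect() is an alternative: it brings all the data to the driver machine as a list of rows, which can then be saved as a single file.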
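
As a rough illustration of the collect() route, here is a minimal sketch. The helper names save_rows_as_single_csv and save_small_dataframe are hypothetical (they are not part of Spark), and the sketch assumes counts is a small DataFrame whose rows fit comfortably in the driver's memory. It relies on the fact that the Row objects returned by collect() behave like tuples, so the standard csv module can write them directly.

```python
import csv


def save_rows_as_single_csv(rows, header, path):
    """Write already-collected rows to one CSV file on the driver's local disk.

    rows   -- iterable of tuples (PySpark Row objects are tuples too)
    header -- list of column names written as the first line
    path   -- local filesystem path, not an hdfs:// URI
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def save_small_dataframe(df, path):
    """Hypothetical wrapper: collect a small DataFrame and save it as one file."""
    # collect() pulls every row to the driver; only safe for small results.
    rows = df.collect()
    save_rows_as_single_csv(rows, df.columns, path)
```

Usage would look like save_small_dataframe(counts, "text_file.csv"). The file is written on whichever machine runs the driver. With spark-submit --master yarn and the default client deploy mode, that is the machine where spark-submit was started, so this may also cover the "write to my local machine" part of the question, provided the job is submitted from that machine.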

