apache spark - Removing Characters from python Output

Question


asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)


I have spent a lot of time trying to remove characters such as u, u', u", [, (, ), /, ' and " from my Spark Python output, because they cause problems in my further processing. Please focus on that.

I have input like:

(u"(u'[25145,   12345678'", 0.0)
(u"(u'[25146,   25487963'", 43.0)

When I apply my code to sum the results, it gives me output like:

(u'(u"(u'[54879,    5125478'"', 0.0)
(u"(u'[25145,   25145879'", 11.0)
(u'(u"(u'[56897,    22548793'"', 0.0)

So I want to remove all characters like (u'(u"(u'["'').

I want output like:

54879,5125478,0.0

25145,25145879,11.0

The code I tried is:

from pyspark import SparkContext
import os
import sys

sc = SparkContext("local", "aggregate")

file1 = sc.textFile("hdfs://localhost:9000/data/first/part-00000")
file2 = sc.textFile("hdfs://localhost:9000/data/second/part-00000")

file3 = file1.union(file2).coalesce(1).map(lambda line: line.split(','))

result = file3.map(lambda x: ((x[0]+', '+x[1],float(x[2][:-1])))).reduceByKey(lambda a,b:a+b).coalesce(1)

result.saveAsTextFile("hdfs://localhost:9000/Test1")


1 Answer

深蓝 · Answer 1 · 2022-01-31T07:26:43+0000

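I think your only problem is that you have to reformat your result before saving it to the file. saveAsTextFile writes the string representation of each element, so a (key, value) tuple is saved with its parentheses, quotes and u prefixes. Convert each tuple to a plain string first, i.e. something like: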

result.map(lambda x:x[0]+','+str(x[1])).saveAsTextFile("hdfs://localhost:9000/Test1")
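If the input files were themselves saved as tuples earlier, the keys may already contain leftover characters such as (u' and [. The following is a minimal sketch of how those could be stripped before formatting, assuming the keys only ever contain digits and commas apart from that residue. The names clean_key and to_csv_line are hypothetical helpers, not part of PySpark.

```python
import re

# Residue from earlier tuple/list reprs: a "u" directly before a quote,
# plus brackets, parentheses, quotes and whitespace.
# Assumption: real key content is only digits and commas.
_RESIDUE = re.compile(r"""u(?=['"])|[\[\]()'"\s]""")


def clean_key(key):
    """Remove repr residue from a key such as (u"(u'[25145,   12345678'" ."""
    return _RESIDUE.sub("", key)


def to_csv_line(record):
    """Turn a (key, value) pair into a plain 'a,b,value' line."""
    key, value = record
    return "{},{}".format(clean_key(key), value)


if __name__ == "__main__":
    sample = ('(u"(u\'[25145,   25145879\'"', 11.0)
    print(to_csv_line(sample))
```

With a helper like this, the save step could be written as result.map(to_csv_line).saveAsTextFile("hdfs://localhost:9000/Test1"). Writing plain lines at every stage avoids the nested prefixes building up in the first place.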
