Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

excel - Python pandas to_excel 'utf8' codec can't decode byte

I'm trying to do some data work in Python pandas and having trouble writing out my results. I read my data in as a CSV file and been exporting each script as it's own CSV file which works fine. Lately though I've tried exporting everything in 1 Excel file with worksheets and a few of the sheets give me an error

"'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte"

I have no idea how to even start finding any characters that could be causing problems exporting to Excel. Not sure why it exports to CSV just fine though :(

relevant lines

from pandas import ExcelWriter
data = pd.read_csv(input)
writer = ExcelWriter(output) #output is just the filename
fundraisers.to_excel(writer, "fundraisers")
locations.to_excel(writer, "locations") #error
locations.to_csv(outputcsv) #works
writer.save()

printing head of offending dataframe

Event ID    Constituent ID  Email Address   First Name     Last Name
f       1       A       A       1
F       4       L       R       C
M       1       1       A       D
F       4       A       A       G
M       2       0       R       G
M       3       O       O       H
M       2       T       E       H
M       2       A       A       H
M       2       M       M       K
F       3       J       E       K
Location ID raised  raised con  raised email
a   0   0   0
a   8   0   0
o   0   0   0
o   0   0   0
o   0   0   0
t   5   0   0
o   1   0   0
o   6   a   0
o   6   0   0
d   0   0   0

looking at the excel sheet I do actually get a partial print out. Anything in the first name column and beyond are blank, but event, constituent and email all print.

edit: Trying to read the csv in as utf8 fails, but reading it in as latin1 works. Is there a way to specify the to_excel encoding? Or decode and encode my dataframe to utf8?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Managed to solve this.

I made a function that goes through my columns that have strings and managed to decode/encode them into utf8 and it now works.

def changeencode(data, cols):
    for col in cols:
        data[col] = data[col].str.decode('iso-8859-1').str.encode('utf-8')
    return data   

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...