Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

excel - python pandas read_excel returns UnicodeDecodeError on describe()

I love pandas, but I am having real problems with Unicode errors. After loading a workbook with read_excel(), calling describe() raises the dreaded UnicodeDecodeError:

import pandas as pd
df = pd.read_excel('tmp.xlsx', encoding='utf-8')
df.describe()

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 259: ordinal not in range(128)

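I figured out that the original Excel file had a non-breaking space (&nbsp;, U+00A0) at the end of many cells, probably to prevent long digit strings from being converted to float. In UTF-8 that character is the byte pair 0xC2 0xA0, which matches the 0xc2 in the error message.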
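The byte in the traceback is consistent with that diagnosis. Below is a minimal sketch in Python 3 syntax (the setup in the question is Python 2, where the same ASCII decode happens implicitly); the sample cell value is invented for illustration.

```python
# Hypothetical cell value: a long digit string followed by a non-breaking space.
cell = "12345678901234567890\u00a0"

# Encode as UTF-8 and look at the two trailing bytes that represent U+00A0.
raw = cell.encode("utf-8")
nbsp_bytes = raw[-2:]

# Decoding those bytes with the ASCII codec fails on the first byte, 0xc2.
try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(nbsp_bytes)
print(error_message)
```

Only the position in the message depends on the data; the codec name and the offending byte should match the error shown in the question.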

One way around this is to strip the cells, but there must be something better.

# .str only works on text columns, so skip numeric ones
for col in df.columns:
    if df[col].dtype == object:
        df[col] = df[col].str.strip()
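If columns mix text with numbers or empty cells, a per-value helper may be safer than stripping whole columns. The following is a rough sketch in Python 3 syntax; `clean_cell` is a hypothetical name, not a pandas function, and the sample values are made up.

```python
def clean_cell(value):
    """Strip surrounding whitespace from text values; pass everything else through.

    In Python 3, str.strip() with no arguments also removes U+00A0,
    because the non-breaking space counts as Unicode whitespace.
    """
    if isinstance(value, str):
        return value.strip()
    return value


# Made-up sample: text with a trailing non-breaking space, a number, a missing value.
sample = ["00123456789012345\u00a0", 42, None, "  abc "]
cleaned = [clean_cell(v) for v in sample]
print(cleaned)
```

With pandas, a helper like this could presumably be applied one column at a time through `df[col].map(clean_cell)`, which leaves non-text cells as they are.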

I am using Anaconda 2.2.0 (win64) with pandas 0.16.


1 Answer


Try this method suggested here:

import sys
df = pd.read_excel('tmp.xlsx', encoding=sys.getfilesystemencoding())
