Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
439 views
in Technique by (71.8m points)

python - UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites).

I am using BeautifulSoup.

The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes it barfs by throwing a UnicodeEncodeError.

I have tried just about everything I can think of, and yet I have not found anything that works consistently without throwing some kind of Unicode-related error.

One of the sections of code that is causing problems is shown below:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

Here is a stack trace produced on SOME strings when the snippet above is run:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I suspect that this is because some pages (or more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded.

All the sites are based in the UK and provide data meant for UK consumption, so there are no issues relating to internationalization or dealing with text written in anything other than English.

Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?

  asked by Homunculus Reticulli, translated from SO

1 Answer

0 votes
by (71.8m points)

You need to read the Python Unicode HOWTO. This error is the very first example.

Basically, stop using str to convert from unicode to encoded text / bytes.
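
To see why str fails here: in Python 2, calling str on a unicode object implicitly encodes it with the default 'ascii' codec, and u'\xa0' (a non-breaking space, which commonly shows up in scraped HTML) is outside that range. Below is a minimal sketch of the same failure. It assumes a made-up sample string, and it spells out the encode step explicitly so that it behaves the same on Python 2 and Python 3; the helper name is hypothetical.

```python
# -*- coding: utf-8 -*-
# Made-up sample text: index 20 holds a non-breaking space (U+00A0).
text = u"John Smith 020 7946 " + u"\xa0" + u"0000"


def implicit_ascii_encode(value):
    """Hypothetical helper mimicking what Python 2's str(unicode) does."""
    return value.encode("ascii")


try:
    implicit_ascii_encode(text)
    error = None
except UnicodeEncodeError as exc:
    # Same exception type as in the traceback from the question.
    error = exc

print(error)
```

Pages whose text happens to be pure ASCII never reach the except branch, which would explain why the error only shows up on some strings.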

Instead, properly use .encode() to encode the string:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()
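
A quick sketch of what that line produces, using made-up values for agent_contact and agent_telno (in the original code they come from BeautifulSoup). It is only an illustration under those assumptions, not the asker's real data.

```python
# -*- coding: utf-8 -*-
# Made-up stand-ins for the values scraped with BeautifulSoup.
agent_contact = u"John Smith"
agent_telno = u"020\xa07946 0000"  # contains a non-breaking space

# Join as unicode first, then encode once, explicitly, to UTF-8 bytes.
agent_info = u" ".join((agent_contact, agent_telno)).encode("utf-8").strip()

# U+00A0 is written as the two bytes 0xC2 0xA0, so nothing is lost
# and decoding gives back the original text.
print(repr(agent_info))
print(agent_info.decode("utf-8") == u"John Smith 020\xa07946 0000")
```

Note that the result is a byte string, so it should only be produced at the point where bytes are actually needed (writing to a file, a socket, and so on).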

or work entirely in unicode.
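
One way to read "work entirely in unicode" is: decode bytes once at the input boundary, keep everything as unicode inside the program, and encode once at the output boundary. The sketch below assumes made-up input data, a hypothetical to_text helper, and a temporary output file; BeautifulSoup generally hands back unicode already, in which case the helper just passes the value through.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return unicode text, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Input boundary: made-up byte strings, as they might arrive from a page.
raw_contact = b"John Smith"
raw_telno = u"020\xa07946 0000".encode("utf-8")

# Inside the program: unicode only, no str() calls on text.
agent_info = u" ".join((to_text(raw_contact), to_text(raw_telno))).strip()

# Output boundary: io.open encodes to UTF-8 when writing.
path = os.path.join(tempfile.mkdtemp(), "agent_info.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(agent_info)

# Reading back with the same encoding returns the same unicode text.
with io.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()

print(round_tripped == agent_info)
```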
