I'm working in Windows XP (5.1.2600), writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function still fails on console output whenever a character falls outside the console's encoding. Here's a teeny program.
print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)
Output is (changing angle brackets to square brackets for readability):
sys.stdout encoding is "cp1252"
Traceback (most recent call last):
  File "TestPrintEncoding.py", line 22, in [module]
    print(str1)
  File "C:\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 4: character maps to [undefined]
Note that ü = \xfc = 252 gives no problem, since it sits in the upper (non-ASCII) half of the 8-bit cp1252 code page. But ā = \u0101 is beyond 8 bits, and cp1252 has no mapping for it.
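The difference between the two characters can be checked without involving the console at all, by encoding each one directly. This is only a minimal sketch; the helper name `can_encode` is made up for illustration and is not part of the original program.

```python
def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Escapes are used so the result does not depend on the source file encoding.
print(can_encode('\u00fc', 'cp1252'))          # ü: cp1252 has a slot for it (0xFC)
print(can_encode('\u0101', 'cp1252'))          # ā: no slot in cp1252
print(can_encode('l\u00fcel\u0101', 'utf-8'))  # the whole string fits in UTF-8
```

The three calls print True, False and True, which matches the traceback: only the ā at position 4 is rejected.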
Does anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.
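One direction that might be worth trying is to wrap a binary stream in an `io.TextIOWrapper` that encodes as UTF-8. The sketch below is hedged accordingly: `utf8_writer` is a hypothetical helper name, the demonstration writes to an in-memory buffer rather than the real console, and it assumes that `sys.stdout` exposes its underlying binary stream as `.buffer`.

```python
import io

def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding='utf-8')

# Demonstration against an in-memory buffer instead of the real console.
buf = io.BytesIO()
out = utf8_writer(buf)
print('l\u00fcel\u0101', file=out)   # no UnicodeEncodeError is raised here
out.flush()
raw = buf.getvalue()                 # the UTF-8 encoded bytes that were written

# Applying the same idea to the console (assumption: sys.stdout has .buffer):
#     import sys
#     sys.stdout = utf8_writer(sys.stdout.buffer)
```

Whether the Windows console then displays those bytes correctly is a separate matter that depends on its active code page and font; the wrapper only removes the encoding error on the Python side.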
Apologies, I gave you the program without the preamble. Before the 3 lines given, it starts like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
Unfortunately, the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!