I am using the Python interpreter in a Windows 7 terminal.
I am trying to wrap my head around unicode and encodings.
I type:
>>> s='ë'
>>> s
'\x89'
>>> u=u'ë'
>>> u
u'\xeb'
Question 1: Why is the encoding used in the string s different from the one used in the unicode string u?
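To make the difference concrete, here is a minimal sketch of how one code point maps to different bytes under different encodings. It assumes the character involved is U+00EB (the value shown for u) and that the console uses code page 437 or 850; the session does not confirm which code page is active, and the names `CANDIDATES` and `encoded_forms` are only illustrative.

```python
from __future__ import print_function

# The code point shown for u in the session: U+00EB.
char = u"\u00eb"

# Candidate encodings. Which one the console really uses is an assumption.
CANDIDATES = ["cp437", "cp850", "latin-1", "utf-8"]


def encoded_forms(text, encodings):
    """Return a dict mapping each encoding name to the bytes it produces."""
    return {name: text.encode(name) for name in encodings}


forms = encoded_forms(char, CANDIDATES)
for name in CANDIDATES:
    print(name, repr(forms[name]))
```

Under this reading, u holds the code point itself, while s holds whatever single byte the console's code page assigns to that character, so the two values are not expected to match.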
I continue, and type:
>>> us=unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x89 in position 0: ordinal not in range(128)
>>> us=unicode(s, 'latin-1')
>>> us
u'\x89'
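The latin-1 result above decodes without error but yields U+0089, not U+00EB. A rough sketch of checking several codecs against the expected code point follows; the candidate list and the helper name `matching_encodings` are assumptions made for illustration, not something taken from the session.

```python
from __future__ import print_function

raw = b"\x89"          # the byte stored in s
expected = u"\u00eb"   # the code point stored in u


def matching_encodings(data, target, encodings):
    """Return the encodings under which data decodes to exactly target."""
    matches = []
    for name in encodings:
        try:
            if data.decode(name) == target:
                matches.append(name)
        except UnicodeDecodeError:
            # This codec cannot decode the byte at all (e.g. ascii, utf-8).
            pass
    return matches


candidates = ["ascii", "utf-8", "latin-1", "cp437", "cp850", "cp1252"]
matches = matching_encodings(raw, expected, candidates)
print(matches)
```

Latin-1 never raises because it maps every byte to the code point with the same number, which is why it appears to work while giving the wrong character.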
Question 2: I tried the latin-1 encoding on the off chance that it would turn the string into a unicode string (actually, I tried a bunch of other ones first, including utf-8). How can I find out which encoding the terminal has used to encode my string?
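One way to ask the interpreter directly is sketched below. It assumes `sys.stdin.encoding` is populated when Python is attached to a console, and falls back to `locale.getpreferredencoding` otherwise; the helper name `console_encoding` is hypothetical, and the fallback may report a different encoding than the console code page.

```python
from __future__ import print_function

import locale
import sys


def console_encoding():
    """Best-effort guess at the encoding used for text typed at the prompt."""
    # sys.stdin may be missing or have no encoding when input is redirected.
    enc = getattr(sys.stdin, "encoding", None)
    if not enc:
        # Fallback: the locale's preferred encoding (an approximation only).
        enc = locale.getpreferredencoding(False)
    return enc


enc = console_encoding()
print(enc)
```

With the reported name, a byte string typed at the prompt could then be decoded with `s.decode(enc)` instead of guessing. On Windows, running `chcp` in the terminal also shows the active code page.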
Question 3: How can I make the terminal print ë as ë instead of '\x89' or u'\xeb'? Hmm, stupid me. print(s) does the job.
I already looked at this related SO question, but found no clues there: Set Python terminal encoding on Windows