Note that the error is a UnicodeEncodeError rather than a UnicodeDecodeError. Python is preserving the exact bytes passed on the command line (via the PEP 383 surrogateescape error handler), but those bytes are not valid UTF-8 and hence can't be encoded as such for writing to the console.
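To make the failure mode concrete, here is a minimal sketch that reproduces it without involving the command line. It assumes the argument was Latin-1 encoded and that Python decoded it as UTF-8 with the surrogateescape handler; the variable names are illustrative only.

```python
# Latin-1 bytes for "français"; the byte 0xE7 is not valid UTF-8 here.
raw = b"fran\xe7ais"

# Roughly what happens to a command line argument when Python assumes UTF-8:
# the undecodable byte is smuggled through as a lone surrogate (PEP 383).
arg = raw.decode("utf-8", "surrogateescape")
print(ascii(arg))  # 'fran\udce7ais'

# Strict UTF-8 encoding, as used when writing to a UTF-8 console,
# refuses lone surrogates, which is why the error is an *encode* error.
error = None
try:
    arg.encode("utf-8")
except UnicodeEncodeError as exc:
    error = exc
    print("encode failed:", exc.reason)
```

The decoding step succeeds silently; the problem only surfaces once the string has to be turned back into bytes with a strict codec.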
The best way to deal with this is to use the application-level knowledge of the correct encoding to reinterpret the command line argument inside the application, as in the following example code:
$ python3.2 -c "import os, sys; print(os.fsencode(sys.argv[1]).decode('latin-1'))" `echo français|iconv -t latin1`
français
The os.fsencode function invocation reverses the transformation Python applied automatically when processing the command line arguments. The decode('latin-1') method invocation then performs the correct conversion in order to get a properly decoded string.
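The same two steps can be wrapped in a small function. This is only a sketch: `reinterpret_argument` is a hypothetical helper name, and the round trip is simulated with `os.fsdecode` on a POSIX-like system rather than with a real `sys.argv` entry.

```python
import os


def reinterpret_argument(arg, encoding):
    """Recover the original bytes of an argument, then decode them
    with the encoding the application knows to be correct."""
    original_bytes = os.fsencode(arg)       # undo Python's automatic decoding
    return original_bytes.decode(encoding)  # apply the real encoding


# Simulate an argument that arrived as Latin-1 bytes (POSIX assumption).
simulated_arg = os.fsdecode(b"fran\xe7ais")
fixed = reinterpret_argument(simulated_arg, "latin-1")
print(fixed)
```

In a real program the call would be something like `reinterpret_argument(sys.argv[1], 'latin-1')`, with the encoding coming from whatever the application knows about its input.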
Python 3.2 added os.fsencode specifically to make this kind of problem easier to deal with.
For Python 3.1, the equivalent construct for os.fsencode(sys.argv[1]) is sys.argv[1].encode(sys.getfilesystemencoding(), 'surrogateescape').
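If one script has to cover both cases, the two spellings can be combined as below. This is a sketch under the assumption of a POSIX-like system; `argument_bytes` is a hypothetical name, not part of the standard library.

```python
import os
import sys


def argument_bytes(arg):
    """Return the original bytes behind a decoded command line argument."""
    if hasattr(os, "fsencode"):
        # Python 3.2 and later
        return os.fsencode(arg)
    # Python 3.1 fallback: spell out what os.fsencode does on POSIX
    return arg.encode(sys.getfilesystemencoding(), "surrogateescape")


# Simulated argument that arrived as Latin-1 bytes.
simulated_arg = os.fsdecode(b"fran\xe7ais")
recovered = argument_bytes(simulated_arg)
manual = simulated_arg.encode(sys.getfilesystemencoding(), "surrogateescape")
print(recovered == manual)
```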
Edit Feb 2013: updated for Python 3.2+, and to avoid assuming that Python autodetected "UTF-8" as the command line encoding