For CGI, using print() requires that the correct codec has been set up for output. print() writes to sys.stdout, and sys.stdout has been opened with a specific encoding. How that encoding is determined is platform dependent and can differ based on how the script is run. When your script runs as a CGI script, you generally cannot know in advance which encoding will be used.
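To see which encoding applies in a given environment, you can inspect the stream itself. The following is a minimal sketch; describe_output_encoding is a hypothetical helper name, not something from the standard library, and the assumption that stderr ends up in the server's error log holds for many servers but not necessarily yours.

```python
import locale
import sys


def describe_output_encoding(stream=None):
    """Return the encoding settings that affect print() for a text stream."""
    stream = sys.stdout if stream is None else stream
    return {
        # Encoding and error handler the stream was opened with.
        "stream_encoding": getattr(stream, "encoding", None),
        "stream_errors": getattr(stream, "errors", None),
        # Locale-derived default that Python falls back to for text I/O.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    # Report on stderr so the output stays out of the HTTP response body
    # (assumption: the server routes stderr to its error log).
    print(describe_output_encoding(), file=sys.stderr)
```

Running this once from a terminal and once through the web server usually makes the difference between the two environments visible.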
In your case, the web server has set the locale for text output to a fixed encoding other than UTF-8. Python uses that locale setting to produce output in that encoding. Without the <meta> header, your browser correctly guesses that encoding (or the server has communicated it in the Content-Type header). With the <meta> header, you are telling the browser to use a different encoding, one that is incorrect for the data actually produced.
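The mismatch can be reproduced without a web server. This sketch assumes, purely for illustration, that the server locale is Latin-1; the text above only establishes that it is something other than UTF-8.

```python
text = "ü"

# Bytes as Python would emit them under an assumed Latin-1 locale.
latin1_bytes = text.encode("latin-1")

# Bytes the browser expects once the page declares charset UTF-8.
utf8_bytes = text.encode("utf-8")

# Decoding the Latin-1 byte as UTF-8 fails; errors="replace" stands in for
# the replacement character a browser would typically display instead.
misread = latin1_bytes.decode("utf-8", errors="replace")
```

The same character is a single byte in one encoding and two bytes in the other, so a page that declares UTF-8 but contains Latin-1 bytes cannot be decoded as intended.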
You can write directly to sys.stdout.buffer, after explicitly encoding to UTF-8. Make a helper function to make this easier:
import sys

def enc_print(string='', encoding='utf8'):
    # Encode explicitly and write raw bytes, bypassing sys.stdout's own encoding.
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')

enc_print("Content-type:text/html")
enc_print()
enc_print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
?????ü
</body>
</html>
""")
Another approach is to replace sys.stdout with a new io.TextIOWrapper() object that uses the codec you need:
import sys
import io
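
def set_output_encoding(codec, errors='strict'):
    # Read line_buffering before detach(), which leaves the old wrapper unusable.
    line_buffering = sys.stdout.line_buffering
    sys.stdout = io.TextIOWrapper(
        sys.stdout.detach(), encoding=codec, errors=errors,
        line_buffering=line_buffering)
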
set_output_encoding('utf8')
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
?????ü
</body>
</html>
""")
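On Python 3.7 and later, io.TextIOWrapper also offers a reconfigure() method, which changes the encoding of an existing stream in place. This is a different technique from the two shown above, sketched here as a possible alternative; reconfigure_output is a hypothetical helper name, and the fallback branch mirrors the TextIOWrapper replacement approach.

```python
import io


def reconfigure_output(stream, encoding="utf-8", errors="strict"):
    """Return a text stream that writes with the given encoding."""
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: change the encoding of the existing wrapper in place.
        stream.reconfigure(encoding=encoding, errors=errors)
        return stream
    # Older versions: wrap the underlying binary buffer in a new text layer.
    return io.TextIOWrapper(stream.detach(), encoding=encoding, errors=errors)
```

In a CGI script you would assign the result back once, before the first print(), for example sys.stdout = reconfigure_output(sys.stdout).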