utf 8 - UTF-8 CJK characters not displaying in Java

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I've been reading up on Unicode and UTF-8 encoding for a while and I think I understand it, so hopefully this won't be a stupid question:

I have a file which contains some CJK characters, and which has been saved as UTF-8. I have various Asian language packs installed and the characters are rendered properly by other applications, so I know that much works.

In my Java app, I read the file as follows:

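```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ReadSgf {
    public static void main(String[] args) throws IOException {
        // Create objects; try-with-resources closes br, is and fis
        try (FileInputStream fis = new FileInputStream(new File("xyz.sgf"));
             InputStreamReader is = new InputStreamReader(fis, StandardCharsets.UTF_8);
             BufferedReader br = new BufferedReader(is)) {
            // Read and display file contents (readLine() strips line terminators)
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
            System.out.println(sb);
        }
    }
}
```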

The output shows the CJK characters as '???'. A call to is.getEncoding() confirms that it is definitely using UTF-8. What step am I missing to make the characters appear properly? If it makes a difference, I'm looking at the output using the Eclipse console.

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:55:55+0000

```java
System.out.println(sb);
```

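The problem is the line above. System.out encodes character data using the platform's default encoding and writes the resulting bytes to standard output. If that encoding cannot represent a character (a Western European code page such as windows-1252 has no CJK characters, for example), the character is replaced with '?', so on many systems this conversion is lossy.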
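To check that the file itself is decoded correctly and that the loss happens only when characters are encoded for output, here is a minimal sketch (the class and method names are hypothetical). It prints code points, which do not depend on the console, and round-trips a CJK string through ISO-8859-1, used here as a stand-in for a default encoding without CJK characters, and through UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {

    // Encodes text with the given charset and decodes it again, mimicking what
    // happens when System.out writes characters in an encoding that lacks them.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become the replacement byte '?'
        return new String(bytes, charset);
    }

    // Lists code points so the data can be inspected without relying on the console font or encoding.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "\u6F22\u5B57"; // "漢字"
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Original:       " + codePoints(text));
        System.out.println("Via ISO-8859-1: " + codePoints(roundTrip(text, StandardCharsets.ISO_8859_1)));
        System.out.println("Via UTF-8:      " + codePoints(roundTrip(text, StandardCharsets.UTF_8)));
    }
}
```

The ISO-8859-1 line should show U+003F U+003F (two question marks), while the UTF-8 line keeps U+6F22 U+5B57. If the code points of the string read from the file are correct, the reading code is fine and only the output side needs attention.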

Whichever defaults you change, the encoding System.out uses to encode characters and the encoding the console uses to decode the bytes must match.
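The effect of a mismatch can be simulated without a console. This sketch assumes a program that writes UTF-8 and a console that decodes ISO-8859-1; the class and method names are hypothetical. The single character 漢 is encoded in UTF-8 as the three bytes E6 BC A2, which a Latin-1 reader shows as three unrelated characters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Simulates a program writing UTF-8 bytes to a console that decodes them as ISO-8859-1.
    static String writeUtf8ReadLatin1(String text) {
        byte[] written = text.getBytes(StandardCharsets.UTF_8);  // what a UTF-8 PrintStream emits
        return new String(written, StandardCharsets.ISO_8859_1); // what a Latin-1 console displays
    }

    // Same text, with writer and reader agreeing on UTF-8.
    static String writeUtf8ReadUtf8(String text) {
        byte[] written = text.getBytes(StandardCharsets.UTF_8);
        return new String(written, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u6F22"; // "漢"
        String garbled = writeUtf8ReadLatin1(text);
        System.out.println("Mismatched length: " + garbled.length()); // 3 characters instead of 1
        System.out.println("Matched round trip OK: " + writeUtf8ReadUtf8(text).equals(text));
    }
}
```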

The only supported way to change the default system encoding is through the operating system. (Some will advise setting the file.encoding system property, but this is not supported and may have unintended side effects.) Alternatively, you can use System.setOut to install your own PrintStream with an explicit encoding:

```java
PrintStream stdout = new PrintStream(System.out, true, "UTF-8"); // autoFlush = true; may throw UnsupportedEncodingException
System.setOut(stdout);
```
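A minimal, self-contained sketch of the setOut approach, assuming Java 10 or later (for the PrintStream(OutputStream, boolean, Charset) constructor) and a console that decodes UTF-8. The class and helper names are hypothetical. The helper takes any OutputStream so the encoded bytes can be checked in isolation.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Stdout {

    // Wraps an output stream in an auto-flushing PrintStream that encodes characters as UTF-8.
    // The PrintStream(OutputStream, boolean, Charset) constructor requires Java 10+.
    static PrintStream utf8PrintStream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Replace System.out; the original stream just passes the UTF-8 bytes through.
        // This only helps if the console also decodes UTF-8.
        System.setOut(utf8PrintStream(System.out));
        System.out.println("\u6F22\u5B57"); // "漢字"
    }
}
```

Wrapping a stream means only the character-to-byte step changes; which bytes the console can display is still decided by the console's own encoding setting.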

In Eclipse, you can change the console encoding in the Run Configuration (the Encoding setting on the Common tab).

You can find a number of posts on this subject on my blog, which is linked from my profile.
