In the following Java program running in Linux using OpenJDK 1.6.0_22 I simply list the contents of the directory taken in as parameter at the command line. The directory contains the files which have file names in UTF-8 (e.g. Hindi, Mandarin, German etc.).
import java.io.*;
class ListDir {
public static void main(String[] args) throws Exception {
//System.setProperty("file.encoding", "en_US.UTF-8");
System.out.println(System.getProperty("file.encoding"));
File f = new File(args[0]);
for(String c : f.list()) {
String absPath = args[0] + "" + c;
File cf = new File(args[0] + "/" + c);
System.out.println(cf.getAbsolutePath() + " --> " + cf.exists());
}
}
}
If I set the LC_ALL variable to en_US.UTF-8 the results are printed fine. But if I set the LC_ALL variable to POSIX and supply the file.encoding and sun.jnu.encoding properties as UTF-8 from command line I get the garbage output and cf.exists() returns false.
Can you please explain this behavior. As I read on so many websites file.encoding is said to be sufficient to read file names and use them for operations. Here it looks like that property has no effect at all.
Update 1: If I set file.encoding to something like GBK (Chinese) and LC_ALL variable to en_US.UTF-8 then cf.exists() returns true. only the '?' appears instead of file name. Surprise o_O.
Update 2: More investigation and it looks like its not a Java issue. It looks like libc on Linux used locale settings to translate file name encodings and those settings will cause file not found error/exception. "file.encoding" is for how Java interprets file names.
Update 3 Now it looks problem is how Java interprets file names. The following simple C code works on Linux regardless of file encoding and value of LC_ALL environment variable (I am happy that this proves for answer given here: https://unix.stackexchange.com/questions/39175/understanding-unix-file-name-encoding). But still I am not clear how Java interprets on LC_ALL variable. Now looking into OpenJDK code for that.
Sample C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
int main(int argc, char *argv[])
{
char *argdir = argv[1];
DIR *dp = opendir(argdir);
struct dirent *de;
while(de = readdir(dp)) {
char *abspath = (char *) malloc(strlen(argdir) + 1 + strlen(de->d_name) + 1);
strcpy(abspath, argdir);
abspath[strlen(argdir)] = '/';
strcpy(abspath + strlen(argdir) + 1, de->d_name);
printf("%d %s ", de->d_type, abspath);
FILE *fp = fopen(abspath, "r");
if (fp) {
printf("Success");
}
fclose(fp);
putchar('
');
}
}
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…