Why do I have this problem?
Make sure you check errno and the return value of printf!
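#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    wchar_t *s;

    s = (wchar_t *) malloc(sizeof(wchar_t) * 2);
    if (s == NULL) {
        perror("malloc");
        return (1);
    }
    s[0] = 0xC389;
    s[1] = 0;
    /* printf returns a negative value and sets errno on failure */
    if (printf("%ls\n", s) < 0) {
        perror("printf");
    }
    free(s);
    return (0);
}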
See the output:
$ gcc test.c && ./a.out
printf: Invalid or incomplete multibyte or wide character
How to fix
First of all, the default locale of a C program is C (also known as POSIX), which is ASCII-only. You will need to add a call to setlocale, specifically setlocale(LC_ALL, "").
If your LC_ALL, LC_CTYPE or LANG environment variables do not select a UTF-8 locale, then setlocale(LC_ALL, "") will not give you one, and you'll have to select a locale explicitly. setlocale(LC_ALL, "C.UTF-8") works on most systems: the C locale is standard, and its UTF-8 variant, C.UTF-8, is generally implemented, although the C standard does not require it.
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void)
{
    wchar_t *s;

    s = (wchar_t *) malloc(sizeof(wchar_t) * 2);
    if (s == NULL) {
        perror("malloc");
        return (1);
    }
    s[0] = 0xC389;
    s[1] = 0;
    /* use the locale selected by the environment instead of "C" */
    setlocale(LC_ALL, "");
    if (printf("%ls\n", s) < 0) {
        perror("printf");
    }
    free(s);
    return (0);
}
See the output:
$ gcc test.c && ./a.out
?
The wrong character is printed because wchar_t represents a wide character (such as UTF-32), not a multibyte character (such as UTF-8). The value 0xC389 is the UTF-8 byte sequence for é (0xC3 0x89) packed into one integer, so printf treats it as the code point U+C389, which is a different character. Note that wchar_t is always 32 bits wide in the GNU C Library, but the C standard doesn't require it to be. If you initialize the character with its UTF-32 value, i.e. the code point U+00C9 (0x000000C9), then it prints out correctly:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void)
{
    wchar_t *s;

    s = (wchar_t *) malloc(sizeof(wchar_t) * 2);
    if (s == NULL) {
        perror("malloc");
        return (1);
    }
    s[0] = 0xC9; /* code point U+00C9, not the UTF-8 bytes 0xC3 0x89 */
    s[1] = 0;
    setlocale(LC_ALL, "");
    if (printf("%ls\n", s) < 0) {
        perror("printf");
    }
    free(s);
    return (0);
}
Output:
$ gcc test.c && ./a.out
é
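Note that you can also set the LC_* (locale) environment variables from the command line. The variable has to be exported so that the program inherits it, and it only takes effect because the program calls setlocale(LC_ALL, ""):
$ export LC_ALL=C.UTF-8
$ ./a.out
é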