Just urlencode the string desired as a filename. All characters returned from urlencode are valid in filenames (NTFS/HFS/UNIX), and you can then urldecode the filenames back to UTF-8 (or whatever encoding they were in).
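A minimal sketch of that round trip, in case it helps. The helper names nameToFilename and filenameToName are made up for illustration; only urlencode and urldecode come from PHP itself.

```php
<?php
// Hypothetical helpers; the names are illustrative, not part of PHP.

// Turn an arbitrary string into a name that is safe on NTFS/HFS/UNIX.
function nameToFilename(string $name): string
{
    // urlencode() leaves A-Z, a-z, 0-9, "-", "_" and "." alone,
    // turns spaces into "+", and everything else into %XX escapes.
    return urlencode($name);
}

// Recover the original string from a name produced by nameToFilename().
function filenameToName(string $filename): string
{
    return urldecode($filename);
}

$original = "r\u{E9}sum\u{E9}: 2024/05?.txt"; // "résumé: 2024/05?.txt" in UTF-8
$safe     = nameToFilename($original);
$restored = filenameToName($safe);
```

Note that urlencode leaves "." untouched, so names like "." and ".." still need special handling if they can occur in your input.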
Caveats (all apply to the solutions below as well):
- After url-encoding, the filename must be no longer than 255 characters (the encoded name is pure ASCII, so this is also 255 bytes).
- UTF-8 has multiple representations for many characters (using combining characters). If you don't normalize your UTF-8, you may have trouble searching with glob or reopening an individual file.
- You can't rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames then use a sorting algorithm aware of UTF-8 (and collations).
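One way to deal with these caveats is sketched below. It assumes the intl extension is loaded (for Normalizer and Collator); the function names, the MAX_FILENAME_BYTES constant, and the en_US default locale are my own choices for illustration, and 255 is the usual per-name limit rather than something guaranteed on every filesystem.

```php
<?php
// Assumption: the intl extension is available (Normalizer, Collator).

const MAX_FILENAME_BYTES = 255; // assumed limit; check your filesystem

// Normalize to NFC first so that visually identical names map to one filename,
// then url-encode and enforce the length limit on the encoded result.
function safeFilename(string $name): string
{
    $normalized = Normalizer::normalize($name, Normalizer::FORM_C);
    if ($normalized === false) {
        throw new InvalidArgumentException('Name could not be normalized (invalid UTF-8?).');
    }
    $encoded = urlencode($normalized);
    if (strlen($encoded) > MAX_FILENAME_BYTES) {
        throw new LengthException('Encoded filename is too long.');
    }
    return $encoded;
}

// Decode directory entries back to UTF-8, then sort with a locale-aware collator
// instead of relying on the byte order that scandir() returns.
function sortedNames(array $filenames, string $locale = 'en_US'): array
{
    $names = array_map('urldecode', $filenames);
    $collator = new Collator($locale);
    $collator->sort($names);
    return $names;
}
```

With this, "e" followed by a combining acute accent and the precomposed "é" end up as the same filename, so glob and file_exists behave predictably as long as every name goes through safeFilename.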
Worse Solutions
The following are less attractive solutions, more complicated and with more caveats.
On Windows, the PHP filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names. This gives you two choices:
- Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 character will be stored as multiple single ISO-8859-1 characters. E.g. ó will appear as Ã³ in Windows Explorer.
- Limit your file/directory names to characters representable in ISO-8859-1. In practice, you'll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.
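Here is a rough sketch of both choices. It assumes the mbstring extension and uses mb_convert_encoding in place of utf8_decode/utf8_encode, since those two are deprecated as of PHP 8.2. The names toFsName and fromFsName are hypothetical.

```php
<?php
// Assumption: the mbstring extension is available.

// Second choice: convert UTF-8 to ISO-8859-1 before calling filesystem functions.
function toFsName(string $utf8): string
{
    $fs = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    // Characters outside ISO-8859-1 get substituted, so a failed round trip
    // means the name cannot be represented.
    if (mb_convert_encoding($fs, 'UTF-8', 'ISO-8859-1') !== $utf8) {
        throw new InvalidArgumentException('Name is not representable in ISO-8859-1.');
    }
    return $fs;
}

// Convert an entry returned by scandir() back to UTF-8.
function fromFsName(string $fs): string
{
    return mb_convert_encoding($fs, 'UTF-8', 'ISO-8859-1');
}

// First choice, illustrated: the two UTF-8 bytes of "ó" (0xC3 0xB3),
// read as two separate ISO-8859-1 characters.
$mojibake = mb_convert_encoding("\u{F3}", 'UTF-8', 'ISO-8859-1');
```

The round-trip check is there because the conversion silently replaces unrepresentable characters instead of failing.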
Caveats galore!
- If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you're out of luck.
- Windows may use an encoding other than ISO-8859-1 in non-English locales. I'd guess it will usually be one of ISO-8859-#, but this means you'll need to use mb_convert_encoding instead of utf8_decode.
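To make these two caveats concrete, here is a sketch that takes the target encoding as a parameter and rejects any converted name containing a byte Windows does not allow in filenames. The function name toWindowsFsName and its default encoding are assumptions for illustration; which encoding your system really uses is something you have to find out yourself.

```php
<?php
// Assumption: the mbstring extension is available.

// Convert a UTF-8 name to the (assumed) filesystem encoding and refuse
// anything that contains a byte Windows forbids in file names:
// < > : " / \ | ? * and the control bytes 0x00-0x1F.
function toWindowsFsName(string $utf8, string $fsEncoding = 'ISO-8859-1'): string
{
    $fs = mb_convert_encoding($utf8, $fsEncoding, 'UTF-8');
    // Byte-level check on purpose (no /u flag): the filesystem sees bytes.
    if (preg_match('/[<>:"\/\\\\|?*\x00-\x1F]/', $fs) === 1) {
        throw new InvalidArgumentException('Converted name contains a byte that is invalid on Windows.');
    }
    return $fs;
}

// Example: a Polish name converted with ISO-8859-2 instead of ISO-8859-1.
$polish = toWindowsFsName("\u{141}\u{F3}d\u{17A}.txt", 'ISO-8859-2'); // "Łódź.txt"
```

A side effect of the check: characters the target encoding cannot represent are substituted with "?" by default, and "?" is itself invalid on Windows, so those names are rejected too.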
This nightmare is why you should probably just transliterate to create filenames.
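If you go the transliteration route, something like the following could work. It assumes the intl extension (for Transliterator); the function name, the 'Any-Latin; Latin-ASCII' rule chain, and the whitelist of allowed characters are my own picks, not the only reasonable ones.

```php
<?php
// Assumption: the intl extension is available (Transliterator).

// Reduce an arbitrary UTF-8 string to a conservative ASCII filename.
// This is lossy: the original string cannot be recovered from the result.
function transliteratedFilename(string $name): string
{
    $translit = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($translit === null) {
        throw new RuntimeException('Could not create the transliterator.');
    }
    $ascii = $translit->transliterate($name);
    if ($ascii === false) {
        throw new InvalidArgumentException('Name could not be transliterated.');
    }
    // Collapse every run of characters outside the whitelist into one hyphen.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/', '-', $ascii);
    return trim($safe, '-');
}

$filename = transliteratedFilename("Cr\u{E8}me br\u{FB}l\u{E9}e.txt");
```

Because different inputs can collapse to the same output (or to an empty string), you'll probably want to store the original name somewhere else and add a uniqueness check before writing.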