Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

utf-8 - &Atilde;&copy; and other codes

I have a file full of these codes, and I want to "translate" it into normal characters (the whole file, I mean). How can I do it?

Thank you very much in advance.

Whoever fights dragons for too long becomes a dragon; gaze into the abyss for too long, and the abyss gazes back…

1 Answer

It looks like you originally had a UTF-8 file that was interpreted as an 8-bit encoding (e.g. ISO-8859-15) and then entity-encoded. I say this because the byte sequence C3 A9 is a very plausible UTF-8 encoding sequence.
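To check this diagnosis, the corruption can be reproduced. This is a minimal sketch, assuming the entity-encoding step treated each UTF-8 byte as a separate ISO-8859-1 character; the variable names are illustrative, not from the original:

```php
<?php
// "é" stored as UTF-8 is the two bytes 0xC3 0xA9.
$original = "\u{00E9}";
echo bin2hex($original), "\n"; // c3a9

// Assumption: the text was entity-encoded by code that read it as ISO-8859-1,
// so each byte became its own character: 0xC3 = "Ã", 0xA9 = "©".
$mangled = htmlentities($original, ENT_QUOTES, 'ISO-8859-1');
echo $mangled, "\n"; // &Atilde;&copy;
```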

You will need to entity-decode it first (turning each entity back into a single ISO-8859-1 byte), which gives you UTF-8 again. You could then use something like iconv to convert it to an encoding of your choosing.

To work through your example:

  • &Atilde;&copy; decodes (as ISO-8859-1) to the byte sequence 0xC3 0xA9
  • 0xC3 0xA9 = 11000011 10101001 in binary
  • The leading 110 in the first octet marks the start of a UTF-8 two-byte sequence, and the second octet starts with 10, as a UTF-8 continuation byte must. To decode it, we take the last 5 bits of the first octet and the last 6 bits of the second octet.
  • So, interpreted as UTF-8, it's 00011 101001 = 0xE9 = é (U+00E9, LATIN SMALL LETTER E WITH ACUTE)
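The bit manipulation in the steps above can be sketched in PHP. `decodeTwoByteUtf8` is a hypothetical helper written for illustration: it checks the 110/10 bit patterns and combines the payload bits, but it does not reject overlong encodings such as 0xC0 0x80, so it is not a complete UTF-8 validator:

```php
<?php
// Hypothetical helper: decode one two-byte UTF-8 sequence by hand.
function decodeTwoByteUtf8(int $b1, int $b2): int
{
    // Lead byte must look like 110xxxxx, continuation byte like 10xxxxxx.
    if (($b1 & 0xE0) !== 0xC0 || ($b2 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('Not a two-byte UTF-8 sequence');
    }
    // Keep the low 5 bits of the first byte and the low 6 bits of the second,
    // then join them into one 11-bit code point.
    return (($b1 & 0x1F) << 6) | ($b2 & 0x3F);
}

$codePoint = decodeTwoByteUtf8(0xC3, 0xA9);
printf("U+%04X\n", $codePoint); // U+00E9
```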

You mention wanting to handle this with PHP; something like this might do it for you:

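 // To load from a file, use:
 // $file = file_get_contents("/path/to/filename.txt");
 // The example below uses a literal string to demonstrate the technique.

 $file = "Pr&Atilde;&copy;c&Atilde;&copy;dent is a French word";

 // Decode each entity as a single ISO-8859-1 byte, so &Atilde;&copy;
 // becomes the bytes 0xC3 0xA9 (the UTF-8 encoding of "é").
 $utf8 = html_entity_decode($file, ENT_QUOTES, 'ISO-8859-1');

 // Optionally convert to another encoding. utf8_decode() also works
 // here, but it is deprecated as of PHP 8.2.
 $iso8859 = iconv('UTF-8', 'ISO-8859-1', $utf8);

 // $utf8 contains "Précédent is a French word" in UTF-8
 // $iso8859 contains "Précédent is a French word" in ISO-8859-1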
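Since the question asks about a whole file, here is a minimal sketch that reads a file, decodes it and writes the result out. `repairFile` and its parameters are hypothetical names. It assumes the input contains only ASCII text plus HTML entities (entities for characters outside ISO-8859-1, such as `&euro;`, may not be decoded by this approach), and it relies on the iconv extension when a target encoding other than UTF-8 is requested:

```php
<?php
// Hypothetical helper: repair a whole file of entity-encoded UTF-8 mojibake.
// Assumes the input holds only ASCII text plus HTML entities.
function repairFile(string $inPath, string $outPath, string $targetEncoding = 'UTF-8'): void
{
    $raw = file_get_contents($inPath);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $inPath");
    }

    // Turn each entity into one ISO-8859-1 byte; together the bytes form UTF-8 again.
    $utf8 = html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1');

    // Optionally re-encode with iconv into the requested target encoding.
    $converted = $targetEncoding === 'UTF-8'
        ? $utf8
        : iconv('UTF-8', $targetEncoding, $utf8);
    if ($converted === false || file_put_contents($outPath, $converted) === false) {
        throw new RuntimeException("Cannot convert or write $outPath");
    }
}

// Demo: temporary files stand in for the real input and output paths.
$in = tempnam(sys_get_temp_dir(), 'in');
$out = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($in, "Pr&Atilde;&copy;c&Atilde;&copy;dent is a French word");
repairFile($in, $out);
echo file_get_contents($out), "\n"; // Précédent is a French word
```

Decoding to ISO-8859-1 bytes rather than straight to UTF-8 characters is the key design choice: it rebuilds the original byte stream instead of double-encoding it.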

...