Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


mysql - How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?

It seems that MySQL's default UTF-8 charset (utf8) does not support characters that take more than 3 bytes.

So, in PHP, how can I find all 4-byte (and longer) characters in a string and either remove them or replace them with some other character?


1 Answer


NOTE: Do not just strip these characters. Replace them with the replacement character U+FFFD to avoid Unicode attacks, mostly XSS:

http://unicode.org/reports/tr36/#Deletion_of_Noncharacters

$value = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
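
For anyone who wants this as a reusable function, here is a minimal sketch, assuming PHP 7.0 or later. The function name replace_4byte_chars and its $replacement parameter are made up for illustration and are not part of the answer above. Valid UTF-8 never uses more than 4 bytes per character, and the 4-byte characters are exactly the code points U+10000 through U+10FFFF, so that single range covers the "4 and more" case. The sketch also checks for a null return, which preg_replace() gives when the input is not valid UTF-8.

```php
<?php
// Hypothetical helper (name and signature are illustrative assumptions).
// Replaces every 4-byte UTF-8 character with $replacement, U+FFFD by default.
function replace_4byte_chars(string $value, string $replacement = "\u{FFFD}"): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8, so the
    // class matches whole code points above the Basic Multilingual Plane,
    // which are the ones encoded as 4 bytes.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $value);

    // With /u, preg_replace() returns null if the subject is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $result;
}

// U+1F600 (an emoji) takes 4 bytes in UTF-8 and is replaced by U+FFFD;
// the surrounding ASCII letters are left alone.
echo replace_4byte_chars("a\u{1F600}b"), "\n";
```

Throwing on invalid input is only one option. Depending on the application, it may be preferable to clean the string first or to fall back to an empty string.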
