Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

regex - How do I replace characters not in range [0x5E10, 0x7F35] with '*' in PHP?

I'm not familiar with how regular expressions handle hexadecimal code points. Does anyone know?


1 Answer


The following does the trick:

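// each "?" stands for one character in the range U+00FF-U+FFFF (e.g. a CJK character)
$str = "some ??????????";

// replace every character inside the range
echo preg_replace('/[\x{00ff}-\x{ffff}]/u', '*', $str);
// some **********

// replace every character outside the range (the ^ negates the class)
echo preg_replace('/[^\x{00ff}-\x{ffff}]/u', '*', $str);
// *****??????????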
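The question itself asks about the range 0x5E10 to 0x7F35. Applying the same negated character class to that range gives a minimal sketch like the one below; the function name maskOutsideRange and the sample string are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: replace every character outside U+5E10..U+7F35 with "*".
// [^...] negates the class; /u makes PCRE work on UTF-8 code points, not bytes.
// Returns null if $text is not valid UTF-8 (preg_replace fails in that case).
function maskOutsideRange(string $text): ?string
{
    return preg_replace('/[^\x{5E10}-\x{7F35}]/u', '*', $text);
}

// Sample input built with PHP 7+ "\u{...}" string escapes:
// "a" and U+4E00 lie outside the range; U+5E10 and U+7F35 are its endpoints.
$input = "a\u{5E10}\u{4E00}\u{7F35}";
$masked = maskOutsideRange($input);
echo $masked, "\n";
```

Because the bounds of a character-class range are inclusive, the endpoints U+5E10 and U+7F35 themselves are kept, while "a" and U+4E00 become "*".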

The important part is the u (PCRE_UTF8) modifier, which the PHP manual's pattern modifier reference describes as follows:

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
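To see what the modifier changes in practice, here is a small sketch (the sample character U+4E2D is an arbitrary choice): without u, PCRE sees the three UTF-8 bytes of a single character, and \x{...} values above 0xFF are rejected when the pattern is compiled.

```php
<?php
// U+4E2D, written with the PHP 7+ "\u{...}" string escape; it is 3 bytes in UTF-8.
$s = "\u{4E2D}";

// Without /u the subject is treated as bytes: "." matches each of the 3 bytes.
$withoutU = preg_match_all('/./', $s);

// With /u the subject is decoded as UTF-8: "." matches the single character.
$withU = preg_match_all('/./u', $s);

// Without /u, \x{...} is limited to values up to 0xFF, so this pattern fails
// to compile and preg_replace returns null (@ silences the compile warning).
$rangeWithoutU = @preg_replace('/[^\x{5E10}-\x{7F35}]/', '*', $s);

echo $withoutU, ' ', $withU, "\n"; // 3 1
var_dump($rangeWithoutU);          // NULL
```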

And here is a short explanation of why \uFFFF does not work in PHP:

Perl and PCRE do not support the \uFFFF syntax. They use \x{FFFF} instead. You can omit leading zeros in the hexadecimal number between the curly braces. Since \x by itself is not a valid regex token, \x{1234} can never be confused to match \x 1234 times. It always matches the Unicode code point U+1234. \x{1234}{5678} will try to match code point U+1234 exactly 5678 times.
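The points in the quote can be checked directly. The sketch below uses arbitrarily chosen sample strings; the patterns are single-quoted so that PHP passes the backslashes through to PCRE unchanged.

```php
<?php
// Three copies of U+1234, built with PHP's "\u{...}" double-quoted string escape (PHP 7+).
$s = "\u{1234}\u{1234}\u{1234}";

// \x{1234}{3}: the {3} quantifier repeats the code point U+1234 three times.
$repeated = preg_match('/^\x{1234}{3}$/u', $s);

// Leading zeros are optional: \x{41} and \x{0041} both denote U+0041 ("A").
$short  = preg_match('/^\x{41}$/u', 'A');
$padded = preg_match('/^\x{0041}$/u', 'A');

// PCRE rejects the \uFFFF-style escape, so compilation fails and
// preg_match returns false (@ silences the compile warning).
$uEscape = @preg_match('/\u0041/', 'A');

var_dump($repeated, $short, $padded, $uEscape); // int(1), int(1), int(1), bool(false)
```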

