Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - How to convert text to Unicode code points like \u0054\u0068\u0069\u0073 using PHP?

EDIT 2: I'd like to convert English words to Unicode numbers using PHP 5 and output them as \u****, where **** is the Unicode number.

In my original question, I had mistakenly thought that \u was a standard for encoding Unicode, when in fact it is just escape syntax in JavaScript (thank you, Jukka K. Korpela, for pointing this out). Although I wanted to do the conversion in PHP, the converted Unicode was to be used in JavaScript.

I tried the options below, but had no luck. deceze's answer did the trick though, thank you very much!

THINGS I TRIED

I've read that I can use iconv to do this, but I've had no luck and can't find any examples of how.

I've also tried Scott Reynen's code from "How to get code point number for a given character in a utf-8 string?", but I can't seem to get it to work. When I tried it, I included the script in a file along with

$str='test';
echo utf8_to_unicode($str);

It just echoed out test.

I've also read that I can use

echo json_encode("test");

but again I only get test printed to the screen.

Any help would be much appreciated.

EDIT1: Actually I think they are called code units not code points.

1 Answer

json_encode pretty much does it for you, but only for non-ASCII characters. So all you need to do is to convert ASCII characters by hand. Here's a function that does that on a character-by-character basis:

function utf8ToUnicodeCodePoints($str) {
    if (!mb_check_encoding($str, 'UTF-8')) {
        trigger_error('$str is not encoded in UTF-8, I cannot work like this');
        return false;
    }
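    // The /u modifier makes "." match one whole UTF-8 character at a time.
    return preg_replace_callback('/./u', function ($m) {
        // ord() reads only the first byte, which is enough to detect ASCII.
        $ord = ord($m[0]);
        if ($ord <= 127) {
            // json_encode does not \u-escape printable ASCII, so do it by hand.
            return sprintf('\u%04x', $ord);
        } else {
            // json_encode escapes non-ASCII as \uXXXX; strip the surrounding quotes.
            return trim(json_encode($m[0]), '"');
        }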
    }, $str);
}
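
As a point of comparison, here is a minimal sketch of a different technique, not the one used in the function above: convert the string to UTF-16BE and print each two-byte code unit as \uXXXX. The name toJsUnicodeEscapes is hypothetical, and the sketch assumes the mbstring extension is loaded and that the input is valid UTF-8. Because JavaScript's \u escapes are UTF-16 code units, characters outside the Basic Multilingual Plane should come out as surrogate pairs.

```php
<?php
// Hypothetical helper (an illustration, not part of the answer above):
// escape every character of a UTF-8 string as JavaScript-style \uXXXX.
function toJsUnicodeEscapes(string $str): string
{
    // str_split('') returns [''] on PHP versions before 8.2, so bail out early.
    if ($str === '') {
        return '';
    }
    // In UTF-16BE every code unit is exactly two bytes, i.e. four hex digits.
    $hex = bin2hex(mb_convert_encoding($str, 'UTF-16BE', 'UTF-8'));
    $out = '';
    foreach (str_split($hex, 4) as $unit) {
        // '\u' in single quotes is a literal backslash followed by "u".
        $out .= '\u' . $unit;
    }
    return $out;
}

echo toJsUnicodeEscapes('This'), "\n"; // prints \u0054\u0068\u0069\u0073
```

Unlike the callback version, this sketch does not validate the encoding first, so a caller may still want the mb_check_encoding guard shown above.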
