regex - Replacing Emoji Unicode Range from Arabic Tweets using Java

Question

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am trying to replace emoji in Arabic tweets using Java.

I used this code:

String line = "???? ????? ??? ???????? ????? ??? ??? ?? ??? ???? ????";
Pattern unicodeOutliers = Pattern.compile("([\u1F601-\u1F64F])", Pattern.UNICODE_CASE | Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE);
Matcher unicodeOutlierMatcher = unicodeOutliers.matcher(line);
line = unicodeOutlierMatcher.replaceAll(" $1 ");

But it is not replacing them. Even if I match only the character itself, "\u1F602", it is not replaced. Maybe that is because there are 5 digits after the \u? I am not sure, it is just a guess.

Note that:

1- the emoji at the end of the tweet (😂) is U+1F602, which is "face with tears of joy"

2- this question is not a duplicate of this question.

Any ideas?


1 Answer

深蓝 · Answer 1 · 2021-10-23T19:18:22+0000

From the Javadoc for the Pattern class:

A Unicode character can also be represented in a regular-expression by using its Hex notation(hexadecimal code point value) directly as described in construct \x{...}, for example a supplementary character U+2011F can be specified as \x{2011F}, instead of two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.

This means that the regular expression you're looking for is ([\x{1F601}-\x{1F64F}]). Of course, when you write this as a Java String literal, you must escape the backslashes.

Pattern unicodeOutliers = Pattern.compile("([\\x{1F601}-\\x{1F64F}])");

Note that the construct \x{...} is only available from Java 7.
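To make this concrete, here is a minimal sketch of the whole replacement, assuming Java 7 or later. The class name EmojiSpacer and the helper spaceOutEmoji are made up for illustration and are not part of the question. The emoji is written as its UTF-16 surrogate pair so the source stays ASCII. The last lines of main illustrate the likely cause of the original problem: a \u escape in Java source takes exactly four hex digits, so a five-digit form is read as a four-digit character followed by a literal digit.

```java
import java.util.regex.Pattern;

public class EmojiSpacer {
    // Range from the question, written with code point escapes (Java 7+).
    // The backslashes are doubled because this is a Java String literal.
    private static final Pattern EMOJI =
            Pattern.compile("([\\x{1F601}-\\x{1F64F}])");

    // Hypothetical helper: pads every matched emoji with spaces,
    // mirroring the replaceAll(" $1 ") call in the question.
    public static String spaceOutEmoji(String text) {
        return EMOJI.matcher(text).replaceAll(" $1 ");
    }

    public static void main(String[] args) {
        // U+1F602 (face with tears of joy) as a surrogate pair.
        String joy = "\uD83D\uDE02";
        String spaced = spaceOutEmoji("tweet" + joy);
        // The emoji is now surrounded by spaces.
        System.out.println(spaced.equals("tweet " + joy + " "));

        // A five-digit escape is NOT one emoji: the compiler reads
        // four hex digits (U+1F60) and then the plain character '2'.
        String fiveDigits = "\u1F602";
        System.out.println(fiveDigits.charAt(1) == '2');

        // The real emoji is two chars but a single code point.
        System.out.println(joy.codePointCount(0, joy.length()) == 1);
    }
}
```

The same pattern works on the Arabic tweet unchanged, since Arabic letters lie outside the U+1F601 to U+1F64F range and are left untouched.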
