I am trying to replace emojis in Arabic tweets using Java.
I used this code:
String line = "???? ????? ??? ???????? ????? ??? ??? ?? ??? ???? ????";
Pattern unicodeOutliers = Pattern.compile("([\\u1F601-\\u1F64F])", Pattern.UNICODE_CASE | Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE);
Matcher unicodeOutlierMatcher = unicodeOutliers.matcher(line);
line = unicodeOutlierMatcher.replaceAll(" $1 ");
But it does not replace them. Even when I match only the single character "\u1F602", nothing is replaced. Maybe it is because there are five hex digits after the \u? I am not sure; it is just a guess.
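A small Java check of the five-digit guess, offered as a sketch: a backslash-u escape (both in Java source and in java.util.regex.Pattern) reads exactly four hex digits, so "\u1F602" becomes U+1F60 followed by the ordinary digit 2. U+1F602 lies outside the Basic Multilingual Plane, so a Java String stores it as a surrogate pair of two chars. The class name EscapeCheck is only for illustration.

```java
public class EscapeCheck {
    public static void main(String[] args) {
        // A backslash-u escape consumes exactly four hex digits, so this
        // literal holds two chars: U+1F60 and the ordinary digit '2'.
        String wrong = "\u1F602";

        // U+1F602 is a supplementary code point; build it from its code point.
        String joy = new String(Character.toChars(0x1F602));

        System.out.println(wrong.length());                        // 2
        System.out.println(Integer.toHexString(wrong.charAt(0)));  // 1f60
        System.out.println(wrong.charAt(1));                       // 2
        System.out.println(joy.length());                          // 2 UTF-16 code units
        System.out.println(joy.codePointCount(0, joy.length()));   // 1 code point
        System.out.println(joy.equals("\uD83D\uDE02"));            // true: the surrogate pair
    }
}
```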
Note that:
1- the emoji at the end of the tweet (😂) is U+1F602, "face with tears of joy".
2- this question is not a duplicate of this question.
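One way to make the pattern work, sketched under the assumption that the goal is still to pad each emoji with spaces: Pattern also accepts \x{h...h} (since Java 7), which takes a whole code point, so a character-class range such as [\x{1F601}-\x{1F64F}] matches supplementary characters like U+1F602. The case-related flags are left out because emoji have no case. The class name EmojiPadder, the helper names padEmoticons and stripEmoticons, and the sample word are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmojiPadder {
    // \x{...} takes a whole code point, unlike the four-digit backslash-u form,
    // so this range really spans U+1F601 to U+1F64F.
    private static final Pattern EMOTICONS =
            Pattern.compile("([\\x{1F601}-\\x{1F64F}])");

    // Surrounds every matched emoticon with spaces, like the " $1 " replacement.
    static String padEmoticons(String line) {
        Matcher m = EMOTICONS.matcher(line);
        return m.replaceAll(" $1 ");
    }

    // Hypothetical variant that deletes the emoticons instead.
    static String stripEmoticons(String line) {
        return EMOTICONS.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample: the Arabic word "marhaba" followed by U+1F602.
        String word = "\u0645\u0631\u062D\u0628\u0627";
        String tweet = word + new String(Character.toChars(0x1F602));

        System.out.println(padEmoticons(tweet).length());        // 9: 5 letters + space + 2 code units + space
        System.out.println(stripEmoticons(tweet).equals(word));  // true
    }
}
```

The range is the one from the question, so it still excludes U+1F600 and emoji from other Unicode blocks.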
Any ideas?