Remove character '\u202A' 8234 from string
I am trying to read the character at index 0 of a string:
public static String editNoHP (String noHP){
    String result;
    try {
        if(noHP.charAt(0) == '0')
            result = "62"+noHP.substring(1);
        else if(noHP.charAt(0) == '+' )
            result = noHP.substring(1);
        else if(noHP.charAt(0) == '6')
            result = noHP;
        else if(noHP.charAt(0) == '6' && noHP.charAt(1) == '2')
            result = noHP;
        else if(noHP.charAt(0) == '9')
            result = noHP;
        else
            result = "62"+noHP;
    }
    catch (Exception e){
        return "";
    }
    return result.replaceAll("[\\s\\-\\.\\^:,]","");
}
I call this function after querying a contact, but I found a strange result.
Normal input and output:
input = +62 111-1111-1111 output : 6211111111111
input = 011111111111 output : 6211111111111
And this is the strange input and result:
input = 011111111111 output : 62011111111111
So I debugged this contact and found that when the app reads the character at index 0, it gets '\u202A' (8234), not '0'.
I already tried regexes like:
String clean = str.replaceAll("[^\\n\\r\\t\\p{Print}]", ""); or
String clean = str.replaceAll("[^\\x20-\\x7E]", ""); or
String clean = str.replaceAll("[^\u0000-\uFFFF]", ""); or
String clean = str.replaceAll("[^\\p{ASCII}]", ""); or
String clean = str.replaceAll("[^\\x00-\\x7F]", ""); or
String clean = StringEscapeUtils.unescapeJava(str);
With all of them, the first character is still '\u202A' (8234).
What is this character? How can I fix this problem?
Update: I tried editing the strange contact and found odd behaviour. The number is 011111111111. First I put the cursor between 0 and 1, then pressed delete/backspace to remove the 0. The cursor suddenly moved to the right of the 1 instead of staying on its left. Then I saved the contact and ran my program. The first character is now '0', not '\u202A' (8234). So I think the stored number contained an invisible character, perhaps added when the contact was first created or when it was synced from the Google account.
Solution 1:[1]
Finally, I found that I can use a regex to strip every non-alphanumeric character.
So this is my final function:
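public static String editNoHP (String noHPinput){
    String result;
    try {
        noHPinput = noHPinput.trim();
        String noHP = noHPinput;
        // Remove common separators (whitespace, '-', '.', '^', ':', ',').
        noHP = noHP.replaceAll("[\\s\\-\\.\\^:,]","");
        // Remove everything that is not an ASCII letter or digit.
        // This also strips '\u202A' and a leading '+'.
        noHP = noHP.replaceAll("[^A-Za-z0-9]","");
        char isinya = noHP.charAt(0);
        if(isinya == '0')
            result = "62"+noHP.substring(1);
        else if(isinya == '+' )
            // Never reached: '+' was already removed by the regex above.
            result = noHP.substring(1);
        else
            result = noHP;
    }
    catch (Exception e){
        // An empty string makes charAt(0) throw and ends up here.
        return "";
    }
    return result;
}
This regex removes every character other than ASCII letters and digits, which includes invisible Unicode characters such as '\u202A'.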
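If stripping every non-alphanumeric character is broader than you want, a narrower alternative is to remove only Unicode format characters (general category Cf, which contains U+202A) plus the separators the original function already handled. The following is a minimal sketch of that idea, not the code from this answer; the class name `PhoneCleaner` and method name `normalize` are hypothetical.

```java
import java.util.regex.Pattern;

final class PhoneCleaner {
    // Unicode general category Cf (format) covers invisible controls such as
    // U+202A LEFT-TO-RIGHT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING.
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");
    // The separators the question's function removed.
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\-.^:,]");

    /** Hypothetical helper: rewrites a leading 0 or + into the "62..." form. */
    static String normalize(String raw) {
        if (raw == null) return "";
        String s = FORMAT_CHARS.matcher(raw).replaceAll("");
        s = SEPARATORS.matcher(s).replaceAll("");
        if (s.isEmpty()) return "";
        if (s.charAt(0) == '0') return "62" + s.substring(1);
        if (s.charAt(0) == '+') return s.substring(1);
        return s;
    }

    public static void main(String[] args) {
        // Number wrapped in invisible bidi controls, as a synced contact might be.
        System.out.println(normalize("\u202A011111111111\u202C"));
        System.out.println(normalize("+62 111-1111-1111"));
    }
}
```

Both calls in `main` should print 6211111111111. Because the format characters are removed before the first character is inspected, the leading-zero branch works again, and the `+` branch stays reachable.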
Solution 2:[2]
I came across the same issue! It took me several hours to debug, because when I printed the string, the first character was shown as '?', so I thought it was a question mark. But it isn't!
Then I printed the numeric value of the first character, and it was 8234. I had no idea why it showed up as a question mark; presumably the console's character encoding could not represent it and substituted '?'.
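Printing code points instead of the characters themselves avoids being fooled by the '?' substitution. Here is a small diagnostic sketch, assuming Java 8 or later for `String.codePoints()`; the class name `CharInspector` and method name `describe` are hypothetical.

```java
final class CharInspector {
    /** Hypothetical helper: one line per code point, e.g. "U+0030 DIGIT ZERO". */
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out
                .append(String.format("U+%04X %s", cp, Character.getName(cp)))
                // Flag invisible format characters (Unicode category Cf).
                .append(Character.getType(cp) == Character.FORMAT ? " [format]" : "")
                .append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("\u202A0"));
    }
}
```

For the string above, the first line names U+202A as LEFT-TO-RIGHT EMBEDDING and marks it as a format character, which makes the invisible prefix obvious.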
Solution 3:[3]
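According to http://unicode.org/cldr/utility/character.jsp?a=202A&B1=Show, \u202A is LEFT-TO-RIGHT EMBEDDING, an invisible bidirectional formatting character (general category Cf). It is not whitespace, and String.trim() only removes leading and trailing characters up to U+0020, so trimming alone leaves it in place.
To fix it, strip format characters explicitly before trimming.
public static String editNoHP (String noHP){
    // Remove invisible format characters (category Cf, which includes U+202A), then trim.
    noHP = noHP.replaceAll("\\p{Cf}", "").trim();
    // the rest of your code...
}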
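What `trim()` does with U+202A can be checked directly. The sketch below compares it with removing Unicode format characters; the class name `TrimCheck` and method name `stripFormatChars` are hypothetical.

```java
final class TrimCheck {
    /** Hypothetical helper: removes Unicode format characters (category Cf). */
    static String stripFormatChars(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String raw = "\u202A0111";
        // trim() only drops leading/trailing chars up to U+0020, so U+202A survives.
        System.out.println((int) raw.trim().charAt(0));
        // Java does not classify U+202A as whitespace either.
        System.out.println(Character.isWhitespace('\u202A'));
        // After removing category Cf, the first character is the digit.
        System.out.println(stripFormatChars(raw).charAt(0));
    }
}
```

The first line printed is the code point that `trim()` left at index 0, and the last line is the first character once format characters are gone.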
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Yuddistira Kiki |
| Solution 2 | Yang |
| Solution 3 | Yamen Nassif |
