Remove character '\u202A' 8234 from string

I am trying to get the character at index 0 of a string:

public static String editNoHP (String noHP){
  String result;
  try {
      if(noHP.charAt(0) == '0')
          result = "62"+noHP.substring(1);
      else if(noHP.charAt(0) == '+' )
          result = noHP.substring(1);
      else if(noHP.charAt(0) == '6')
          result = noHP;
      else if(noHP.charAt(0) == '6' && noHP.charAt(1) == '2')
          result = noHP;
      else if(noHP.charAt(0) == '9')
          result = noHP;
      else
          result = "62"+noHP;
  }
  catch (Exception e){
      return "";
  }

  return result.replaceAll("[\\s\\-\\.\\^:,]","");
}

I call this function after querying contacts, but I got a strange result.

Normal input & output:

input = +62 111-1111-1111   output : 6211111111111
input = 011111111111        output : 6211111111111

And this is the strange input and result:

input = 011111111111        output : 62011111111111

So I debugged this contact and found that when the app reads the character at index 0, it gets '\u202A' (8234), not '0'.

I already tried regexes like:

String clean = str.replaceAll("[^\\n\\r\\t\\p{Print}]", ""); or
String clean = str.replaceAll("[^\\x20-\\x7E]", ""); or
String clean = str.replaceAll("[^\u0000-\uFFFF]", ""); or
String clean = str.replaceAll("[^\\p{ASCII}]", ""); or
String clean = str.replaceAll("[^\x00-\x7F]", ""); or
String clean = StringEscapeUtils.unescapeJava(str);

All of them still return the same value, '\u202A' (8234).

What is this character, and how do I fix this problem?

Update: I tried editing the strange contact and found odd behaviour. The number is 011111111111. I put the cursor between the 0 and the first 1, then pressed delete/backspace to remove the 0. The cursor suddenly jumped to the right of the 1 instead of staying on its left. Then I saved the contact and ran my program. The first character is now '0', not '\u202A' (8234). So I think the number was stored in an unusual format, maybe when the contact was first added or when it was synced from the Google account.

Solution 1:^[1]

Finally, I found out that I can use a regex to remove all non-alphanumeric characters.

So this is my final function:

public static String editNoHP (String noHPinput){
    String result;
    try {
        noHPinput = noHPinput.trim();
        String noHP = noHPinput;
        noHP = noHP.replaceAll("[\\s\\-\\.\\^:,]","");
        noHP = noHP.replaceAll("[^A-Za-z0-9]","");
        char isinya = noHP.charAt(0);

        if(isinya == '0')
            result = "62"+noHP.substring(1);
        else if(isinya == '+' )
            result = noHP.substring(1);
        else
            result = noHP;

    }
    catch (Exception e){
        return "";
    }

    return result;
}

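This regex removes every character other than ASCII letters and digits, which includes invisible Unicode characters such as '\u202A'. Note that it also removes '+', so the isinya == '+' branch above can no longer match; the result is still correct because the digits after the '+' are kept.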
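The same idea can be written more compactly. This is only a sketch, assuming that nothing but digits matters in the stored number and that a leading 0 should become 62; the class and method names (PhoneNormalizer, normalize) are hypothetical and not part of the original code.

```java
public class PhoneNormalizer {

    // Hypothetical helper: keep only ASCII digits, then rewrite a leading 0 as 62.
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        // Drops spaces, dashes, '+', and invisible characters such as U+202A.
        String digits = raw.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            return "";
        }
        return digits.charAt(0) == '0' ? "62" + digits.substring(1) : digits;
    }

    public static void main(String[] args) {
        System.out.println(normalize("+62 111-1111-1111"));
        System.out.println(normalize("\u202A011111111111"));
    }
}
```

Checking for an empty string up front avoids relying on the catch block to handle the charAt(0) failure.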

Solution 2:^[2]

I came across the same issue! It took me several hours to debug, because when I printed the string, the first character was shown as '?', so I thought it was a question mark. But it isn't!

Then I printed the numerical value of the first character, and it was 8234! I was like, wtf. I had no idea why it showed up as a question mark.
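Printing code points instead of the characters themselves makes this kind of problem visible. The '?' is most likely the console substituting a character it cannot encode, although that is an assumption about the setup. A minimal sketch, assuming Java 8 or later; the class and method names (InspectChars, describe) are made up for illustration.

```java
public class InspectChars {

    // Lists each code point as U+XXXX with its Unicode name,
    // and marks invisible format characters (general category Cf).
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb
                .append(String.format("U+%04X ", cp))
                .append(Character.getName(cp))
                .append(Character.getType(cp) == Character.FORMAT ? " (format)" : "")
                .append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // The first line of output identifies U+202A as LEFT-TO-RIGHT EMBEDDING.
        System.out.print(describe("\u202A011"));
    }
}
```

8234 is simply the decimal form of 0x202A, so both values refer to the same character.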

Solution 3:^[3]

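According to http://unicode.org/cldr/utility/character.jsp?a=202A&B1=Show, \u202A is LEFT-TO-RIGHT EMBEDDING, an invisible bidirectional formatting character (general category Cf). It is not whitespace, and in Java String.trim() only strips characters up to U+0020, so trimming alone does not remove it. Strip format characters explicitly before checking the first character:

public static String editNoHP (String noHP){
     // remove invisible format characters such as U+202A, then trim ordinary whitespace
     noHP = noHP.replaceAll("\\p{Cf}", "").trim();
     // the rest of your code...
}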
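How trim() behaves on such a string can be checked directly. A small sketch, assuming the raw contact number starts with U+202A as in the question; stripFormatChars is a hypothetical helper name.

```java
public class TrimVsFormat {

    // Hypothetical helper: removes all characters in Unicode category Cf (format).
    static String stripFormatChars(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String raw = "\u202A011111111111";

        // trim() leaves U+202A in place because it is above U+0020.
        System.out.println(raw.trim().charAt(0) == '\u202A'); // true

        // After stripping format characters the first character is the digit.
        System.out.println(stripFormatChars(raw).charAt(0)); // 0
    }
}
```

Matching \p{Cf} targets only invisible format characters, so it leaves '+' and other visible symbols for the rest of the function to handle.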

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Yuddistira Kiki
Solution 2	Yang
Solution 3	Yamen Nassif
