'How to check if a String contains only ASCII?

The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find if a String only contains the base characters of ASCII?



Solution 1:[1]

You can do it with java.nio.charset.Charset.

import java.nio.charset.Charset;

public class StringUtils {

  public static boolean isPureAscii(String v) {
    return Charset.forName("US-ASCII").newEncoder().canEncode(v);
    // or "ISO-8859-1" for ISO Latin 1
    // or StandardCharsets.US_ASCII with JDK1.7+
  }

  public static void main (String args[])
    throws Exception {

     String test = "Réal";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
     test = "Real";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));

     /*
      * output :
      *   Réal isPureAscii() : false
      *   Real isPureAscii() : true
      */
  }
}

Detect non-ASCII character in a String

Solution 2:[2]

Here is another way not depending on a library but using a regex.

You can use this single line:

text.matches("\\A\\p{ASCII}*\\z")

Whole example program:

public class Main {
    public static void main(String[] args) {
        char nonAscii = 0x00FF;
        String asciiText = "Hello";
        String nonAsciiText = "Buy: " + nonAscii;
        System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
        System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
    }
}

Understanding the regex :

  • li \\A : Beginning of input
  • \\p{ASCII} : Any ASCII character
  • * : all repetitions
  • \\z : End of input

Solution 3:[3]

Iterate through the string and make sure all the characters have a value less than 128.

Java Strings are conceptually encoded as UTF-16. In UTF-16, the ASCII character set is encoded as the values 0 - 127 and the encoding for any non ASCII character (which may consist of more than one Java char) is guaranteed not to include the numbers 0 - 127

Solution 4:[4]

Or you copy the code from the IDN class.

// to check if a string only contains US-ASCII code point
//
private static boolean isAllASCII(String input) {
    boolean isASCII = true;
    for (int i = 0; i < input.length(); i++) {
        int c = input.charAt(i);
        if (c > 0x7F) {
            isASCII = false;
            break;
        }
    }
    return isASCII;
}

Solution 5:[5]

commons-lang3 from Apache contains valuable utility/convenience methods for all kinds of 'problems', including this one.

System.out.println(StringUtils.isAsciiPrintable("!@£$%^&!@£$%^"));

Solution 6:[6]

try this:

for (char c: string.toCharArray()){
  if (((int)c)>127){
    return false;
  } 
}
return true;

Solution 7:[7]

This will return true if String only contains ASCII characters and false when it does not

Charset.forName("US-ASCII").newEncoder().canEncode(str)

If You want to remove non ASCII , here is the snippet:

if(!Charset.forName("US-ASCII").newEncoder().canEncode(str)) {
                        str = str.replaceAll("[^\\p{ASCII}]", "");
                    }

Solution 8:[8]

In Kotlin:

fun String.isAsciiString() : Boolean =
    this.toCharArray().none { it < ' ' || it > '~' }

Solution 9:[9]

Iterate through the string, and use charAt() to get the char. Then treat it as an int, and see if it has a unicode value (a superset of ASCII) which you like.

Break at the first you don't like.

Solution 10:[10]

private static boolean isASCII(String s) 
{
    for (int i = 0; i < s.length(); i++) 
        if (s.charAt(i) > 127) 
            return false;
    return true;
}

Solution 11:[11]

In Java 8 and above, one can use String#codePoints in conjunction with IntStream#allMatch.

boolean allASCII = str.codePoints().allMatch(c -> c < 128);

Solution 12:[12]

It was possible. Pretty problem.

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingTest {

    static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII")
            .newEncoder();

    public static void main(String[] args) {

        String testStr = "¤EÀsÆW°ê»Ú®i¶T¤¤¤ß3¼Ó®i¶TÆU2~~KITEC 3/F Rotunda 2";
        String[] strArr = testStr.split("~~", 2);
        int count = 0;
        boolean encodeFlag = false;

        do {
            encodeFlag = asciiEncoderTest(strArr[count]);
            System.out.println(encodeFlag);
            count++;
        } while (count < strArr.length);
    }

    public static boolean asciiEncoderTest(String test) {
        boolean encodeFlag = false;
        try {
            encodeFlag = asciiEncoder.canEncode(new String(test
                    .getBytes("ISO8859_1"), "BIG5"));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return encodeFlag;
    }
}

Solution 13:[13]

//return is uppercase or lowercase
public boolean isASCIILetter(char c) {
  return (c > 64 && c < 91) || (c > 96 && c < 123);
}