How to improve performance for masking method with credit card regex in Java

I have this function, which uses a regex to identify a credit card number in the input string and masks everything except the last 4 digits:

public CharSequence obfuscate(CharSequence data) {
    String[] result = data.toString().replaceAll("[^a-zA-Z0-9-_*]", " ").trim().replaceAll(" +", " ").split(" ");
    for(String str : result){
        String originalString = str;
        String cleanString = str.replaceAll("[-_]","");
        CardType cardType = CardType.detect(cleanString);
        if(!CardType.UNKNOWN.equals(cardType)){
            String maskedReplacement = maskWithoutLast4Digits(cleanString ,replacement);
            data = data.toString().replace(originalString , maskedReplacement);
        }
    }
    return data;
}

static String maskWithoutLast4Digits(String input , String replacement) {
    if(input.length() < 4){
        return input;
    }
    return input.replaceAll(".(?=.{4})", replacement);
}

//pattern enum

 public enum CardType {
UNKNOWN,
VISA("^4[0-9]{12}(?:[0-9]{3}){0,2}$"),
MASTERCARD("^(?:5[1-5]|2(?!2([01]|20)|7(2[1-9]|3))[2-7])\\d{14}$"),
AMERICAN_EXPRESS("^3[47][0-9]{13}$"),
DINERS_CLUB("^3(?:0[0-5]|[68][0-9])[0-9]{11}$"),
DISCOVER("^6(?:011|[45][0-9]{2})[0-9]{12}$");

private Pattern pattern;

CardType() {
    this.pattern = null;
}

CardType(String pattern) {
    this.pattern = Pattern.compile(pattern);
}

public static CardType detect(String cardNumber) {

    for (CardType cardType : CardType.values()) {
        if (null == cardType.pattern) continue;
        if (cardType.pattern.matcher(cardNumber).matches()) return cardType;
    }

    return UNKNOWN;
}


public Pattern getPattern() {
    return pattern;
}
}

input1: "Valid American Express card: 371449635398431".

output1: "Valid American Express card: ***********8431"

input2: "Invalid credit card: 1234222222222" // does not match any credit card pattern

output2: "Invalid credit card: 1234222222222"

input3: "Valid American Express card with garbage characters: <3714-4963-5398-431>"

output3: "Valid American Express card with garbage characters: <***********8431>"

This is not the best way to do the masking, since this method will be called for each tag in a huge HTML document and for each line in huge text files. How can I improve the performance of this method?



Solution 1:[1]

This Post is solely based on the comments in the Answer above and in particular this comment from the OP:

And also the input string can be "my phone number 12345678 and credit card 1234567890"

If you're bent on RegEx and you want to retrieve a phone number and/or a credit card number from a specific String, then you can use this Java regular expression:

String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"       // Phone Numbers
             + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"  // Credit Cards
             + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";

To use this regex string you would want to run it through a Pattern/Matcher mechanism, for example:

String strg = "Valid Phone #: <+1 (212) 555-3456> - "
            + "Valid American Express card 24 with garbage 33.6 characters: <3714-4963-5398-431>";

final java.util.List<String> numbers = new java.util.ArrayList<>();

final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"       // Phone Numbers
                   + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"  // Credit Cards
                   + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";

final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex); // the regex
final java.util.regex.Matcher matcher = pattern.matcher(strg); // your string
while (matcher.find()) { 
    numbers.add(matcher.group()); 
}
        
for (String str : numbers) {
    System.out.println(str);
}

With the above supplied String the Console Window would display:

+1 (212) 555-3456
3714-4963-5398-431

Consider these the original phone number and credit card number substrings. Place these strings into respective variables such as origPhoneNum and origCreditCardNum. Now validate the numbers. You already have the tool to validate a credit card number in the previous answer. Here is one to validate a phone number:

public static boolean isValidPhoneNumber(String phoneNumber) {
    return phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
                             + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$");
}

I have tested the above regex string against phone numbers from many different countries in many different formats with success. It was also tested against many different credit card numbers in many different formats, again with success. Nevertheless, there will always be some format that causes a problem, since there are evidently no rules whatsoever for number entry at the source of data generation.

Take the comment line I had shown at the top of this post:

And also the input string can be "my phone number 12345678 and credit card 1234567890"

There is no way to distinguish which number is supposed to be a phone number and which is supposed to be a credit card number unless the text within the string states it, as the above string does. Tomorrow or next week it might not, because there don't appear to be any data entry rules in play here.

The string indicates a phone number of 12345678, which is 8 digits. The string also indicates a credit card number of 1234567890. Internationally, phone numbers can range from 9 to as many as 13 digits depending on the country. Locally, the range of digit counts would be smaller, again depending on the country. Since international phone numbers vary so widely in length, there is no way to know that the number deemed to be a credit card number is in fact a credit card number unless the string tells you, either before the number or after it. Which will it be in the next input string, if at all?

I leave it to you to decide how to deal with this situation, but whatever you choose, don't expect any great speed from it. As I wrote at the beginning of my previous answer:

Wouldn't it be nice if all validations were done before the card numbers
went into the database (or data files).

EDIT: Based on your latest comments under the earlier answer:

I whipped up a small demo:

// Place this code into a method or event somewhere...
String inputString = "my phone number is +54 123 344-4567 and CC 2222 4053 4324 8877 bla bla bla";
System.out.println("Input:  " + inputString);
System.out.println();

final java.util.List<String> numbers = new java.util.ArrayList<>();
    
final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"       // Phone Numbers
                   + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"  // Credit Cards
                   + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";

final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex);
final java.util.regex.Matcher matcher = pattern.matcher(inputString); 
while (matcher.find()) { 
    numbers.add(matcher.group()); 
}
    
String outputString = inputString;
    
for (String str : numbers) {
    //System.out.println(str);  // Uncomment for testing.
    // Is substring a valid Phone Number?
    int len = str.replaceAll("\\D","").length();  // Crushed number length
    if (isValidPhoneNumber(str)) {
        outputString = outputString.replace(str, maskAllExceptLast(str, 3, "x"));
    }
    else if (isValidCreditCardNumber(str)) {
        outputString = outputString.replace(str, 
        maskAllExceptLast(str.replaceAll("\\D",""), 4, "*"));
    }
}

System.out.println("Output: " + outputString);

Support methods....

public static String maskAllExceptLast (String inputString, int exceptLast_N, String... maskCharacter) {
    if(inputString.length() < exceptLast_N){
        return inputString;
    }
    String mask = "*";  // Default mask character.
    if (maskCharacter.length > 0) {
        mask = maskCharacter[0];
    }
    return inputString.replaceAll(".(?=.{" + exceptLast_N + "})", mask);
}

/**
 * Method to validate a supplied phone number. Currently validates phone
 * numbers supplied in the following fashion:
 * <pre>
 *
 *      Phone number 1234567890 validation result: true
 *      Phone number 123-456-7890 validation result: true
 *      Phone number 123-456-7890 x1234 validation result: true
 *      Phone number 123-456-7890 ext1234 validation result: true
 *      Phone number (123)-456-7890 validation result: true
 *      Phone number 123.456.7890 validation result: true
 *      Phone number 123 456 7890 validation result: true
 *      Phone number 01 123 456 7890 validation result: true
 *      Phone number 1 123-456-7890 validation result: true
 *      Phone number 1-123-456-7890 validation result: true</pre>
 *
 * @param phoneNumber (String) The phone number to check.<br>
 *
 * @return (boolean) True is returned if the supplied phone number is valid.
 *         False if it isn't.
 */
public static boolean isValidPhoneNumber(String phoneNumber) {
    boolean isValid = false;
    long len = phoneNumber.replaceAll("\\D","").length(); // Crush the phone Number into only digits
    // Check phone Number's length range. Must be from 8 to 12 digits long
    if (len < 8 || len > 12) {
        return false;
    }
    // Validate phone numbers of format "xxxxxxxx to xxxxxxxxxxxx"
    else if (phoneNumber.matches("\\d+")) {
        isValid = true;
    }
    //validating phone number with -, . or spaces
    else if (phoneNumber.matches("^(\\+\\d{1,3}( )?)?((\\(\\d{1,3}\\))|\\d{1,3})[- .]?\\d{3,4}[- .]?\\d{4}$")) {
        isValid = true;
    }
    /* Validating phone number with -, . or spaces and long distance prefix.
       This regex also ensures:
          - The actual number (without LD prefix) should be 10 digits only.
          - For North American, numbers with area code may be surrounded 
              with parentheses ().
          - The country code can be 1 to 3 digits long. Optionally may be 
            preceded by a + sign.
          - There may be dashes, spaces, dots or no spaces between country 
            code, area code and the rest of the number.
          - A valid phone number cannot be all zeros.                 */
    else if (phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
                               + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$")) {
        isValid = true;
    }
    //validating phone number with extension length from 3 to 5
    else if (phoneNumber.matches("\\d{3}-\\d{3}-\\d{4}\\s(x|(ext))\\d{3,5}")) {
        isValid = true;
    } 
    //validating phone number where area code is in parentheses ()
    else if (phoneNumber.matches("^(\\(\\d{1,3}\\)|\\d{1,3})[- .]?\\d{2,4}[- .]?\\d{4}$")) {
        isValid = true;
    } 
    //return false if nothing matches the input
    else {
        isValid = false;
    }
    return isValid;
}

/**
 * Returns true if card (ie: MasterCard, Visa, etc) number is valid using
 * the 'Luhn Algorithm'. First this method validates for a correct Card 
 * Network Number. The supported networks are:<pre>
 * 
 *    Number            Card Network
 *    ====================================
 *      2               Mastercard (BIN 2-Series) This is NEW!!
 *      30, 36, 38, 39  Diners-Club
 *      34, 37          American Express
 *      35              JCB
 *      4               Visa
 *      5               Mastercard
 *      6               Discover</pre><br>
 * 
 * Next, the overall Credit Card number is checked with the 'Luhn Algorithm' 
 * for validity.<br>
 *
 * @param cardNumber (String)
 *
 * @return (Boolean) True if valid, false if not.
 */
public static boolean isValidCreditCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    // Strip card number of all non-digit characters.
    cardNumber = cardNumber.replaceAll("\\D", "");
    
    long len = cardNumber.length();
    if (len < 14 || len > 16) {   // Only going to 16 digits here 
        return false;
    }
        
    // Validate Card Network
    String[] cardNetworks = {"2", "30", "34", "35", "36", "37", "38", "39", "4", "5", "6"};
    String cardNetNum = cardNumber.substring(0, (cardNumber.startsWith("3") ? 2 : 1));
    boolean pass = false;
    for (String netNum : cardNetworks) {
        if (netNum.equals(cardNetNum)) {
            pass = true;
            break;
        }
    }
    if (!pass) {
        return false;  // Invalid Card Network
    }

    // Validate card number with the 'Luhn algorithm'.
    int nDigits = cardNumber.length();

    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond == true) {
            d = d * 2;
        }
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

The code above will by no means be fast!

Tweak the regex or code to suit your specific needs.
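One inexpensive tweak, if the regular expressions stay: String.matches() and String.replaceAll() compile their expression on every call, so hoisting each expression into a static final Pattern avoids repeating that work for every substring. The following is only a minimal sketch of the idea; the class and method names (PrecompiledPatterns, digitsOnly, isAllDigits) are hypothetical and not part of the code above.

```java
import java.util.regex.Pattern;

class PrecompiledPatterns {
    // Compiled once when the class loads instead of on every call.
    private static final Pattern NON_DIGITS = Pattern.compile("\\D");
    private static final Pattern DIGITS_ONLY = Pattern.compile("\\d+");

    /** Same result as s.replaceAll("\\D", ""), without recompiling the regex. */
    static String digitsOnly(String s) {
        return NON_DIGITS.matcher(s).replaceAll("");
    }

    /** Same result as s.matches("\\d+"), without recompiling the regex. */
    static boolean isAllDigits(String s) {
        return DIGITS_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("3714-4963-5398-431"));
        System.out.println(isAllDigits("12345678"));
    }
}
```

The same treatment could be applied to each expression inside isValidPhoneNumber(), which is called once per extracted substring.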

Solution 2:[2]

Wouldn't it be nice if all validations were done before the card numbers went into the database (or data files).

I'm not convinced that using RegEx for any part of your code is necessarily the best course to take if what you want is speed since processing regular expressions can consume a lot of time. As an example, take the line that does the string masking in the maskWithoutLast4Digits() method:

static String maskWithoutLast4Digits(String input, String replacement) {
    if(input.length() <= 4){
        return input;    // There is nothing to mask!
    }
    return input.replaceAll(".(?=.{4})", replacement);
}
    

and replace it with this code:

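// Requires: import java.util.Arrays;
static String maskWithoutLast4Digits(String input, String replacement) {
    if (input.length() <= 4) {
        return input; // There is nothing to mask!
    }
    char[] chars = input.toCharArray();
    // Arrays.fill() needs a char, so this assumes a single-character mask such as "*".
    Arrays.fill(chars, 0, chars.length - 4, replacement.charAt(0));
    return new String(chars);
}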

You would probably find that this version masks a single credit card number string almost twice as fast as the method with the regex. That's a considerable difference. In fact, if you run the code through a profiler you may well find that the regex method gets progressively slower with each string processed, whereas the second method keeps a more constant speed.
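To check this on your own machine, a small harness along the following lines could confirm that both variants produce the same mask and give a rough timing. This is only a sketch: the class name MaskComparison and the round count are arbitrary assumptions, and a plain nanoTime loop is a crude measurement (a harness such as JMH would be more reliable), so no particular speed-up is claimed here.

```java
import java.util.Arrays;

class MaskComparison {
    /** Regex variant, as in the question. */
    static String maskRegex(String input, String replacement) {
        if (input.length() <= 4) {
            return input;
        }
        return input.replaceAll(".(?=.{4})", replacement);
    }

    /** Array-fill variant: overwrite everything except the last 4 characters. */
    static String maskFill(String input, char replacement) {
        if (input.length() <= 4) {
            return input;
        }
        char[] chars = input.toCharArray();
        Arrays.fill(chars, 0, chars.length - 4, replacement);
        return new String(chars);
    }

    public static void main(String[] args) {
        String card = "371449635398431";
        // Both variants should agree before any timing is compared.
        System.out.println(maskRegex(card, "*").equals(maskFill(card, '*')));

        int rounds = 200_000; // arbitrary; large enough to let the JIT warm up
        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) maskRegex(card, "*");
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) maskFill(card, '*');
        long t2 = System.nanoTime();
        System.out.printf("regex: %d ms, fill: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```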

With a few exceptions, each card network is identified by the first digit of the card number. For example, if a credit card number begins with 3, it is part of the American Express, Diners Club, Carte Blanche or JCB payment networks. If the card begins with a 4, it is a Visa. Card numbers that begin with 5 are Mastercards, while cards that begin with 6 belong to the Discover network.

  Card                   Starts With                   No. of Digits
  ==================================================================
  American Express       can be 34 or usually 37       15
  JCB                    35                            16
  Diners Club            usually 36 or can be 38       14
  VISA                   4                             16
  Mastercard             5                             16
  Discover               6                             16

You don't need regex to determine whether a credit card number starts with any of these values. Note that cards on the same network don't always have the same number of digits; it may depend on the issuer, as I'm sure you already know. Nevertheless, credit cards on the Visa, Mastercard and Discover payment networks typically have 16 digits, while those on the American Express network have 15. While 16 digits is most common, card numbers can have as few as 13 and as many as 19. I haven't scoured your regexes, but I'm sure they have that covered (right?).

To remove the use of Regex you could use a switch/case mechanism instead, for example:

// Demo card number...
String cardNumber = "371449635398431";

/* Remove all characters other than digits.
   Don't want them for validation.      */
cardNumber = cardNumber.replaceAll("\\D", "");
String cardName;  // Used to store the card's name 
switch (cardNumber.substring(0, 1)) {
    case "3":
        String typeNum = cardNumber.substring(0, 2);
        switch(typeNum) {
            case "34": case "37":
               cardName = "American-Express";
               break;
            case "35":
               cardName = "JCB";
               break;        
            case "30": case "36": case "38": case "39":
                cardName = "Diners-Club";
                break;
            default: 
                cardName = "UNKNOWN";
        }
        break;
    case "4":
        cardName = "Visa";
        break;
    case "5":
        cardName= "Mastercard";
        break;
    case "6":
        cardName = "Discover";
        break;
    default:
        cardName = "UNKNOWN";
}

If you were to run speed tests on this code in comparison to iterating through a bunch of RegEx's, I believe you will find a considerable speed improvement even if you wanted to also check the length of each card number processed within each case.
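As a rough illustration of adding the length check inside each case, the sketch below switches on the leading characters with charAt() (avoiding the substring allocations) and compares the length against the table above. The class name CardNetworkDetector is hypothetical, and the accepted lengths are assumptions taken from that table plus the 13/16/19 lengths allowed by the Visa pattern in the question; adjust them to the ranges you actually need.

```java
class CardNetworkDetector {
    /** Expects a digits-only string; returns the network name or "UNKNOWN". */
    static String detect(String digits) {
        int len = digits.length();
        if (len < 2) {
            return "UNKNOWN";
        }
        char second = digits.charAt(1);
        switch (digits.charAt(0)) {
            case '3':
                if ((second == '4' || second == '7') && len == 15) {
                    return "American-Express";
                }
                if (second == '5' && len == 16) {
                    return "JCB";
                }
                if ((second == '0' || second == '6' || second == '8' || second == '9')
                        && len == 14) {
                    return "Diners-Club";
                }
                return "UNKNOWN";
            case '4':
                // Lengths assumed from the Visa regex in the question.
                return (len == 13 || len == 16 || len == 19) ? "Visa" : "UNKNOWN";
            case '5':
                return len == 16 ? "Mastercard" : "UNKNOWN";
            case '6':
                return len == 16 ? "Discover" : "UNKNOWN";
            default:
                return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("371449635398431")); // American-Express
        System.out.println(detect("1234222222222"));   // UNKNOWN
    }
}
```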

The best way to validate a credit card number is to use the Luhn Formula (also known as the Luhn Algorithm) which basically follows this scheme:

  1. Starting from the rightmost digit (the check digit) and moving left, double the value of every second digit. If the result of a doubling is greater than 9 (for example, 7 x 2 = 14 or 9 x 2 = 18), add the digits of that result (e.g., 14: 1 + 4 = 5 or 18: 1 + 8 = 9).
  2. Now add up all the resulting digits, including the digits you did not multiply by two.
  3. If the total ends in 0, the card number is valid according to the Luhn Algorithm; otherwise it is not valid.
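As a worked instance of these steps, the sketch below computes just the Luhn total for the demo number 371449635398431. Counting from the right, the doubled digits contribute 6 + 7 + 6 + 6 + 9 + 8 + 5 = 47 and the untouched digits contribute 1 + 4 + 9 + 5 + 6 + 4 + 1 + 3 = 33, for a total of 80, which ends in 0. The class and method names (LuhnWorkedExample, luhnSum) are hypothetical.

```java
class LuhnWorkedExample {
    /** Returns the Luhn total for a digits-only string. */
    static int luhnSum(String digits) {
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits, e.g. 14 -> 1 + 4 = 5
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum;
    }

    public static void main(String[] args) {
        int total = luhnSum("371449635398431");
        System.out.println(total);           // 80
        System.out.println(total % 10 == 0); // true
    }
}
```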

The whole process can of course be placed into a method for ease of use, for example:

/**
 * Returns true if card (ie: MasterCard, Visa, etc) number is valid using
 * the 'Luhn Algorithm'.
 *
 * @param cardNumber (String)
 *
 * @return (Boolean)
 */
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    cardNumber = cardNumber.replaceAll("\\D", "");
    
    // Luhn algorithm
    int nDigits = cardNumber.length();

    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond == true) {
            d = d * 2;
        }
        // We add two digits to handle 
        // cases that make two digits  
        // after doubling 
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

To put this all together your code might look something similar to this:

public static String validateCreditCardNumber(String cardNumber) {
    // Remove all characters other than digits
    cardNumber = cardNumber.replaceAll("\\D", "");
    String cardName;  // Used to store the card's name 
    switch (cardNumber.substring(0, 1)) {
        case "3":
            String typeNum = cardNumber.substring(0, 2);
            switch(typeNum) {
                case "34": case "37":
                   cardName = "American-Express";
                   break;
                case "35":
                   cardName = "JCB";
                   break;        
                case "30": case "36": case "38": case "39":
                    cardName = "Diners-Club";
                    break;
                default: 
                    cardName = "UNKNOWN";
            }
            break;
        case "4":
            cardName = "Visa";
            break;
        case "5":
            cardName= "Mastercard";
            break;
        case "6":
            cardName = "Discover";
            break;
        default:
            cardName = "UNKNOWN";
    }
    
    if (!cardName.equals("UNKNOWN") && isValidCardNumber(cardNumber)) {
        return ("The " + cardName + " card number (" + maskWithoutLast4Digits(cardNumber, '*') + ") is VALID!");
    }
    else {
        return ("The " + cardName + " card number (" +  maskWithoutLast4Digits(cardNumber, '*') + ") is NOT VALID!");
    }
}

public static String maskWithoutLast4Digits (String input, char replacement) {
    if (input.length() <= 4) {
        return input; // Nothing to mask
    }
    char[] buf = input.toCharArray();
    Arrays.fill(buf, 0, buf.length - 4, replacement);
    return new String(buf);
}

/**
 * Returns true if card (ie: MasterCard, Visa, etc) number is valid using
 * the 'Luhn Algorithm'.
 *
 * @param cardNumber (String)
 *
 * @return (Boolean)
 */
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    cardNumber = cardNumber.replaceAll("\\D", "");
    
    // Luhn algorithm
    int nDigits = cardNumber.length();

    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond == true) {
            d = d * 2;
        }
        // We add two digits to handle 
        // cases that make two digits  
        // after doubling 
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

And to basically use the above:

// Demo card number...
String cardNumber = "371449635398431";
    
String isItValid = validateCreditCardNumber(cardNumber);
System.out.println(isItValid);

Output to console would be:

The American-Express card number (***********8431) is VALID!
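The example above validates one isolated number. For the free-text inputs in the question, one possible regex-free approach is a single pass over the string that collects each run of digits (allowing '-' or '_' between digits, as the question's code does), then masks the run if it looks like a card number. This is a sketch under stated assumptions, not a drop-in replacement: the class name InlineCardMasker is hypothetical, and "looks like a card number" is simplified here to 13 to 19 digits plus a passing Luhn check rather than the per-network patterns.

```java
class InlineCardMasker {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Luhn check over a digits-only sequence. */
    static boolean luhnValid(CharSequence digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    /** Masks every card-like digit run in text, keeping its last 4 digits. */
    static String maskCards(String text) {
        int n = text.length();
        StringBuilder out = new StringBuilder(n);
        StringBuilder digits = new StringBuilder(19);
        int i = 0;
        while (i < n) {
            if (!isDigit(text.charAt(i))) {
                out.append(text.charAt(i++)); // ordinary character: copy through
                continue;
            }
            // Collect a run of digits; '-' or '_' is skipped only between digits.
            digits.setLength(0);
            int j = i;
            while (j < n) {
                char c = text.charAt(j);
                if (isDigit(c)) {
                    digits.append(c);
                } else if (!((c == '-' || c == '_')
                        && j + 1 < n && isDigit(text.charAt(j + 1)))) {
                    break;
                }
                j++;
            }
            int len = digits.length();
            if (len >= 13 && len <= 19 && luhnValid(digits)) {
                for (int k = 0; k < len - 4; k++) out.append('*');
                out.append(digits, len - 4, len); // keep the last 4 digits
            } else {
                out.append(text, i, j); // not a card number: copy unchanged
            }
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskCards("Valid American Express card: 371449635398431"));
        System.out.println(maskCards("Invalid credit card: 1234222222222"));
        System.out.println(maskCards("With garbage characters: <3714-4963-5398-431>"));
    }
}
```

Each character is visited a bounded number of times and no intermediate strings are created per token, which is the main difference from the split/replace approach in the question.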

I'm not exactly sure where your output is going, but it may be best to write it to a file before displaying it, since you will always be speed-limited by the display process. Also, breaking the data into manageable chunks and using multiple ExecutorService threads to process it can greatly increase speed, as can using one of the newer JDKs (above Java 8) and some of their newer methods.
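A minimal sketch of the ExecutorService idea follows, assuming the input has already been split into independent lines. The names ParallelMasking, maskLine and maskAll are hypothetical, and maskLine is only a stand-in for whatever per-line masking routine you settle on. invokeAll() returns its futures in the same order as the submitted tasks, so the output order matches the input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelMasking {
    /** Stand-in masking routine: masks an all-digit line except its last 4 digits. */
    static String maskLine(String line) {
        if (line.length() <= 4 || !line.chars().allMatch(Character::isDigit)) {
            return line;
        }
        char[] buf = line.toCharArray();
        Arrays.fill(buf, 0, buf.length - 4, '*');
        return new String(buf);
    }

    /** Masks all lines on a fixed-size thread pool, preserving input order. */
    static List<String> maskAll(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>(lines.size());
            for (String line : lines) {
                tasks.add(() -> maskLine(line));
            }
            List<String> result = new ArrayList<>(lines.size());
            for (Future<String> future : pool.invokeAll(tasks)) {
                result.add(future.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while masking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Masking task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("371449635398431", "hello", "1234");
        System.out.println(maskAll(lines, 4));
    }
}
```

One task per line keeps the sketch short; for very large inputs, submitting batches of lines per task would likely reduce scheduling overhead.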

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 DevilsHnd