Java regex to mask a value in a string having a fixed pattern

I have several types of log messages that share a common pattern. I want to mask the quoted value that follows the word Value in each message with #####.

cvc-length-valid: Value '9899488103' with length = '10' is not facet-valid with respect to length '20' for type 'customerId'.

cvc-pattern-valid: Value 'GB200102BUYFNBUYSN' is not facet-valid with respect to pattern '[A-Za-z]{2,2}(17|18|19|20|21)[0-9]{2}((0)[1-9]|(1)[012])((0)[1-9]|(1|2)[0-9]|(3)[01])[A-Za-z]{1}[A-Za-z#]{4}[A-Za-z]{1}[A-Za-z#]{4}' for type '#AnonType_NATIONAL_ID_CONCATbuyerSellerId'.

Expected Output

cvc-length-valid: Value '#####' with length = '10' is not facet-valid with respect to length '20' for type 'customerId'.
cvc-pattern-valid: Value '#####' is not facet-valid with respect to pattern '[A-Za-z]{2,2}(17|18|19|20|21)[0-9]{2}((0)[1-9]|(1)[012])((0)[1-9]|(1|2)[0-9]|(3)[01])[A-Za-z]{1}[A-Za-z#]{4}[A-Za-z]{1}[A-Za-z#]{4}' for type '#AnonType_NATIONAL_ID_CONCATbuyerSellerId'.


Solution 1:[1]

You could try a regex replacement:

String input = "cvc-length-valid: Value '9899488103' with length = '10' is not facet-valid with respect to length '20' for type 'customerId'.";
String output = input.replaceAll("(\\S+): Value '.*?' (.*)", "$1: Value '#####' $2");
System.out.println(output);

This prints:

cvc-length-valid: Value '#####' with length = '10' is not facet-valid with respect to length '20' for type 'customerId'.
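
To check that the same replacement also handles the second log format from the question, here is a minimal sketch that wraps it in a helper and runs it on both kinds of message. The class name ValueMasker and the method mask are illustrative names, not part of the answer, and the cvc-pattern-valid sample is shortened for readability. Precompiling the pattern is a small variation on calling String.replaceAll each time.

```java
import java.util.regex.Pattern;

// Illustrative helper; the names ValueMasker and mask are assumptions, not from the answer.
class ValueMasker {

    // Same regex as above, compiled once so it can be reused for every log line.
    private static final Pattern VALUE = Pattern.compile("(\\S+): Value '.*?' (.*)");

    static String mask(String line) {
        // $1 is the error code before ": Value", $2 is everything after the quoted value.
        return VALUE.matcher(line).replaceAll("$1: Value '#####' $2");
    }

    public static void main(String[] args) {
        String lengthError = "cvc-length-valid: Value '9899488103' with length = '10' "
                + "is not facet-valid with respect to length '20' for type 'customerId'.";
        // Shortened sample of the second log format from the question.
        String patternError = "cvc-pattern-valid: Value 'GB200102BUYFNBUYSN' "
                + "is not facet-valid with respect to pattern '[A-Za-z]{2,2}' for type 'buyerSellerId'.";

        System.out.println(mask(lengthError));
        System.out.println(mask(patternError));
    }
}
```

A line that does not contain the ": Value '...' " part is returned unchanged, because replaceAll only rewrites matches.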

Solution 2:[2]

One approach is to mask at the logging layer: wrap a custom layout in a LayoutWrappingEncoder, let the layout read the masking patterns from the logging configuration, and apply them to every formatted log message. This is fairly simple and only needs a custom PatternLayout subclass.

The MaskingPatternLayout class reads the mask patterns from the configuration and applies them:

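import ch.qos.logback.classic.PatternLayout;
import ch.qos.logback.classic.spi.ILoggingEvent;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MaskingPatternLayout extends PatternLayout {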

    private Pattern multilinePattern;
    private List<String> maskPatterns = new ArrayList<>();

    public void addMaskPattern(String maskPattern) {
        maskPatterns.add(maskPattern);
        multilinePattern = Pattern.compile(maskPatterns.stream().collect(Collectors.joining("|")), Pattern.MULTILINE);
    }

    @Override
    public String doLayout(ILoggingEvent event) {
        return maskMessage(super.doLayout(event));
    }

    private String maskMessage(String message) {
        if (multilinePattern == null) {
            return message;
        }
        StringBuilder sb = new StringBuilder(message);
        Matcher matcher = multilinePattern.matcher(sb);
        while (matcher.find()) {
            IntStream.rangeClosed(1, matcher.groupCount()).forEach(group -> {
                if (matcher.group(group) != null) {
                    IntStream.range(matcher.start(group), matcher.end(group)).forEach(i -> sb.setCharAt(i, '#'));
                }
            });
        }
        return sb.toString();
    }
}

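The mask patterns are then declared in the logback.xml file (this assumes Logback is the logging provider). Because maskMessage only overwrites the characters captured by groups, each maskPattern has to wrap the value to be masked in a capturing group.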

<configuration>
    <appender name="mask" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="path.to.MaskingPatternLayout">
                <!-- placeholder: replace with the regex for your log pattern -->
                <maskPattern>the log pattern you mentioned</maskPattern>
            </layout>
        </encoder>
    </appender>
</configuration>
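
To see what the masking loop does without setting up Logback, the following standalone sketch runs the same group-overwriting logic on one of the question's log lines. GroupMaskDemo, maskGroups, and the pattern Value '(.*?)' are assumptions chosen for illustration; the answer does not specify the actual mask pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of the group-masking loop; GroupMaskDemo is a hypothetical name.
class GroupMaskDemo {

    // Assumed mask pattern: the quoted value after "Value" is capturing group 1.
    private static final Pattern MASK = Pattern.compile("Value '(.*?)'");

    static String maskGroups(String message) {
        StringBuilder sb = new StringBuilder(message);
        Matcher matcher = MASK.matcher(sb);
        while (matcher.find()) {
            for (int group = 1; group <= matcher.groupCount(); group++) {
                if (matcher.group(group) != null) {
                    // Overwrite each captured character in place, so the length is preserved.
                    for (int i = matcher.start(group); i < matcher.end(group); i++) {
                        sb.setCharAt(i, '#');
                    }
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskGroups("cvc-length-valid: Value '9899488103' with length = '10'."));
    }
}
```

Note that this approach replaces every captured character, so a ten-character value becomes ten # characters rather than the fixed ##### shown in the expected output. If a fixed-length mask is required, the replacement-based solutions are a closer fit.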

Solution 3:[3]

public static final String XML_VALIDATION_VALUE_REGEX = "(?<= Value ')(.*?)(?=')";
public static final String XML_VALIDATION_MASK_VALUE = "#####";


String text = "cvc-length-valid: Value '9899488111' with length = '10' is not facet-valid with respect to length '20' for type 'customerId'.";
System.out.println(text.replaceAll(XML_VALIDATION_VALUE_REGEX, XML_VALIDATION_MASK_VALUE));

Explanation of the regex:

Positive Lookbehind (?<= Value ') - asserts that the match is immediately preceded by the literal text " Value '" (case sensitive), without making it part of the match.

1st Capturing Group (.*?) - . matches any character except line terminators; *? repeats it zero or more times, as few times as possible (lazy), so the match stops at the first closing quote.

Positive Lookahead (?=') - asserts that the match is immediately followed by a literal ' character, without consuming it.
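
Because the lookarounds do not consume the surrounding text, the replacement string only has to contain the mask itself. The sketch below applies the same regex to a multi-line string to illustrate this; LookaroundMaskDemo, mask, and the shortened sample lines are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class LookaroundMaskDemo {

    private static final Pattern VALUE = Pattern.compile("(?<= Value ')(.*?)(?=')");

    static String mask(String text) {
        // Only the text between the quotes is matched, so only it is replaced.
        return VALUE.matcher(text).replaceAll("#####");
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "cvc-length-valid: Value '9899488111' with length = '10'.",
                "some other log line with 'quotes' in it",
                "cvc-pattern-valid: Value 'GB200102BUYFNBUYSN' is not facet-valid.");
        System.out.println(mask(log));
    }
}
```

Quoted text that is not directly preceded by " Value '" is left alone, since the lookbehind fails at those positions.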

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Tim Biegeleisen
Solution 2    Lunatic
Solution 3    truekiller