ReplaceAll method in String API

I need to remove certain characters (non-ASCII, control, and other non-printable characters) from a string, as shown below:

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

stringValue.replaceAll(NON_ASCII_CHARACTERS, "")
    .replaceAll(ASCII_CONTROL_CHARACTERS, "")
    .replaceAll(NON_PRINTABLE_CHARACTERS, "");

Can the code above be refactored to use a single replaceAll call that covers all three conditions?

Is there a way to do this? Please advise.



Solution 1:[1]

Code point

You might consider an alternative to regex. You can work with the code point integer of each character and ask the Character class which category that character belongs to.

String input = … ;
String output = 
    input
    .codePoints()  // Returns an `IntStream` of code point `int` values.
    .filter( codePoint -> ! Character.isISOControl( codePoint ) )  // Filter for the characters you want to keep. Those code points flunking the `Predicate` test will be omitted. 
    .filter( codePoint -> codePoint < 127 )  // Within US-ASCII range. Code point 127 is US-ASCII but is DEL, so we filter that out here. 
    .collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append )  // Convert the `int` code point integers back into characters. 
    .toString() ;  // Make a `String` from the contents of the `StringBuilder`. 

The Character class has many of the classifications defined by the Unicode Consortium. You can use them to narrow down the stream of code points to those which represent your desired characters.
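As a rough illustration of querying those classifications, here is a minimal sketch that uses Character.getType to drop code points in the Unicode "Other" general categories (control, format, surrogate, private use, unassigned) in addition to anything outside US-ASCII. The class name CodePointFilterSketch and the helpers isOtherCategory and keepPrintableAscii are hypothetical names chosen for this example, not part of the answer above, and the category list is only similar in spirit to the regex class \p{C}.

```java
final class CodePointFilterSketch {

    // Hypothetical helper: true when the code point falls in one of the
    // Unicode "Other" general categories (Cc, Cf, Cs, Co, Cn).
    static boolean isOtherCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.SURROGATE:
            case Character.PRIVATE_USE:
            case Character.UNASSIGNED:
                return true;
            default:
                return false;
        }
    }

    // Hypothetical helper: keeps only US-ASCII code points that are not in an
    // "Other" category. Note that this also drops \r, \n and \t.
    static String keepPrintableAscii(String input) {
        return input.codePoints()
                .filter(codePoint -> codePoint < 128)
                .filter(codePoint -> !isOtherCategory(codePoint))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // e-acute, tab, BEL and zero-width space are all removed.
        System.out.println(keepPrintableAscii("caf\u00E9\tbar\u0007\u200B!")); // prints "cafbar!"
    }
}
```

If you want to keep line breaks and tabs, one option is to let those three code points through before the category check.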

Solution 2:[2]

According to the Pattern javadocs, it should also be possible to combine the three character class patterns into a single character class:

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

becomes

private static final String COMBINED =
  "[[^\\x00-\\x7F][\\p{Cntrl}&&[^\r\n\t]]\\p{C}]";

or

private static final String COMBINED =
    "[" + NON_ASCII_CHARACTERS + ASCII_CONTROL_CHARACTERS 
        + NON_PRINTABLE_CHARACTERS + "]";

Note that && (intersection) has lower precedence than the implicit union operator, so all of the [ and ] meta-characters in the above are required.

You decide which version you think is clearer. It is a matter of opinion.
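Before swapping patterns, it may be worth checking that the combined class really behaves like the three chained calls. The sketch below does that for every BMP char, one at a time. The class name CombinedPatternSketch and its method names are hypothetical. The third pattern, PRINTABLE_ASCII_ONLY, is an assumption of mine and not part of this answer: because \p{C} also matches \r, \n and \t, the chain appears to keep only the printable ASCII range (space through tilde), and the loop checks that claim as well.

```java
import java.util.regex.Pattern;

final class CombinedPatternSketch {

    // The three patterns from the question, compiled once.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");
    private static final Pattern ASCII_CONTROL = Pattern.compile("[\\p{Cntrl}&&[^\r\n\t]]");
    private static final Pattern NON_PRINTABLE = Pattern.compile("\\p{C}");

    // The combined character class from this answer.
    private static final Pattern COMBINED =
            Pattern.compile("[[^\\x00-\\x7F][\\p{Cntrl}&&[^\r\n\t]]\\p{C}]");

    // Assumption, not from the answer: remove everything outside space..tilde.
    private static final Pattern PRINTABLE_ASCII_ONLY = Pattern.compile("[^\\x20-\\x7E]");

    // Original approach: three passes, one per pattern.
    static String chained(String s) {
        String withoutNonAscii = NON_ASCII.matcher(s).replaceAll("");
        String withoutControls = ASCII_CONTROL.matcher(withoutNonAscii).replaceAll("");
        return NON_PRINTABLE.matcher(withoutControls).replaceAll("");
    }

    // Single pass with the combined class.
    static String combined(String s) {
        return COMBINED.matcher(s).replaceAll("");
    }

    // Single pass with the simplified (assumed equivalent) class.
    static String printableAsciiOnly(String s) {
        return PRINTABLE_ASCII_ONLY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Compare all three variants on each BMP char in isolation.
        for (int c = 0; c <= 0xFFFF; c++) {
            String s = String.valueOf((char) c);
            String expected = chained(s);
            if (!expected.equals(combined(s)) || !expected.equals(printableAsciiOnly(s))) {
                throw new AssertionError("Mismatch at U+" + Integer.toHexString(c));
            }
        }
        System.out.println(combined("a\tb\u00E9c\u0000d")); // prints "abcd"
    }
}
```

If you actually intended to keep \r, \n and \t, none of these variants does so, since \p{C} removes them; the third pattern would need to change in that case.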

Solution 3:[3]

How do I do that?

In the way you just described, having multiple sbt modules.

I have to publish these modules separately?

You usually have a single root module that aggregates all the other modules, such as core and extras (which would depend on core), and you publish root, which will transitively publish the others.

Can they have different versions?

AFAIK there is nothing preventing you from having different versions, but I have never seen a library do this. When the versions must differ, the modules usually live in separate GitHub repos (which makes sense, since in the end you will probably have an automated process that publishes all of them at once).

Also, it's not clear to me whether library users should include only the "feature module", which would transitively include the core module.

If you follow the previous schema, the POM of extras will mention core, so users of your library can import only extras and their build tools will transitively fetch core.
However, whether they want to do that (and whether they consider it good or bad practice) rather than also including core explicitly is up to them. This topic is somewhat controversial, although most people agree that if you explicitly use something from a library then you must explicitly depend on it; see: https://github.com/cb372/sbt-explicit-dependencies

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Stephen C
Solution 3