Reduce length of a string without losing data

I have a string of length 13 made up of letters (both uppercase and lowercase are possible, but currently only uppercase is used) and digits (0-9), for example BWOOL0JDXUNP1. I want to reduce its length to 6-10 characters without losing any data. I tried converting it to bytes with StandardCharsets.UTF_8 and then calling new BigInteger(1, bytes).toString(36), but that increases the length to 18 characters. I am not sure whether this is possible at all. If there is a way to do this in Java, please help.



Solution 1:[1]

Assuming that the string represents a number in base 36 [0-9A-Z], it can be "compressed" by converting it to base 62 [0-9A-Za-z]. This does not reduce the size much, though: only 1 symbol is saved (13 characters become 12), because 62^11 < 36^13 <= 62^12:

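import java.math.BigInteger;

public class Base36ToBase62 {
    public static void main(String[] args) {
        String str = "BWOOL0JDXUNP1";
        // Parse the input as a base-36 number.
        BigInteger bi = new BigInteger(str, 36);

        // Target alphabet with 62 symbols.
        String alpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        BigInteger size = BigInteger.valueOf(alpha.length());
        System.out.println(str);

        // Emit the base-62 digits, least significant digit first.
        StringBuilder sb = new StringBuilder();
        while (bi.compareTo(BigInteger.ZERO) > 0) {
            int cp = bi.mod(size).intValue();
            sb.append(alpha.charAt(cp));
            bi = bi.divide(size);
        }
        System.out.println(sb);
        // -> BWOOL0JDXUNP1
        // -> NXew7nv28E51
    }
}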
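
The conversion is only lossless if it can be reversed, so here is a minimal round-trip sketch. It assumes the same 62-symbol alphabet and the same least-significant-digit-first order as the snippet above, and that the original length (13) is known, so that leading '0' characters dropped by the numeric conversion can be restored. The class and method names (Base62RoundTrip, encode, decode) are hypothetical and not part of the original answer.

```java
import java.math.BigInteger;

public class Base62RoundTrip {
    private static final String ALPHA =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final BigInteger SIZE = BigInteger.valueOf(ALPHA.length());

    // Encode a base-36 string as base-62 digits, least significant digit first.
    public static String encode(String base36) {
        BigInteger bi = new BigInteger(base36, 36);
        StringBuilder sb = new StringBuilder();
        while (bi.signum() > 0) {
            sb.append(ALPHA.charAt(bi.mod(SIZE).intValue()));
            bi = bi.divide(SIZE);
        }
        return sb.toString();
    }

    // Decode back to an uppercase base-36 string, left-padded with '0'
    // to the given length (assumption: the original length is known).
    public static String decode(String base62, int length) {
        BigInteger bi = BigInteger.ZERO;
        // The last character is the most significant digit, so walk backwards.
        for (int i = base62.length() - 1; i >= 0; i--) {
            int digit = ALPHA.indexOf(base62.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Not a base-62 digit: " + base62.charAt(i));
            }
            bi = bi.multiply(SIZE).add(BigInteger.valueOf(digit));
        }
        StringBuilder sb = new StringBuilder(bi.toString(36).toUpperCase());
        while (sb.length() < length) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "BWOOL0JDXUNP1";
        String packed = encode(original);
        System.out.println(packed + " (" + packed.length() + " characters)");
        System.out.println(decode(packed, original.length()).equals(original));
    }
}
```

Note that this only works while the input is restricted to [0-9A-Z]. If lowercase letters are actually used, the input is already a base-62 string and this conversion saves nothing.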

Additionally, binary packing with a custom encoding can be applied. For instance, each of the 62 characters [0-9A-Za-z] fits into 6 bits, so the 12 characters above can be stored in 12 * 6 = 72 bits, i.e. 12 * 0.75 = 9 bytes. In that case, a matching "unpacking" method is needed to restore the original string.
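
A minimal sketch of such a pack/unpack pair follows. Instead of packing base-62 characters 6 bits at a time, it stores the numeric value of the base-36 string directly. This is a swapped-in, simpler technique that reaches the same 9 bytes, because 36^13 < 2^68. The names (Base36BytePacking, pack, unpack) and the fixed length of 13 are assumptions made for illustration.

```java
import java.math.BigInteger;

public class Base36BytePacking {
    static final int TEXT_LENGTH = 13;  // assumed fixed input length
    static final int PACKED_BYTES = 9;  // 36^13 < 2^68, so 9 bytes are enough

    // Pack a base-36 string of up to 13 characters into exactly 9 bytes (big-endian).
    public static byte[] pack(String base36) {
        if (base36.length() > TEXT_LENGTH) {
            throw new IllegalArgumentException("Input longer than " + TEXT_LENGTH);
        }
        // toByteArray() returns a big-endian two's-complement array,
        // at most 9 bytes long for a non-negative value below 2^68.
        byte[] raw = new BigInteger(base36, 36).toByteArray();
        byte[] out = new byte[PACKED_BYTES];
        // Right-align the value; the leading bytes stay zero.
        System.arraycopy(raw, 0, out, PACKED_BYTES - raw.length, raw.length);
        return out;
    }

    // Restore the uppercase base-36 string, left-padded with '0' to 13 characters.
    public static String unpack(byte[] packed) {
        String digits = new BigInteger(1, packed).toString(36).toUpperCase();
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < TEXT_LENGTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("BWOOL0JDXUNP1");
        System.out.println(packed.length + " bytes");
        System.out.println(unpack(packed));
    }
}
```

The 9 bytes are raw binary rather than printable text. If the result must remain a string, encoding those 9 bytes as Base64 yields 12 characters again, so the 6-10 character target is not reachable with these alphabets.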

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1