Reduce length of a string without losing data
I have a string of length 13 made up of letters (uppercase and lowercase are both possible, but currently only uppercase is used) and digits (0-9), for example BWOOL0JDXUNP1. I want to reduce its length to 6-10 characters without losing any data.
I tried converting it to bytes using StandardCharsets.UTF_8 and then calling new BigInteger(1, bytes).toString(36), but that increases the length to 18 characters.
I am not sure whether this is possible. If there is a way to do this in Java, please help.
Solution 1:[1]
Assuming that the string represents a number in base 36 ([0-9A-Z]), it can be "compressed" further by converting it to base 62 ([0-9A-Za-z]). However, this does not help much: since 36^13 (about 1.7 * 10^20) is larger than 62^11 (about 5.2 * 10^19), 12 base-62 characters are needed in general, so only 1 symbol is "saved":
import java.math.BigInteger;

String str = "BWOOL0JDXUNP1";
BigInteger bi = new BigInteger(str, 36); // parse the input as a base-36 number
String alpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
BigInteger size = BigInteger.valueOf(alpha.length()); // target base: 62
System.out.println(str);
StringBuilder sb = new StringBuilder();
// Repeatedly divide by 62; digits are appended least significant first.
while (bi.compareTo(BigInteger.ZERO) > 0) {
    int cp = bi.mod(size).intValue();
    sb.append(alpha.charAt(cp));
    bi = bi.divide(size);
}
System.out.println(sb);
// -> BWOOL0JDXUNP1
// -> NXew7nv28E51
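The conversion above only goes one way, and it drops leading zeros of the input. As a rough sketch of what a reversible version might look like, the class below pairs an encoder with a decoder. The names Base62Codec, encode, decode and the width parameter are illustrative assumptions, not part of the original answer; it also assumes the input is non-negative and uses only [0-9A-Z], and it emits the most significant digit first, unlike the snippet above.

```java
import java.math.BigInteger;

// Hypothetical helper, not from the original answer: reversible base-36 <-> base-62.
final class Base62Codec {
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final BigInteger BASE = BigInteger.valueOf(ALPHABET.length());

    // Re-encodes a base-36 string as base 62, most significant digit first.
    static String encode(String base36) {
        BigInteger value = new BigInteger(base36, 36);
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative input not supported");
        }
        if (value.signum() == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value.signum() > 0) {
            BigInteger[] qr = value.divideAndRemainder(BASE); // [quotient, remainder]
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            value = qr[0];
        }
        return sb.reverse().toString();
    }

    // Decodes base 62 back to uppercase base 36, left-padded with '0' to the given width
    // so that leading zeros of the original string are restored.
    static String decode(String base62, int width) {
        BigInteger value = BigInteger.ZERO;
        for (char c : base62.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-62 digit: " + c);
            }
            value = value.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        String digits = value.toString(36).toUpperCase();
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        String original = "BWOOL0JDXUNP1";
        String shortened = encode(original);
        System.out.println(shortened + " (" + shortened.length() + " characters)");
        System.out.println(decode(shortened, original.length()));
    }
}
```

Because the original length (13) is fixed and known, passing it as the width is enough to recover leading zeros. If lowercase letters are ever used in the input, the base-36 assumption no longer holds and this sketch would lose the case information.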
Additionally, binary packing can be applied with a custom encoding. For instance, each of the 62 characters [0-9A-Za-z] fits into just 6 bits, so the 12 characters above can be stored in 12 * 0.75 = 9 bytes. Of course, in this case an additional "unpacking" method is needed to restore the original representation.
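One possible way to get the 9-byte form is sketched below. Note that it is a variant of the idea rather than a literal 6-bits-per-character packer: it stores the numeric value of the base-36 string directly, which also fits in 9 bytes because 36^13 is smaller than 2^68. The class name Base36Packer and its constants are hypothetical, and the sketch assumes a non-negative input of at most 13 characters from [0-9A-Z].

```java
import java.math.BigInteger;

// Hypothetical helper, not from the original answer: packs a 13-character
// base-36 string into a fixed 9-byte array and unpacks it again.
final class Base36Packer {
    static final int TEXT_LENGTH = 13;  // length of the original strings
    static final int PACKED_BYTES = 9;  // 36^13 < 2^68, so 9 bytes suffice

    static byte[] pack(String base36) {
        if (base36.length() > TEXT_LENGTH) {
            throw new IllegalArgumentException("input longer than " + TEXT_LENGTH);
        }
        BigInteger value = new BigInteger(base36, 36);
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative input not supported");
        }
        byte[] raw = value.toByteArray(); // big-endian, at most 9 bytes for this range
        byte[] out = new byte[PACKED_BYTES];
        // Right-align the value so the result always has the same length.
        System.arraycopy(raw, 0, out, PACKED_BYTES - raw.length, raw.length);
        return out;
    }

    static String unpack(byte[] packed) {
        // Signum 1 treats the bytes as an unsigned magnitude.
        String digits = new BigInteger(1, packed).toString(36).toUpperCase();
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < TEXT_LENGTH; i++) {
            sb.append('0'); // restore leading zeros
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("BWOOL0JDXUNP1");
        System.out.println(packed.length + " bytes");
        System.out.println(unpack(packed));
    }
}
```

The 9 bytes are raw binary, so they suit a binary column or file format. Turning them back into printable text (for example with Base64) brings the length back up to 12 characters.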
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
