'[guid]::NewGuid().GetBytes() returns different result than [System.Text.Encoding]::UTF8.GetBytes(...)
I found this excellent approach on shortening GUIDs here on stackowerflow: .NET Short Unique Identifier
I have some other strings that I wanted to treat the same way, but I found out that in most cases the Base64String is even longer than the original string.
My question is: why does [guid]::NewGuid().ToByteArray() return a significant smaller byte array than [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)?
For example, let's look at the following GUID:
$guid = [guid]::NewGuid()
$guid
Guid
----
34c2b21e-18c3-46e7-bc76-966ae6aa06bc
With $guid.GetBytes(), the following is returned:
30
178
194
52
195
24
231
70
188
118
150
106
230
170
6
188
And [System.Convert]::ToBase64String($guid.ToByteArray()) generates HrLCNMMY50a8dpZq5qoGvA==
[System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($guid.Guid)), however, returns MzRjMmIyMWUtMThjMy00NmU3LWJjNzYtOTY2YWU2YWEwNmJj, with [System.Text.Encoding]::UTF8.GetBytes($guid.Guid) being:
51
52
99
50
98
50
49
101
45
49
56
99
51
45
52
54
101
55
45
98
99
55
54
45
57
54
54
97
101
54
97
97
48
54
98
99
Solution 1:[1]
The GUID struct is an object storing a 16 byte array that contains its value.
These are the 16 bytes you see when you perform its method .ToByteArray() method.
The 'normal' string representation is a grouped series of these bytes in hexadecimal format. (4-2-2-2-6)
As for converting to Base64, this will always return a longer string because each Base64 digit represents exactly 6 bits of data.
Therefore, every three 8-bits bytes of the input (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
The resulting string can even be extended with = padding characters at the end of the string to always be a multiple of 4.
The result is a string of [math]::Ceiling(<original size> / 3) * 4 length.
Using [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid) is actually first performing the GUID's .ToString() method and from that string it will return the ascii values of each character in there.
(hexadecimal representation = 2 characters per byte = 32 values + the four dashes in it leaves a 36-byte array)
Solution 2:[2]
[guid]::NewGuid().ToByteArray()
In the scope of this question, a GUID can be seen as a 128-bit number (actually it is a structure, but that's not relevant to the question). When converting it into a byte array, you divide 128 by 8 (bits per byte) and get an array of 16 bytes.
[System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)
This converts the GUID to a hexadecimal string representation first. Then this string gets encoded as UTF-8.
A hex string uses two characters per input byte (one hex digit for the lower and one for the upper 4 bits). So we need at least 32 characters (16 bytes of GUID multiplied by 2). When converted to UTF-8 each character relates to exactly one byte, because all hex digits as well as the dash are in the basic ASCII range which maps 1:1 to UTF-8. So including the dashes we end up with 32 + 4 = 36 bytes.
So this is what [System.Convert]::ToBase64String() has to work with - 16 bytes of input in the first case and 36 bytes in the second case.
Each Base64 output digit represents up to 6 input bits.
16 input bytes = 128 bits, divided by 6 = 22 Base64 characters
36 input bytes = 288 bits, divided by 6 = 48 Base64 characters
That's how you end up with more than twice the number of Base64 characters when converting a GUID to hex string first.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
