[guid]::NewGuid().ToByteArray() returns a different result than [System.Text.Encoding]::UTF8.GetBytes(...)

I found this excellent approach to shortening GUIDs here on Stack Overflow: .NET Short Unique Identifier

I have some other strings that I wanted to treat the same way, but I found that in most cases the Base64 string is even longer than the original string.

My question is: why does [guid]::NewGuid().ToByteArray() return a significantly smaller byte array than [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)?

For example, let's look at the following GUID:

$guid = [guid]::NewGuid()
$guid

Guid
----
34c2b21e-18c3-46e7-bc76-966ae6aa06bc

With $guid.ToByteArray(), the following is returned:

30
178
194
52
195
24
231
70
188
118
150
106
230
170
6
188

And [System.Convert]::ToBase64String($guid.ToByteArray()) generates HrLCNMMY50a8dpZq5qoGvA==

[System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($guid.Guid)), however, returns MzRjMmIyMWUtMThjMy00NmU3LWJjNzYtOTY2YWU2YWEwNmJj, with [System.Text.Encoding]::UTF8.GetBytes($guid.Guid) being:

51
52
99
50
98
50
49
101
45
49
56
99
51
45
52
54
101
55
45
98
99
55
54
45
57
54
54
97
101
54
97
97
48
54
98
99


Solution 1:[1]

The GUID struct stores its value as a 16-byte array.
These are the 16 bytes you see when you call its .ToByteArray() method.

The 'normal' string representation is these same bytes written in hexadecimal format and grouped 4-2-2-2-6.
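As a quick sketch, you can see that relationship directly in PowerShell; note that the first three groups are stored little-endian, so their bytes appear reversed relative to the string:

# Compare the hyphenated string with the raw bytes printed as hex:
$guid  = [guid]::NewGuid()
$bytes = $guid.ToByteArray()

$guid.Guid                                                  # e.g. 34c2b21e-18c3-46e7-bc76-966ae6aa06bc
($bytes | ForEach-Object { $_.ToString('x2') }) -join ' '   # e.g. 1e b2 c2 34 c3 18 e7 46 bc 76 96 6a e6 aa 06 bc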

As for converting to Base64, this will always return a longer string because each Base64 digit represents exactly 6 bits of data.

Therefore, every three 8-bit bytes of the input (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
The resulting string is padded with = characters at the end so that its length is always a multiple of 4.
The result is a string of length [math]::Ceiling(<original size> / 3) * 4.
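A quick sketch to check that formula against the GUID's 16-byte array:

# Base64 length check for a 16-byte input:
$bytes  = [guid]::NewGuid().ToByteArray()
$base64 = [System.Convert]::ToBase64String($bytes)

[math]::Ceiling($bytes.Count / 3) * 4    # 24
$base64.Length                           # 24 (22 data characters plus '==' padding)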

Using [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid) actually first performs the GUID's .ToString() method and then returns the ASCII value of each character in that string.
(Hexadecimal representation = 2 characters per byte = 32 values, plus the four dashes, gives a 36-byte array.)
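A small sketch that makes this visible: the UTF-8 bytes are simply the character codes of the hexadecimal string (51 is '3', 52 is '4', 45 is '-', and so on):

$guid = [guid]::NewGuid()

# Character codes of the string form...
[int[]][char[]]$guid.Guid
# ...are identical to the UTF-8 bytes, because every character is plain ASCII:
[System.Text.Encoding]::UTF8.GetBytes($guid.Guid)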

Solution 2:[2]

[guid]::NewGuid().ToByteArray()

In the scope of this question, a GUID can be seen as a 128-bit number (actually it is a structure, but that's not relevant to the question). When converting it into a byte array, you divide 128 by 8 (bits per byte) and get an array of 16 bytes.
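A one-line check of that byte count:

# 128 bits / 8 bits per byte = 16 bytes:
[guid]::NewGuid().ToByteArray().Count    # 16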

[System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)

This converts the GUID to a hexadecimal string representation first. Then this string gets encoded as UTF-8.

A hex string uses two characters per input byte (one hex digit for the lower and one for the upper 4 bits). So we need at least 32 characters (16 GUID bytes multiplied by 2). When converted to UTF-8, each character maps to exactly one byte, because all hex digits as well as the dash are in the basic ASCII range, which maps 1:1 to UTF-8. So including the dashes we end up with 32 + 4 = 36 bytes.
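Again as a quick check:

$s = [guid]::NewGuid().Guid                         # 32 hex characters plus 4 dashes
$s.Length                                           # 36
[System.Text.Encoding]::UTF8.GetByteCount($s)       # 36 - one byte per ASCII character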

So this is what [System.Convert]::ToBase64String() has to work with - 16 bytes of input in the first case and 36 bytes in the second case.

Each Base64 output digit represents up to 6 input bits.

  • 16 input bytes = 128 bits, divided by 6 and rounded up = 22 Base64 characters (padded to 24 with ==)

  • 36 input bytes = 288 bits, divided by 6 = 48 Base64 characters (no padding needed)

That's how you end up with more than twice the number of Base64 characters when converting the GUID to a hex string first.
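Putting both cases side by side as a short sketch:

$guid = [guid]::NewGuid()

# 16 raw bytes -> 24 Base64 characters (22 data characters + '==' padding):
[System.Convert]::ToBase64String($guid.ToByteArray()).Length                                  # 24

# 36 UTF-8 bytes of the hex string -> 48 Base64 characters (no padding needed):
[System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($guid.Guid)).Length    # 48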

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1
[2] Solution 2