Why is 0x82 smaller than 0x80?

I'm trying to optimise a String class for a college exercise. Normal strings are stored as a char* plus a size_t for the length. sizeof(String) is 8 and it should stay that way. However, for strings of 7 or fewer chars (6 if you count the null terminator), I want to store the characters directly in the bytes of the pointer and size_t instead of using a pointer.

For this I have two structs, one with the char* and size_t and one with an array of 8 chars (bytes). I place both in a union and give the String class a member of that union.

To determine whether a string is a normal string or a short one, I use the most significant bit of the length size_t, which is also the top bit of byte[7]. If byte[7] is greater than or equal to 128 (0x80), it's a short string and the characters are stored in the bytes directly. The length is then stored in the remaining bits of byte[7].
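A minimal sketch of the tag byte described above, for illustration only. The helper names and the constant `kShortFlag` are hypothetical and do not come from the actual class. The sketch uses `unsigned char` so that the values 0x80 and above can be represented directly.

```cpp
// Hypothetical helpers; the real String class is not shown in the question.
constexpr unsigned char kShortFlag = 0x80;  // most significant bit of byte[7]

// Pack the short-string flag and a length (0..127) into one byte.
constexpr unsigned char make_short_tag(unsigned char length) {
    return static_cast<unsigned char>(kShortFlag | length);
}

// True when the flag bit is set, i.e. the characters live in the bytes.
constexpr bool is_short(unsigned char tag) {
    return (tag & kShortFlag) != 0;
}

// Extract the length from the remaining seven bits.
constexpr unsigned char short_length(unsigned char tag) {
    return static_cast<unsigned char>(tag & 0x7F);
}

// "hi" has length 2, so its tag byte is 0x82.
static_assert(make_short_tag(2) == 0x82, "flag bit plus length 2");
static_assert(is_short(0x82), "0x82 has the flag bit set");
static_assert(short_length(0x82) == 2, "low seven bits hold the length");
static_assert(!is_short(0x05), "no flag bit, so a normal string");
```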

That's the theory so far. The normal string part is already implemented and I'm now trying to implement the short string part. The problem I have right now is with the following piece of code:

inline const char* c_str(void) const
{
    if (compound.bytes.bytes[7] >= 0x80)
        return compound.bytes.bytes;
    return compound.string.m_string;
}

From the Visual Studio watch window I know that compound.bytes.bytes[7] is 0x82 (the string is "hi"). So 0x82 >= 0x80 should be true and the function should return the bytes, yet for some reason the comparison evaluates to false and the function returns the char* of a normal string, which is of course a bogus pointer (0xcc006968 to be precise).

Also worth pointing out is that this very code still works correctly for normal strings.

What am I missing, what am I doing wrong?



Solution 1:[1]

With a signed 8-bit integer type, a byte holding 0x82 is a negative number: in two's complement it is -126 (and a byte holding 0x80 would be -128). The literal 0x80 in your comparison, however, is an int with the value 128, and the char is promoted to int before the comparison. The test is therefore -126 >= 128, which is false.

Switch your data type to uint8_t.
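As a rough illustration of why the switch helps, the sketch below runs the same comparison once with a signed 8-bit parameter and once with uint8_t. The function names are made up for this example and are not part of the asker's class.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.

// A signed 8-bit value is at most 127, so after promotion to int it can
// never reach 0x80 (128). This check is false for every possible input.
constexpr bool is_short_signed(signed char tag) {
    return tag >= 0x80;
}

// With uint8_t the byte 0x82 is promoted to the int 130, which is >= 128.
constexpr bool is_short_unsigned(std::uint8_t tag) {
    return tag >= 0x80;
}

// The bit pattern 0x82 read as a signed 8-bit value is -126.
static_assert(!is_short_signed(-126), "signed: -126 >= 128 is false");
static_assert(is_short_unsigned(0x82), "unsigned: 130 >= 128 is true");
static_assert(!is_short_unsigned(0x02), "unsigned: 2 >= 128 is false");
```

A compiler may warn that the signed comparison is always false, which is exactly the symptom from the question.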

Solution 2:[2]

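char is signed here (its signedness is implementation-defined, and Visual C++ makes it signed by default), so values of 0x80 and above are negative.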

Solution 3:[3]

You are working with signed values (char). So a byte holding 0x80 means -128, and a byte holding 0x82 means -126, while the literal 0x80 in the comparison is the int 128. This code may work as you need:

inline const char* c_str(void) const
{
    if (static_cast<unsigned char>(compound.bytes.bytes[7]) >= 0x80u)
        return compound.bytes.bytes;
    return compound.string.m_string;
}

Solution 4:[4]

You are comparing a signed char (0x82 = -126) to a signed int (0x00000080 = 128). A signed char can hold values from -128 to 127, so it will always be less than 128.
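The promotion behind this comparison can be checked at compile time. The following is only a sketch: the variable name `tag` is hypothetical, and the value -126 stands in for the byte 0x82 from the question.

```cpp
#include <type_traits>

// Hypothetical stand-in for compound.bytes.bytes[7] holding 0x82.
constexpr signed char tag = -126;

// Unary plus applies the integer promotions: the operand becomes an int.
static_assert(std::is_same<decltype(+tag), int>::value,
              "signed char is promoted to int");

// The hexadecimal literal 0x80 fits in an int, so its type is int.
static_assert(std::is_same<decltype(0x80), int>::value,
              "0x80 is an int literal");
static_assert(0x80 == 128, "0x80 has the value 128");

// The comparison from the question is therefore -126 >= 128.
static_assert(!(tag >= 0x80), "the signed comparison is false");

// Converting to unsigned char first yields 130, which passes the test.
static_assert(static_cast<unsigned char>(tag) == 0x82, "same byte, read as unsigned");
static_assert(static_cast<unsigned char>(tag) >= 0x80u, "the unsigned comparison is true");
```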

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 boatcoder
Solution 3
Solution 4 mareko