Create and fill a 10-bit set from two 8-bit characters

We have two 8-bit characters, a and b, that we want to encode in a 10-bit set. We want to take all 8 bits of character a and put them in the first 8 bits of the 10-bit set, then take only the first 2 bits of character b and fill the rest with them.

QUESTION: Do I need to shift the 8 bits in order to concatenate the other 2?

#include <iostream>
#include <bitset>

struct uint10_t {
    uint16_t value : 10;
    uint16_t _     : 6;
};

uint10_t hash(char a, char b){
    uint10_t hashed;
    // Concatenate 2 bits to the other 8
    hashed.value = (a << 8) + (b & 11000000);
    return hashed;
}

int main() {
    uint10_t hashed = hash('a', 'b');
    std::bitset<10> newVal = hashed.value;
    std::cout << newVal << "  " << hashed.value << std::endl;
    return 0;
}

Thanks @Scheff's Cat. My cat says Hi.

c++


Solution 1:[1]

Do I need to shift the 8 bits in order to concatenate the other 2?

Yes.

The bits of a have to be shifted left to make room for the two bits of b. Since room is needed for two bits, a left shift by 2 is appropriate. (Before my recent update, there was a wrong left shift by 8 which I didn't notice. Shame on me.)

The bits of b have to be shifted right. The reason is that OP wants to combine the two most significant bits of b with those of a. As these two bits have to appear as the least significant bits of the result, they have to be shifted right by 6 to reach that position.

It should be:

hashed.value = (a << 2) + ((b & 0xc0) >> 6);

or, with a binary literal (C++14 and later):

hashed.value = (a << 2) + ((b & 0b11000000) >> 6);

As b is of type char (which is signed or unsigned depending on the compiler), it is even better to swap the order of & and >>:

hashed.value = (a << 2) + ((b >> 6) & 0x03);

or

hashed.value = (a << 2) + ((b >> 6) & 0b11);

This ensures that any sign extension is masked out. Sign extension can occur if char is a signed type on the specific compiler and b has a negative value (i.e. its most significant bit is set and is replicated in the conversion to int).
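To see what the mask protects against, here is a minimal sketch. It uses signed char explicitly so the outcome does not depend on the compiler's choice for plain char, and it assumes a two's complement platform. The helper names top2_unmasked, top2_mask_first, and top2_shift_first are made up for this illustration and are not from the original post. Right-shifting a negative value is implementation-defined before C++20 (an arithmetic shift in practice), so the unmasked result is only the typical behaviour.

```cpp
// Hypothetical helpers for illustration only.
// signed char forces the "char is a signed type" case.

// No mask at all: the replicated sign bits leak into the result.
constexpr int top2_unmasked(signed char b) { return b >> 6; }

// Mask first, then shift (the 0xc0 variant).
constexpr int top2_mask_first(signed char b) { return (b & 0xc0) >> 6; }

// Shift first, then mask (the 0x03 variant).
constexpr int top2_shift_first(signed char b) { return (b >> 6) & 0x03; }
```

For b = -64 (bit pattern 11000000), both masked helpers return 3, while top2_unmasked typically returns -1 because the promoted int is filled with copies of the sign bit. In this sketch the two masked variants agree. What goes wrong is leaving the mask out entirely.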

MCVE on coliru:

#include <cstdint>
#include <iostream>
#include <bitset>

struct uint10_t {
    uint16_t value : 10;
    uint16_t _     : 6;
};

uint10_t hash(char a, char b){
    uint10_t hashed;
    // Concatenate 2 bits to the other 8
    hashed.value = (a << 2) + ((b >> 6) & 0b11);
    return hashed;
}

int main() {
    uint10_t hashed = hash('a', 'b');
    std::cout << "a: " << std::bitset<8>('a') << '\n';
    std::cout << "b:         " << std::bitset<8>('b') << '\n';
    std::bitset<10> newVal = hashed.value;
    std::cout << "   " << newVal << "  " << hashed.value << std::endl;
}

Output:

a: 01100001
b:         01100010
   0110000101  389

One may wonder why the two upper bits of a are not lost although a is of type char, which is usually an 8-bit type. The reason is that integral arithmetic operations work on at least int. Hence, a << 2 involves the implicit conversion (integral promotion) of a to int, which has at least 16 bits.
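The promotion can be checked at compile time. The following is a small sketch, assuming an 8-bit char and an int that can hold every unsigned char value; shift_left_2 and truncated are hypothetical names introduced only for this example.

```cpp
#include <type_traits>

// Hypothetical helper: the operand is promoted to int before the shift,
// so the two upper bits are kept in the int result.
constexpr int shift_left_2(unsigned char a) { return a << 2; }

// The shift expression itself has type int, not unsigned char.
static_assert(std::is_same<decltype(static_cast<unsigned char>(0) << 2), int>::value,
              "shift operands undergo integral promotion");

// 0xFF << 2 keeps its two upper bits: 0x3FC.
static_assert(shift_left_2(0xFF) == 0x3FC, "upper bits survive the shift");

// Only converting the result back to 8 bits would drop them: 0x3FC -> 0xFC.
constexpr unsigned char truncated(unsigned char a) {
    return static_cast<unsigned char>(a << 2);
}
static_assert(truncated(0xFF) == 0xFC, "narrowing discards the upper bits");
```

With 'a' (0x61), shift_left_2 gives 388, which is the 389 from the output above minus the 1 contributed by the two top bits of 'b'.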

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1