What is data alignment? Why and when should I be worried when typecasting pointers in C? [duplicate]

I couldn't find a decent document that explains how the alignment system works and why some types are more strictly aligned than others.



Solution 1:[1]

This is "implementation-defined", i.e. the concrete alignment requirements of each type are not fixed by the language specification.

Different CPUs have different requirements on alignment. Some cannot address a 16-bit value at an odd address, some can. Some cannot address a floating point value unless it is aligned to an address divisible by its size, some can. And so on. Some access misaligned data objects more slowly than properly aligned ones, others trap on an unaligned access.

That is why the language standard does not go into the details of which type needs to be aligned which way (because it couldn't), but leaves it to the "implementation" -- the compiler backend, in this case.

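If you typecast pointers, you might be forcing the code to access an object at an address where an object of that type cannot be accessed. You need to ensure that the alignment requirement of the "old" type is at least as strict as that of the "new" type. For example, converting an int * to a char * is always fine with respect to alignment, while converting a char * to an int * is only safe if the address happens to be suitably aligned for int.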
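A minimal sketch of how such a check could look before casting, assuming a platform where converting a pointer to uintptr_t yields its numeric address (true on mainstream implementations, but formally implementation-defined). The helper name is_aligned is hypothetical, not a standard function.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p is a multiple of `alignment`.
   `alignment` must be a nonzero power of two, as all alignments
   reported by alignof are. Assumes (uintptr_t)p is the numeric address. */
static bool is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A caller could test is_aligned(ptr, alignof(uint32_t)) before converting ptr to a uint32_t *, and fall back to a byte-wise copy when the test fails.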

In C++ (C++11 upwards), you get the alignof operator to tell you the alignment requirements of a given type. You also get the alignas specifier to enforce a stricter alignment on a given type or object.

In C (C11 upwards), you get the _Alignof operator and the _Alignas specifier, which <stdalign.h> wraps into the alignof / alignas convenience macros. (Thanks, Lundin -- C11 is not my forte.)
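As a rough illustration of the C11 facilities, the following sketch uses alignas to force a byte buffer to the alignment of uint32_t and alignof to query requirements. The names word_buffer and double_alignment are made up for this example.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignof, alignas */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example type: a raw byte buffer whose storage is forced
   to be aligned at least as strictly as uint32_t. */
struct word_buffer {
    alignas(uint32_t) unsigned char bytes[16];
};

/* A struct is aligned at least as strictly as its strictest member. */
static_assert(alignof(struct word_buffer) >= alignof(uint32_t),
              "word_buffer must be aligned for uint32_t");

/* Returns the alignment requirement of double on this implementation. */
static size_t double_alignment(void)
{
    return alignof(double);
}
```

Note that alignas only addresses alignment. It does not by itself make it valid to access the bytes through a pointer of a different type.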

Solution 2:[2]

Some systems can only access memory in units of, say, 32-bit words (4 bytes). It's a hardware limitation. It means that the actual address going into the memory controller must be divisible by four (even though it is still a byte address). So once you try to have a word located at an address which is not divisible by four, there are two options. Either the compiler generates some fancy code to compose the word out of two memory accesses, but that is not always the case. Or it just generates code to access 4 bytes at the given address, and then the processor fails with a data alignment error.
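To give a feel for what "composing the word" means, here is a hand-written sketch that assembles a 32-bit value from individual byte loads, which have no alignment requirement. It is not what any particular compiler emits (a compiler would more likely combine two aligned word loads), and the little-endian byte order and the name load_u32_le are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: builds a 32-bit value from four consecutive
   bytes, treating p[0] as the least significant byte (little-endian).
   Only byte accesses are used, so p may have any alignment. */
static uint32_t load_u32_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Because the byte order is spelled out explicitly, the result is the same on little-endian and big-endian machines.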

This leads to the limitation the language imposes.

Consider this (bad) code:

uint8_t a[] = {1,2,3,4,5,6};
uint32_t b = *(uint32_t*)&a[1];

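and assume a is aligned to a four-byte boundary. Then the second line tries to read a word from the address of its second element, i.e. an address not divisible by four. On hardware that requires aligned accesses, this leads to an alignment error. In C it is forbidden regardless: reading a uint8_t array through a uint32_t lvalue violates the strict aliasing rule, and converting a pointer to a type for which it is not correctly aligned is undefined behavior in its own right.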
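A commonly used way to get the same bytes without the cast is to copy them into a properly typed object with memcpy. The sketch below assumes the native byte order is what you want; the function name read_u32_unaligned is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads sizeof(uint32_t) bytes starting at src
   into a uint32_t. memcpy has no alignment requirement on src and does
   not violate strict aliasing. The result uses native byte order. */
static uint32_t read_u32_unaligned(const void *src)
{
    uint32_t value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```

With the array from above, uint32_t b = read_u32_unaligned(&a[1]); is well defined on any platform. Optimizing compilers typically turn the memcpy into a single load where the hardware allows unaligned access.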

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2