About Memory Alignment

dharmaraj.guru's Avatar author of About Memory Alignment
This is an article about memory alignment in C.
Many of us know that both C and C++ compilers add padding when laying out a structure in memory.
But only a few know why the compiler does it. In short, padding is required so that the CPU can access each member efficiently.

In more detail: memory is read and written one machine word at a time, and the size of this word depends on the hardware. On a typical 32-bit machine, one memory access reads or writes 4 bytes. For efficient access, the compiler therefore pads the structure with unused bytes so that each member starts on a suitable boundary.

Let's take an example of a structure with and without padding.
Code: C
struct x
{
    char a;
    int b;
}x;
Without padding :
********************
If we add up the member sizes of structure x (assuming a 4-byte int), we get only 5 bytes. If the structure were laid out in exactly 5 bytes, there would be no issue in reading the member x.a: it can be read in a single access. But for the second member, x.b, the hardware has to do two reads and then combine the results before returning the value. The same holds, and gets harder, for writes. Also, as more members are added without proper alignment, reading or writing even a single member takes longer. Consider the following structure.
Code: C
struct y
{
    char a;
    int b;
    char c;
    char d;
    char e;
    int f;
} y;
For the above structure laid out without padding, the alignment of each member depends on the sizes of all the members before it, so some members (such as b, at offset 1) again need more than one memory access, and more time is spent just accessing them.

With Padding:
*****************
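Let's take the same structure x again. With padding, the compiler inserts 3 unused bytes after x.a, so x.b starts at offset 4 and the structure occupies 8 bytes.
Though we are allocating 3 more bytes than the sum of the member sizes, each member can be accessed in a single access. The same is true of the second structure, y: with padding, any member can be accessed in a single access.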

One more point to note here: if we list the members in ascending order of size, so that members of the same size sit together, we use memory more efficiently. For example,
Code: C
struct A
{
    char x;
    char y;
    int z;
}A;
occupies only 8 bytes, whereas,
Code: C
struct A
{
    char x;
    int z;
    char y;
}A;
occupies 12 bytes.

Hence, when writing structures, it is generally better to place the smaller members first, grouped together, followed by the larger members.

Please correct me if I am wrong anywhere and do provide more info about this topic.

||| Dharma |||
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
This article was selected as Article of the Month for October 2007.
Kailash's Avatar, Join Date: Jul 2007
Go4Expert Member
I got the concept of "with padding",
but I can't get the concept of "without padding".
In structure x, how does the OS read x.a in one attempt and x.b in two attempts?
Please clarify.
dharmaraj.guru's Avatar
Go4Expert Member
Let's take the same structure x.

x needs 5 bytes (1 char + 1 int) for its storage. Let's say x is stored from address 1000 to 1004. On a 32-bit machine, word addresses are multiples of 4 only, i.e. word addresses can be 1000, 1004, 1008, 1012, ..., 2000, ... only. A word cannot start at 1001 or 1002. Hence, when we access x.a (char), the hardware reads the word at address 1000. In fact, it reads 4 bytes (1000 to 1003) and extracts the first byte. Thus the char is accessed in a single attempt. Now consider the access of the int. It is stored in the region 1001 to 1004. The hardware first reads the word at address 1000 and extracts 3 bytes from it, then reads the word at address 1004 and extracts 1 byte from it, combines the pieces to form the integer, and finally returns it. Thus the int is accessed with 2 memory reads plus some overhead for assembling the value.

Have I answered your question?

||| Dharma |||
asadullah.ansari's Avatar, Join Date: Jan 2008
TechCake
It's simple!!!!!

Code:
   struct stud
   {
           char ch;
           int    i;
           char ch1;
    };
The total size of this structure will be 12 bytes on a 32-bit machine. The first char is only 1 byte, but because of memory alignment (which lets the CPU access memory quickly), 3 bytes are left as a hole in the structure, since the next member is an int, whose size is 4 bytes. Another 3 bytes of padding follow ch1 so that the total size stays a multiple of 4.

To avoid this, the user has to write the structure carefully.

Code:
struct stud
{
   char ch;
   char ch1;
   int  i;
} ;
Now its size will be 8 bytes. The two chars share one 4-byte word, even though together they need only 2 bytes. But this is better than the first structure.
All of this happens only so that the CPU can access memory easily and quickly.

Last edited by shabbir; 9Jan2008 at 14:04.. Reason: Code block
msdnguide's Avatar, Join Date: May 2011
Go4Expert Member
Memory alignment is very important when porting code from 32-bit to 64-bit systems. Code that works perfectly on a 32-bit system may cause bus errors on a 64-bit system because of misaligned memory accesses.
gngrwzrd's Avatar, Join Date: Feb 2012
Newbie Member
Does using the __attribute__((packed)) on a structure cause the same effect for memory access? AKA: causing the CPU to have to do bit operations to get the right data?

I'm curious about the condition in which using __attribute__((packed)) is correct?
dearvivekkumar's Avatar, Join Date: Feb 2012
Go4Expert Member
Hi,

I had been looking for the answer to this question for quite a long time. Thanks, the explanation in the comment was really awesome.

Please help me correct my understanding.

The padding is done so that the OS does not have to perform multiple read operations. (But it still has to do the remaining work of separating out the unwanted bits that were read.)
For example:
Code:
struct A
{
    char ch;       // let memory address be 1000, 1001
    char ch1;     // 1002, 1003
    int i;             // 1004, 1005, 1006, 1007
};

struct A a;
char c1 = a.ch;     // (1)
char c2 = a.ch1;   // (2)
int ii = a.i;             // (3)
To perform steps (1), (2) and (3), the OS performs three read operations. For step (1) it reads 4 bytes and has to extract the one byte it wants (discarding the 3 extra bytes), and the same goes for step (2); but for step (3) it does a single read and needs no additional computation for extraction. And maybe the extraction is not that expensive for the OS, which is why it assigns 2 bytes each to ch and ch1 instead of giving them 4 bytes each.

Please correct me if my understanding/observation lacks something.

Thanks.
dearvivekkumar's Avatar, Join Date: Feb 2012
Go4Expert Member
Can anyone explain this please?

"Multi-byte data must usually be aligned on a natural boundary."