Memory Barriers

When using optimizing compilers, you should never take for granted that instructions will be performed in the exact order in which they appear in the source code. For example, a compiler might reorder the assembly language instructions in such a way as to optimize how registers are used. Moreover, modern CPUs usually execute several instructions in parallel and might reorder memory accesses. These kinds of reordering can greatly speed up the program.

When dealing with synchronization, however, instruction reordering must be avoided. As a matter of fact, all synchronization primitives act as memory barriers. Things would quickly become hairy if an instruction placed after a synchronization primitive were executed before the synchronization primitive itself.

A memory barrier primitive ensures that the operations placed before the primitive are finished before starting the operations placed after the primitive. Thus, a memory barrier is like a firewall that cannot be passed by any assembly language instruction.

In the 80×86 processors, the following kinds of assembly language instructions are said to be "serializing" because they act as memory barriers:

• All instructions that operate on I/O ports

• All instructions prefixed by the lock byte (see Section 5.3.1)

• All instructions that write into control registers, system registers, or debug registers (for instance, cli and sti, which change the status of the IF flag in the eflags register)

• A few special assembly language instructions; among them, the iret instruction that terminates an interrupt or exception handler

Linux uses six memory barrier primitives, which are shown in Table 5-4. "Read memory barriers" act only on instructions that read from memory, while "write memory barriers" act only on instructions that write to memory. Memory barriers can be useful in both multiprocessor and uniprocessor systems. The smp_xxx( ) primitives are used whenever the memory barrier should prevent race conditions that might occur only in multiprocessor systems; in uniprocessor systems, they do nothing. The other memory barriers are used to prevent race conditions occurring in both uniprocessor and multiprocessor systems.

Table 5-4. Memory barriers in Linux

Macro         Description
mb( )         Memory barrier for MP and UP
rmb( )        Read memory barrier for MP and UP
wmb( )        Write memory barrier for MP and UP
smp_mb( )     Memory barrier for MP only
smp_rmb( )    Read memory barrier for MP only
smp_wmb( )    Write memory barrier for MP only

The implementations of the memory barrier primitives depend on the architecture of the system. For instance, on the Intel platform, the rmb( ) macro expands into asm volatile("lock;addl $0,0(%%esp)":::"memory"). The asm statement tells the compiler to insert some assembly language instructions. The volatile keyword forbids the compiler from reshuffling the asm instruction with the other instructions of the program. The "memory" clobber forces the compiler to assume that all memory locations in RAM have been changed by the assembly language instruction; therefore, the compiler cannot optimize the code by using the values of memory locations stored in CPU registers before the asm instruction. Finally, the lock;addl $0,0(%%esp) assembly language instruction adds zero to the memory location on top of the stack; the instruction is useless by itself, but the lock prefix makes the instruction a memory barrier for the CPU.

The wmb( ) macro on Intel is actually simpler because it expands into asm volatile("":::"memory"). This is because Intel processors never reorder write memory accesses, so there is no need to insert a serializing assembly language instruction in the code. The macro, however, still forbids the compiler from reshuffling the instructions.

Notice that in multiprocessor systems, all atomic operations described in the earlier Section 5.3.1 act as memory barriers because they use the lock byte.
