Memory Barriers

When using optimizing compilers, you should never take for granted that instructions will be performed in the exact order in which they appear in the source code. For example, a compiler might reorder the assembly language instructions in such a way as to optimize how registers are used. Moreover, modern CPUs usually execute several instructions in parallel and might reorder memory accesses. These kinds of reordering can greatly speed up the program.

When dealing with synchronization, however, instruction reordering must be avoided. As a matter of fact, all synchronization primitives act as memory barriers. Things would quickly become hairy if an instruction placed after a synchronization primitive were executed before the synchronization primitive itself.

A memory barrier primitive ensures that the operations placed before the primitive are finished before starting the operations placed after the primitive. Thus, a memory barrier is like a firewall that cannot be passed by any assembly language instruction.

In the 80×86 processors, the following kinds of assembly language instructions are said to be "serializing" because they act as memory barriers:

• All instructions that operate on I/O ports

• All instructions prefixed by the lock byte (see Section 5.3.1)

• All instructions that write into control registers, system registers, or debug registers (for instance, cli and sti, which change the status of the IF flag in the eflags register)

• A few special assembly language instructions; among them, the iret instruction that terminates an interrupt or exception handler

Linux uses six memory barrier primitives, which are shown in Table 5-4. "Read memory barriers" act only on instructions that read from memory, while "write memory barriers" act only on instructions that write to memory. Memory barriers can be useful in both multiprocessor and uniprocessor systems. The smp_xxx( ) primitives are used whenever the memory barrier should prevent race conditions that might occur only in multiprocessor systems; in uniprocessor systems, they do nothing. The other memory barriers are used to prevent race conditions occurring in both uniprocessor and multiprocessor systems.

Table 5-4. Memory barriers in Linux

Macro         Description
mb( )         Memory barrier for MP and UP
rmb( )        Read memory barrier for MP and UP
wmb( )        Write memory barrier for MP and UP
smp_mb( )     Memory barrier for MP only
smp_rmb( )    Read memory barrier for MP only
smp_wmb( )    Write memory barrier for MP only

The implementations of the memory barrier primitives depend on the architecture of the system. For instance, on the Intel platform, the rmb( ) macro expands into asm volatile("lock;addl $0,0(%%esp)":::"memory"). The asm statement tells the compiler to insert some assembly language instructions. The volatile keyword forbids the compiler from reshuffling the asm instruction with the other instructions of the program. The "memory" clobber forces the compiler to assume that all memory locations in RAM have been changed by the assembly language instruction; therefore, the compiler cannot optimize the code by using the values of memory locations stored in CPU registers before the asm instruction. Finally, the lock;addl $0,0(%%esp) assembly language instruction adds zero to the memory location on top of the stack; the instruction is useless by itself, but the lock prefix makes the instruction a memory barrier for the CPU.

The wmb( ) macro on Intel is actually simpler because it expands into asm volatile("":::"memory"). This is because Intel processors never reorder write memory accesses, so there is no need to insert a serializing assembly language instruction in the code. The macro, however, still forbids the compiler from reshuffling the instructions.

Notice that in multiprocessor systems, all atomic operations described in the earlier Section 5.3.1 act as memory barriers because they use the lock byte.
