IPC Shared Memory

The most useful IPC mechanism is shared memory, which allows two or more processes to access some common data structures by placing them in an IPC shared memory region. Each process that wants to access the data structures included in an IPC shared memory region must add to its address space a new memory region (see Section 8.3), which maps the page frames associated with the IPC shared memory region. Such page frames can then be easily handled by the kernel through demand paging (see Section 8.4.3).

As with semaphores and message queues, the shmget( ) function is invoked to get the IPC identifier of a shared memory region, optionally creating it if it does not already exist.

The shmat( ) function is invoked to "attach" an IPC shared memory region to a process. It receives as its parameter the identifier of the IPC shared memory resource and tries to add a shared memory region to the address space of the calling process. The calling process can require a specific starting linear address for the memory region, but the address is usually unimportant, and each process accessing the shared memory region can use a different address in its own address space. The process's Page Tables are left unchanged by shmat( ). We describe later what the kernel does when the process tries to access a page that belongs to the new memory region.

The shmdt( ) function is invoked to "detach" an IPC shared memory region, specified by the starting linear address at which it was attached, that is, to remove the corresponding memory region from the process's address space. Recall that an IPC shared memory resource is persistent: even if no process is using it, the corresponding pages cannot be discarded, although they can be swapped out.
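The three calls described so far can be exercised from User Mode with a short program. What follows is only a sketch under our own assumptions: the helper name shm_roundtrip( ), the SEG_SIZE constant, and the 0600 permission bits are illustrative choices, and the final shmctl( ) call with IPC_RMID, which removes the resource, is a standard call that this section does not discuss.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SEG_SIZE 4096   /* illustrative segment size, in bytes */

/* Stores `msg` in a new private segment through one attachment, reads it
 * back through a second one, then detaches both and attaches again to
 * check that the data outlived the detach. Returns 0 on success, -1 on
 * failure. The segment is removed before returning. */
int shm_roundtrip(const char *msg)
{
    int result = -1;
    int shared;
    char *a, *b, *c;

    assert(msg != NULL && strlen(msg) < SEG_SIZE);

    int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    /* A NULL address lets the kernel pick where each attachment goes. */
    a = shmat(id, NULL, 0);
    if (a == (char *)-1)
        goto remove;
    b = shmat(id, NULL, 0);
    if (b == (char *)-1) {
        shmdt(a);
        goto remove;
    }

    /* Write through the first mapping, read through the second. */
    strcpy(a, msg);
    shared = (a != b) && strcmp(b, msg) == 0;

    /* shmdt( ) is passed the address returned by shmat( ). */
    shmdt(a);
    shmdt(b);
    if (!shared)
        goto remove;

    /* No attachment is left, yet the segment and its contents persist. */
    c = shmat(id, NULL, 0);
    if (c == (char *)-1)
        goto remove;
    if (strcmp(c, msg) == 0)
        result = 0;
    shmdt(c);

remove:
    shmctl(id, IPC_RMID, NULL);   /* explicit removal; not covered in the text */
    return result;
}
```

A call such as shm_roundtrip("hello") attaches the same segment twice within one process only to keep the example short; two distinct processes that share the identifier would behave in the same way, each with its own linear address.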

As for the other types of IPC resources, in order to avoid overuse of memory by User Mode processes, there are some limits on the allowed number of IPC shared memory regions (by default, 4,096), on the size of each segment (by default, 32 megabytes), and on the maximum total size of all segments (by default, 8 gigabytes). As usual, however, the system administrator can tune these values by writing into the /proc/sys/kernel/shmmni, /proc/sys/kernel/shmmax, and /proc/sys/kernel/shmall files, respectively.
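The current limits can be inspected by reading the same files. The sketch below assumes that each file holds a single decimal number followed by a newline; the helper names parse_limit( ) and read_shm_limit( ) are ours, and the units of each value should be checked against the documentation of the running kernel rather than inferred from this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parses the decimal number at the start of `text`.
 * Returns 0 on success, -1 if no number could be read. */
int parse_limit(const char *text, unsigned long long *out)
{
    char *end;

    assert(text != NULL && out != NULL);
    errno = 0;
    unsigned long long v = strtoull(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;
    *out = v;
    return 0;
}

/* Reads /proc/sys/kernel/<name>, where name is "shmmni", "shmmax",
 * or "shmall". Returns 0 on success, -1 if the file is missing,
 * unreadable, or does not start with a number. */
int read_shm_limit(const char *name, unsigned long long *out)
{
    char path[64], buf[64];

    assert(name != NULL && out != NULL);
    snprintf(path, sizeof path, "/proc/sys/kernel/%s", name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? parse_limit(line, out) : -1;
}
```

Writing a new value into one of these files requires administrator privileges; reading them, as done here, normally does not.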

Figure 19-3. IPC shared memory data structures

The data structures associated with IPC shared memory regions are shown in Figure 19-3. The shm_ids variable stores the ipc_ids data structure of the IPC shared memory resource type; its entries field is an array of pointers to shmid_kernel data structures, one item for every IPC shared memory resource. Formally, the array stores pointers to kern_ipc_perm data structures, but each such structure is simply the first field of the shmid_kernel data structure. All fields of the shmid_kernel data structure are shown in Table 19-13.

Table 19-13. The fields in the shmid_kernel data structure

Type                  Field       Description
struct kern_ipc_perm  shm_perm    kern_ipc_perm data structure
struct file *         shm_file    Special file of the segment
                                  Slot index of the segment
unsigned long         shm_nattch  Number of current attaches
unsigned long         shm_segsz   Segment size in bytes
                      shm_atime   Last access time
                      shm_dtime   Last detach time
                      shm_ctime   Last change time
                      shm_cprid   PID of creator
                      shm_lprid   PID of last accessing process
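The shmid_kernel descriptor itself is not visible from User Mode, but the shmctl( ) system call with the IPC_STAT command copies comparable bookkeeping into a struct shmid_ds. The sketch below is ours: the helper names are hypothetical, and the correspondence we suggest between members of the two structures (for instance, shm_cpid in shmid_ds and shm_cprid in the table) is our reading, not something stated in this section.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Creates a segment of `size` bytes, attaches it once, and copies the
 * kernel's bookkeeping into *out while the attachment is in place.
 * Returns 0 on success, -1 on failure. The segment is removed before
 * returning. */
int shm_stat_once(size_t size, struct shmid_ds *out)
{
    int result = -1;

    assert(size > 0 && out != NULL);

    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    void *addr = shmat(id, NULL, 0);
    if (addr != (void *)-1) {
        if (shmctl(id, IPC_STAT, out) == 0)
            result = 0;
        shmdt(addr);
    }
    shmctl(id, IPC_RMID, NULL);
    return result;
}

/* Returns nonzero if the calling process created the segment
 * described by *ds. */
int created_by_caller(const struct shmid_ds *ds)
{
    assert(ds != NULL);
    return ds->shm_cpid == getpid();
}
```

After a successful call, the shm_nattch member reports the single attachment that was in place when the snapshot was taken, and shm_segsz reports the size passed to shmget( ).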

The most important field is shm_file, which stores the address of a file object. This reflects the tight integration of IPC shared memory with the VFS layer in Linux 2.4. In particular, each IPC shared memory region is associated with a regular file belonging to the shm special filesystem (see Section 12.3.1).

Since the shm filesystem has no mount point in the system directory tree, no user can open and access its files by means of regular VFS system calls. However, whenever a process "attaches" a segment, the kernel invokes do_mmap( ) and creates a new shared memory mapping of the file in the address space of the process. Therefore, files that belong to the shm special filesystem have just one file object method, mmap, which is implemented by the shm_mmap( ) function.

As shown in Figure 19-3, a memory region that corresponds to an IPC shared memory region is described by a vm_area_struct object (see Section 15.2); its vm_file field points back to the file object of the special file, which in turn references a dentry object and an inode object. The inode number, stored in the i_ino field of the inode, is actually the slot index of the IPC shared memory region, so the inode object indirectly references the shmid_kernel descriptor.
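The link between a memory region and an inode can be observed from User Mode through the /proc/self/maps file, whose lines list, among other things, the inode number of the file backing each memory region. The sketch below assumes the usual line layout (address range, permissions, offset, device, inode, optional path); the helper names are ours. On the Linux systems we are aware of, the inode reported for an attached segment equals the identifier returned by shmget( ), but this should be treated as an assumption to verify on the kernel at hand.

```c
#include <assert.h>
#include <stdio.h>

/* Parses one /proc/<pid>/maps line of the form
 *   start-end perms offset dev inode [path]
 * Returns 0 and fills the outputs on success, -1 on malformed input. */
int parse_maps_line(const char *line, unsigned long *start,
                    unsigned long *end, unsigned long *inode)
{
    assert(line != NULL && start != NULL && end != NULL && inode != NULL);
    if (sscanf(line, "%lx-%lx %*s %*s %*s %lu", start, end, inode) != 3)
        return -1;
    return 0;
}

/* Looks up the memory region of the calling process that contains
 * `addr` and stores its inode number in *inode.
 * Returns 0 if a region was found, -1 otherwise. */
int inode_of_mapping(const void *addr, unsigned long *inode)
{
    char line[512];
    unsigned long start, end, ino;
    unsigned long target = (unsigned long)addr;
    int result = -1;

    assert(inode != NULL);
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_maps_line(line, &start, &end, &ino) == 0 &&
            target >= start && target < end) {
            *inode = ino;
            result = 0;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Passing the address returned by shmat( ) to inode_of_mapping( ) yields the inode number of the special file that backs the segment.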

As usual, for any shared memory mapping, page frames that belong to the IPC shared memory region are included in the page cache through an address_space object referenced by the i_mapping field of the inode (you might also refer to Figure 15-4).

Swapping out pages of IPC shared memory regions

The kernel has to be careful when swapping out pages included in shared memory regions, and the role of the swap cache is crucial (this topic was already discussed in Section 16.3).

As explained in Section 16.5.1, to swap out a page owned by an address_space object, the kernel essentially marks the page as dirty, thus triggering a data transfer to disk, and then removes the page from the process's Page Table. If the page belongs to a shared file memory mapping, eventually the page is no longer referenced by any process, and the shrink_cache( ) function will release it to the Buddy system (see Section 16.7.5). This is fine because the data in the page is just a duplicate of some data on disk.

However, pages of an IPC shared memory region map a special inode that has no image on disk. Moreover, an IPC shared memory region is persistent—that is, its pages must be preserved even when the segment is not attached to any process. Therefore, the kernel cannot simply discard the pages when reclaiming the corresponding page frames; rather, the pages have to be swapped out.

The try_to_swap_out( ) function includes no check for this special case, so a page belonging to the region is marked as dirty and removed from the process address space. Even the shrink_cache( ) function, which periodically prunes the least recently used pages from the page cache, has no check for this special case, so it ends up invoking the writepage method of the owner address_space object (see Section 16.7.5).

How, then, are IPC shared memory regions preserved when their pages are swapped out? Pages belonging to IPC shared memory regions implement the writepage method by means of a custom shmem_writepage( ) function, which essentially allocates a new page slot in a swap area and moves the page from the page cache to the swap cache (it's just a matter of changing the owner address_space object of the page). The function also stores the swapped-out page identifier in a shmem_inode_info structure embedded in the filesystem-specific portion of the inode object. Notice that the page is not immediately written onto the swap area: this is done when shrink_cache( ) is invoked again.

Demand paging for IPC shared memory regions

The pages added to a process by shmat( ) are dummy pages; the function adds a new memory region into a process's address space, but it doesn't modify the process's Page Tables. Moreover, as we have seen, pages of an IPC shared memory region can be swapped out. Therefore, these pages are handled through the demand paging mechanism.

As we know, a Page Fault occurs when a process tries to access a location of an IPC shared memory region whose underlying page frame has not been assigned. The corresponding exception handler determines that the faulty address is inside the process address space and that the corresponding Page Table entry is null; therefore, it invokes the do_no_page( ) function (see Section 8.4.3). In turn, this function checks whether the nopage method for the memory region is defined. If it is, the method is invoked, and the Page Table entry is set to the address it returns (see also Section 15.2.4).
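The effect of leaving the Page Tables untouched at attach time can be glimpsed from User Mode by counting the faults a process takes when it touches a freshly attached segment for the first time. The sketch below relies on the ru_minflt counter reported by getrusage( ); the helper name is ours, and the exact count is deliberately not predicted, because it depends on kernel features that this section does not cover.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/resource.h>
#include <sys/shm.h>
#include <unistd.h>

/* Attaches a fresh segment of `npages` pages, touches each page once,
 * and stores in *faults the number of minor faults the process took
 * in the meantime. Returns 0 on success, -1 on failure. The segment
 * is removed before returning. */
int faults_on_first_touch(size_t npages, long *faults)
{
    struct rusage before, after;
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;

    assert(npages > 0 && faults != NULL);
    if (page <= 0)
        return -1;

    int id = shmget(IPC_PRIVATE, npages * (size_t)page, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    volatile char *p = shmat(id, NULL, 0);
    if (p != (volatile char *)-1) {
        if (getrusage(RUSAGE_SELF, &before) == 0) {
            for (size_t i = 0; i < npages; i++)
                p[i * (size_t)page] = 1;   /* first access to each page */
            if (getrusage(RUSAGE_SELF, &after) == 0) {
                *faults = after.ru_minflt - before.ru_minflt;
                result = 0;
            }
        }
        shmdt((const void *)p);
    }
    shmctl(id, IPC_RMID, NULL);
    return result;
}
```

On a typical system we would expect the reported value to be close to the number of pages touched, since each first access has to go through the fault path before a page frame is assigned.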

Memory regions used for IPC shared memory always define the nopage method. It is implemented by the shmem_nopage( ) function, which performs the following operations:

1. Walks the chain of pointers in the VFS objects and derives the address of the inode object of the IPC shared memory resource (see Figure 19-3).

2. Computes the logical page number inside the segment from the vm_start field of the memory region descriptor and the requested address.

3. Checks whether the page is already included in the swap cache (see Section 16.3); if so, it terminates by returning its address.

4. Checks whether the shmem_inode_info embedded in the inode object stores a swapped-out page identifier for the logical page number. If so, it performs a swap-in operation by invoking swapin_readahead( ) (see Section 16.6.1), waits until the data transfer completes, and terminates by returning the address of the page.

5. Otherwise, the page is not stored in a swap area; therefore, the function allocates a new page from the Buddy system, inserts it into the page cache, and returns its address.

The do_no_page( ) function sets the entry that corresponds to the faulty address in the process's Page Table so that it points to the page frame returned by the method.
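The interplay between the swap-out path and the fault path can be mimicked by a small User Mode model. It is a toy, not kernel code: the seg_model structure, the model_ functions, the fixed page size, and the one-slot-per-page "swap area" are all hypothetical stand-ins of ours. The model also makes two simplifying assumptions that are not among the steps listed above: a page still in the page cache is returned as is, and any page handed back by the fault path is considered to be in the page cache again.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ   4096UL    /* assumed page size for the model */
#define SEG_PAGES 4         /* illustrative segment length, in pages */

enum state { ABSENT, IN_PAGE_CACHE, IN_SWAP_CACHE, ON_SWAP };

/* Hypothetical per-segment state, standing in for the inode, its
 * shmem_inode_info, and the swap area. */
struct seg_model {
    enum state state[SEG_PAGES];
    char *frame[SEG_PAGES];            /* in-memory copy, if any */
    char  swap[SEG_PAGES][PAGE_SZ];    /* one swap slot per logical page */
};

/* Swap-out, first pass: the page only changes owner, moving from the
 * page cache to the swap cache; nothing is written yet. */
void model_writepage(struct seg_model *s, unsigned long idx)
{
    assert(s != NULL && idx < SEG_PAGES);
    if (s->state[idx] == IN_PAGE_CACHE)
        s->state[idx] = IN_SWAP_CACHE;
}

/* Swap-out, second pass: the data reaches the swap slot and the
 * in-memory frame is released. */
void model_flush(struct seg_model *s, unsigned long idx)
{
    assert(s != NULL && idx < SEG_PAGES);
    if (s->state[idx] != IN_SWAP_CACHE)
        return;
    memcpy(s->swap[idx], s->frame[idx], PAGE_SZ);
    free(s->frame[idx]);
    s->frame[idx] = NULL;
    s->state[idx] = ON_SWAP;
}

/* Fault path: returns the frame backing `address`, or NULL on failure. */
char *model_nopage(struct seg_model *s, unsigned long vm_start,
                   unsigned long address)
{
    unsigned long idx = (address - vm_start) / PAGE_SZ;   /* step 2 */

    assert(s != NULL);
    if (address < vm_start || idx >= SEG_PAGES)
        return NULL;

    switch (s->state[idx]) {
    case IN_PAGE_CACHE:     /* model assumption: already resident */
    case IN_SWAP_CACHE:     /* step 3: still in memory, reuse the frame */
        break;
    case ON_SWAP:           /* step 4: swap-in from the recorded slot */
        s->frame[idx] = malloc(PAGE_SZ);
        if (s->frame[idx] == NULL)
            return NULL;
        memcpy(s->frame[idx], s->swap[idx], PAGE_SZ);
        break;
    default:                /* step 5: brand-new, zero-filled page */
        s->frame[idx] = calloc(1, PAGE_SZ);
        if (s->frame[idx] == NULL)
            return NULL;
        break;
    }
    s->state[idx] = IN_PAGE_CACHE;    /* model assumption, see above */
    return s->frame[idx];
}
```

Starting from a zero-initialized seg_model, a first model_nopage( ) call allocates a page; model_writepage( ) followed by model_flush( ) pushes it to the swap slot; and a later model_nopage( ) call on the same address brings the same contents back, which is the property that makes the segment persistent.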
