[10 Synchronization problems have been fully described in other works we refer the interested reader to books on the Unix operating systems see the bibliography
These problems occur not only among kernel control paths, but also among processes sharing common data. Several synchronization techniques have been adopted. The following section concentrates on how to synchronize kernel control paths.
220.127.116.11 Nonpreemptive kernels
In search of a drastically simple solution to synchronization problems, most traditional Unix kernels are nonpreemptive: when a process executes in Kernel Mode, it cannot be arbitrarily suspended and substituted with another process. Therefore, on a uniprocessor system, all kernel data structures that are not updated by interrupts or exception handlers are safe for the kernel to access.
Of course, a process in Kernel Mode can voluntarily relinquish the CPU, but in this case, it must ensure that all data structures are left in a consistent state. Moreover, when it resumes its execution, it must recheck the value of any previously accessed data structures that could be changed.
Nonpreemptability is ineffective in multiprocessor systems, since two kernel control paths running on different CPUs can concurrently access the same data structure.
18.104.22.168 Interrupt disabling
Another synchronization mechanism for uniprocessor systems consists of disabling all hardware interrupts before entering a critical region and reenabling them right after leaving it. This mechanism, while simple, is far from optimal. If the critical region is large, interrupts can remain disabled for a relatively long time, potentially causing all hardware activities to freeze.
Moreover, on a multiprocessor system, this mechanism doesn't work at all. There is no way to ensure that no other CPU can access the same data structures that are updated in the protected critical region.
A widely used mechanism, effective in both uniprocessor and multiprocessor systems, relies on the use of semaphores. A semaphore is simply a counter associated with a data structure; it is checked by all kernel threads before they try to access the data structure. Each semaphore may be viewed as an object composed of:
• An integer variable
• A list of waiting processes
The down( ) method decrements the value of the semaphore. If the new value is less than 0, the method adds the running process to the semaphore list and then blocks (i.e., invokes the scheduler). The up( ) method increments the value of the semaphore and, if its new value is greater than or equal to 0, reactivates one or more processes in the semaphore list.
Each data structure to be protected has its own semaphore, which is initialized to 1. When a kernel control path wishes to access the data structure, it executes the down( ) method on the proper semaphore. If the value of the new semaphore isn't negative, access to the data structure is granted. Otherwise, the process that is executing the kernel control path is added to the semaphore list and blocked. When another process executes the up( ) method on that semaphore, one of the processes in the semaphore list is allowed to proceed.
22.214.171.124 Spin locks
In multiprocessor systems, semaphores are not always the best solution to the synchronization problems. Some kernel data structures should be protected from being concurrently accessed by kernel control paths that run on different CPUs. In this case, if the time required to update the data structure is short, a semaphore could be very inefficient. To check a semaphore, the kernel must insert a process in the semaphore list and then suspend it. Since both operations are relatively expensive, in the time it takes to complete them, the other kernel control path could have already released the semaphore.
In these cases, multiprocessor operating systems use spin locks. A spin lock is very similar to a semaphore, but it has no process list; when a process finds the lock closed by another process, it "spins" around repeatedly, executing a tight instruction loop until the lock becomes open.
Of course, spin locks are useless in a uniprocessor environment. When a kernel control path tries to access a locked data structure, it starts an endless loop. Therefore, the kernel control path that is updating the protected data structure would not have a chance to continue the execution and release the spin lock. The final result would be that the system hangs.
126.96.36.199 Avoiding deadlocks
Processes or kernel control paths that synchronize with other control paths may easily enter a deadlocked state. The simplest case of deadlock occurs when process p1 gains access to data structure a and process p2 gains access to b, but p1 then waits for b and p2 waits for a. Other more complex cyclic waits among groups of processes may also occur. Of course, a deadlock condition causes a complete freeze of the affected processes or kernel control paths.
As far as kernel design is concerned, deadlocks become an issue when the number of kernel semaphores used is high. In this case, it may be quite difficult to ensure that no deadlock state will ever be reached for all possible ways to interleave kernel control paths. Several operating systems, including Linux, avoid this problem by introducing a very limited number of semaphores and requesting semaphores in an ascending order.
1.6.6 Signals and Interprocess Communication
Unix signals provide a mechanism for notifying processes of system events. Each event has its own signal number, which is usually referred to by a symbolic constant such as sigterm. There are two kinds of system events:
For instance, a user can send the interrupt signal sigint to a foreground process by pressing the interrupt keycode (usually CTRL-C) at the terminal.
Synchronous errors or exceptions
For instance, the kernel sends the signal sigsegv to a process when it accesses a memory location at an illegal address.
The POSIX standard defines about 20 different signals, two of which are user-definable and may be used as a primitive mechanism for communication and synchronization among processes in User Mode. In general, a process may react to a signal delivery in two possible ways:
• Ignore the signal.
• Asynchronously execute a specified procedure (the signal handler).
If the process does not specify one of these alternatives, the kernel performs a default action that depends on the signal number. The five possible default actions are:
• Terminate the process.
• Write the execution context and the contents of the address space in a file (core dump) and terminate the process.
• Ignore the signal.
• Suspend the process.
• Resume the process's execution, if it was stopped.
Kernel signal handling is rather elaborate since the POSIX semantics allows processes to temporarily block signals. Moreover, the sigkill and sigstop signals cannot be directly handled by the process or ignored.
AT&T's Unix System V introduced other kinds of interprocess communication among processes in User Mode, which have been adopted by many Unix kernels: semaphores, message queues, and shared memory. They are collectively known as System VIPC.
The kernel implements these constructs as IPC resources. A process acquires a resource by invoking a shmget( ), semget( ), or msgget( ) system call. Just like files, IPC resources are persistent: they must be explicitly deallocated by the creator process, by the current owner, or by a superuser process.
Semaphores are similar to those described in Section 1.6.5, earlier in this chapter, except that they are reserved for processes in User Mode. Message queues allow processes to exchange messages by using the msgsnd( ) and msgget( ) system calls, which insert a message into a specific message queue and extract a message from it, respectively.
Shared memory provides the fastest way for processes to exchange and share data. A process starts by issuing a shmget( ) system call to create a new shared memory having a required size. After obtaining the IPC resource identifier, the process invokes the shmat( ) system call, which returns the starting address of the new region within the process address space. When the process wishes to detach the shared memory from its address space, it invokes the shmdt( ) system call. The implementation of shared memory depends on how the kernel implements process address spaces. 1.6.7 Process Management
Unix makes a neat distinction between the process and the program it is executing. To that end, the fork( ) and _exit( ) system calls are used respectively to create a new process and to terminate it, while an exec( ) -like system call is invoked to load a new program. After such a system call is executed, the process resumes execution with a brand new address space containing the loaded program.
The process that invokes a fork( ) is the parent, while the new process is its child. Parents and children can find one another because the data structure describing each process includes a pointer to its immediate parent and pointers to all its immediate children.
A naive implementation of the fork( ) would require both the parent's data and the parent's code to be duplicated and assign the copies to the child. This would be quite time consuming. Current kernels that can rely on hardware paging units follow the Copy-On-Write approach, which defers page duplication until the last moment (i.e., until the parent or the child is required to write into a page). We shall describe how Linux implements this technique in Section 8.4.4.
The _exit( ) system call terminates a process. The kernel handles this system call by releasing the resources owned by the process and sending the parent process a sigchld signal, which is ignored by default.
188.8.131.52 Zombie processes
How can a parent process inquire about termination of its children? The wait( ) system call allows a process to wait until one of its children terminates; it returns the process ID (PID) of the terminated child.
When executing this system call, the kernel checks whether a child has already terminated. A special zombie process state is introduced to represent terminated processes: a process remains in that state until its parent process executes a wait( ) system call on it. The system call handler extracts data about resource usage from the process descriptor fields; the process descriptor may be released once the data is collected. If no child process has already terminated when the wait( ) system call is executed, the kernel usually puts the process in a wait state until a child terminates.
Many kernels also implement a waitpid( ) system call, which allows a process to wait for a specific child process. Other variants of wait( ) system calls are also quite common.
It's good practice for the kernel to keep around information on a child process until the parent issues its wait( ) call, but suppose the parent process terminates without issuing that call? The information takes up valuable memory slots that could be used to serve living processes. For example, many shells allow the user to start a command in the background and then log out. The process that is running the command shell terminates, but its children continue their execution.
The solution lies in a special system process called init, which is created during system initialization. When a process terminates, the kernel changes the appropriate process descriptor pointers of all the existing children of the terminated process to make them become children of init. This process monitors the execution of all its children and routinely issues wait( ) system calls, whose side effect is to get rid of all zombies.
184.108.40.206 Process groups and login sessions
Modern Unix operating systems introduce the notion of process groups to represent a "job" abstraction. For example, in order to execute the command line:
$ ls | sort | more a shell that supports process groups, such as bash, creates a new group for the three processes corresponding to ls, sort, and more. In this way, the shell acts on the three processes as if they were a single entity (the job, to be precise). Each process descriptor includes a process group ID field. Each group of processes may have a group leader, which is the process whose PID coincides with the process group ID. A newly created process is initially inserted into the process group of its parent.
Modern Unix kernels also introduce login sessions. Informally, a login session contains all processes that are descendants of the process that has started a working session on a specific terminal—usually, the first command shell process created for the user. All processes in a process group must be in the same login session. A login session may have several process groups active simultaneously; one of these process groups is always in the foreground, which means that it has access to the terminal. The other active process groups are in the background. When a background process tries to access the terminal, it receives a sigttin or sigttout signal. In many command shells, the internal commands bg and fg can be used to put a process group in either the background or the foreground.
1.6.8 Memory Management
Memory management is by far the most complex activity in a Unix kernel. More than a third of this book is dedicated just to describing how Linux does it. This section illustrates some of the main issues related to memory management.
220.127.116.11 Virtual memory
All recent Unix systems provide a useful abstraction called virtual memory. Virtual memory acts as a logical layer between the application memory requests and the hardware Memory Management Unit (MMU). Virtual memory has many purposes and advantages:
• Several processes can be executed concurrently.
• It is possible to run applications whose memory needs are larger than the available physical memory.
• Processes can execute a program whose code is only partially loaded in memory.
• Each process is allowed to access a subset of the available physical memory.
• Processes can share a single memory image of a library or program.
• Programs can be relocatable — that is, they can be placed anywhere in physical memory.
• Programmers can write machine-independent code, since they do not need to be concerned about physical memory organization.
The main ingredient of a virtual memory subsystem is the notion of virtual address space. The set of memory references that a process can use is different from physical memory addresses.
When a process uses a virtual addressj111 the kernel and the MMU cooperate to locate the actual physical location of the requested memory item.
Was this article helpful?