Linux employs a hierarchical scheme in which each process depends on a parent process. The kernel starts the init program as the first process that is responsible for further system initialization actions and display of the login prompt or (in more widespread use today) display of a graphical login interface. init is therefore the root from which all processes originate, more or less directly, as shown graphically by the pstree program. init is the top of a tree structure whose branches spread further and further down.
> pstree
init-+-acpid
     |-bonobo-activati
     |-cron
     |-cupsd
     |-2*[dbus-daemon]
     |-dbus-launch
     |-dcopserver
     |-dhcpcd
     |-esd
     |-eth1
     |-events/0
     |-gam_server
     |-gconfd-2
     |-gnome-vfs-daemo
     |-gpg-agent
     |-hald-addon-acpi
     |-kaccess
     |-kded
     |-evolution-alarm
     |-kinternet
     |-kio_file
     |-klauncher
     |-konqueror
     |-kdesktop
     |-kgpg
     |-khelper
     |-kicker
     |-klogd
     |-kmix
     |-knotify
     |-kpowersave
     |-kscd
     |-ksmserver
     |-ksoftirqd/0
     |-kswapd0
     |-kthread-+-aio/0
     |         |-ata/0
     |         |-kacpid
     |         |-kblockd/0
     |         |-kgameportd
     |         |-khubd
How this tree structure spreads is closely connected with how new processes are generated. For this purpose, Unix uses two mechanisms called fork and exec.
1. fork: Generates a copy of the current process that differs from the parent process essentially only in its PID (process identifier). After the system call has been executed, there are two processes in the system, both continuing with the same code. The memory contents of the initial process are duplicated, at least from the program's point of view. Linux uses a well-known technique called copy on write, which makes the operation much more efficient by deferring the copy operations until either parent or child writes to a page; read-only accesses can be satisfied from the same page for both.
A possible scenario for using fork is, for example, when a user opens a second browser window. If the corresponding option is selected, the browser executes a fork to duplicate its code and then starts the appropriate actions to build a new window in the child process.
2. exec: Loads a new program into an existing context and then executes it. The memory pages reserved by the old program are flushed, and their contents are replaced with new data. The new program then starts executing.
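The interplay of the two mechanisms can be sketched in a few lines of user-space C. This is a minimal illustration, not a description of how any particular shell or browser does it; the function names fork_keeps_memory_private and run_program are hypothetical helpers introduced only for this example. The first function shows that a write in the child does not affect the parent's copy of the data, and the second combines fork with exec to start a different program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that modifies its copy of `value`.
 * Returns the value the parent still sees afterwards, or -1 on error. */
int fork_keeps_memory_private(void)
{
    int value = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this write is what triggers the deferred page copy. */
        value = 2;
        _exit(value == 2 ? 0 : 1);
    }

    /* Parent: wait for the child, then report our own copy. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return value;
}

/* Hypothetical helper: fork, then replace the child with the program at
 * `path`. Returns the program's exit status, or -1 on error. */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);  /* only reached if execv itself failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_keeps_memory_private() yields 1 in the parent even though the child stored 2, and run_program("/bin/sh", argv) with an argument vector such as {"sh", "-c", "exit 7", NULL} returns the exit status of the shell, assuming /bin/sh exists on the system.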
Processes are not the only form of program execution supported by the kernel. In addition to heavy-weight processes (another name for classical Unix processes), there are also threads, sometimes referred to as light-weight processes. They have been around for some time; essentially, a process may consist of several threads that all share the same data and resources but take different paths through the program code. The thread concept is fully integrated into many modern languages, Java, for instance. In simple terms, a process can be seen as an executing program, whereas a thread is a program function or routine running in parallel to the main program.
This is useful, for example, when Web browsers need to load several images in parallel. Without threads, the browser would have to execute several fork and exec calls to generate parallel instances; these would then be responsible for loading the images and making the received data available to the main program using some kind of communication mechanism. Threads make this situation easier to handle. The browser defines a routine to load images, and the routine is started several times as a thread (each time with different arguments). Because the threads and the main program share the same address space, the received data automatically reside in the main program. There is therefore no need for any communication effort, except to prevent the threads from stepping on each other's toes, for instance by accessing identical memory locations at the same time. Figure 1-2 illustrates the difference between a program with and without threads.
Figure 1-2: Processes with and without threads. (Panels: without threads, with threads. Legend: address space, control flow.)
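The image-loading scenario can be sketched with POSIX threads. The text does not prescribe a particular thread library, so the use of pthreads is an assumption, as are the names load_image, load_all_images, and struct load_job; the "download" is a stand-in that merely stores a number. The point of the sketch is that every thread writes directly into an array owned by the main program, and that each thread is given its own slot so that no two threads touch the same memory location.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_IMAGES 4

/* Hypothetical per-thread arguments: which image to load and where the
 * shared result array lives in the common address space. */
struct load_job {
    int index;
    int *results;
};

/* Thread routine: a stand-in for a real download. Each thread writes
 * only to its own slot, so no locking is needed here. */
static void *load_image(void *arg)
{
    struct load_job *job = arg;
    job->results[job->index] = (job->index + 1) * 100;
    return NULL;
}

/* Start one thread per image and wait for all of them.
 * Returns 0 on success, -1 if a thread could not be created. */
int load_all_images(int results[NUM_IMAGES])
{
    assert(results != NULL);

    pthread_t threads[NUM_IMAGES];
    struct load_job jobs[NUM_IMAGES];
    int started = 0;
    int rc = 0;

    for (int i = 0; i < NUM_IMAGES; i++) {
        jobs[i].index = i;
        jobs[i].results = results;
        if (pthread_create(&threads[i], NULL, load_image, &jobs[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return rc;
}
```

After load_all_images() returns, the caller finds the results in its own array without any explicit data transfer. Depending on the C library version, the program may need to be built with the -pthread option.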
Linux provides the clone system call to generate threads. It works in a similar way to fork but gives precise control over which resources are shared with the parent process and which are created independently for the thread. This fine-grained distribution of resources extends the classical thread concept and allows for a more or less continuous transition between threads and processes.
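The effect of a single sharing flag can be made visible from user space with the clone() wrapper that glibc provides. The following is a sketch under stated assumptions: the helper names child_fn and counter_after_clone are hypothetical, the stack size is an arbitrary choice, and the code assumes an architecture on which the stack grows downward. With CLONE_VM the child runs in the parent's address space, so its write is visible to the parent; without the flag the child works on a copy, as after fork.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)  /* arbitrary size for the child's stack */

/* Runs in the child: store a value through the pointer it was given. */
static int child_fn(void *arg)
{
    int *counter = arg;
    *counter = 42;
    return 0;
}

/* Hypothetical helper: create a child with the given sharing flags and
 * return the counter as the parent sees it afterwards (-1 on error). */
int counter_after_clone(int extra_flags)
{
    /* The low byte of the flags holds the exit signal, set below. */
    assert((extra_flags & CSIGNAL) == 0);

    int counter = 0;
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack is assumed to grow downward, so pass its top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      extra_flags | SIGCHLD, &counter);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        free(stack);
        return -1;
    }

    free(stack);
    return counter;
}
```

counter_after_clone(CLONE_VM) returns 42 because parent and child share one address space, whereas counter_after_clone(0) returns 0 because the child modified only its own copy. Real thread libraries pass many more flags than this sketch does.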
During the development of kernel 2.6, support for namespaces was integrated into numerous subsystems. This allows different processes to have different views of the system. Traditionally, Linux (and Unix in general) uses numerous global quantities, for instance, process identifiers: Every process in the system is equipped with a unique identifier (ID), and this ID can be employed by users (or other processes) to refer to the process, by sending it a signal, for instance. With namespaces, formerly global resources are grouped differently: Every namespace can contain its own set of PIDs, or can provide a different view of the filesystem, where mounts in one namespace do not propagate into other namespaces.
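On current kernels, the namespaces a process belongs to can be inspected through symbolic links under /proc/self/ns. This interface is newer than the kernel versions discussed here, so the following is only a sketch of how the idea looks from user space today; the helper names namespace_id and child_shares_namespace are hypothetical. The second helper illustrates that a child created with plain fork stays in the namespaces of its parent unless a new namespace is requested explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copy the identifier of the calling process's
 * namespace of the given type ("pid", "mnt", "uts", ...) into buf.
 * Returns 0 on success, -1 on error. */
int namespace_id(const char *type, char *buf, size_t len)
{
    assert(type != NULL && buf != NULL);

    char path[64];
    int n = snprintf(path, sizeof(path), "/proc/self/ns/%s", type);
    if (n < 0 || (size_t)n >= sizeof(path) || len == 0)
        return -1;

    /* readlink does not terminate the string, so do it by hand. */
    ssize_t got = readlink(path, buf, len - 1);
    if (got < 0)
        return -1;
    buf[got] = '\0';
    return 0;
}

/* Hypothetical helper: returns 1 if a forked child reports the same
 * namespace identifier as its parent, 0 if not, -1 on error. */
int child_shares_namespace(const char *type)
{
    char parent_id[64];
    char child_id[64];

    if (namespace_id(type, parent_id, sizeof(parent_id)) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* In the child, /proc/self refers to the child itself. */
        int same = namespace_id(type, child_id, sizeof(child_id)) == 0 &&
                   strcmp(parent_id, child_id) == 0;
        _exit(same ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

The identifier has the form of the namespace type followed by a number in square brackets. Two processes that report the same identifier share that namespace; a process inside a container would report a different one.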
Namespaces are beneficial, for example, for hosting providers: Instead of setting up one physical machine per customer, they can use containers implemented with namespaces to create multiple views of the system. From within, each container seems to be a complete Linux installation and does not interact with other containers: They are separated and segregated from each other. Every instance looks like a single machine running Linux, but in fact, many such instances can operate simultaneously on one physical machine. This helps use resources more effectively. In contrast to full virtualization solutions like KVM, only a single kernel needs to run on the machine, and it is responsible for managing all containers.
Not all parts of the kernel are yet fully aware of namespaces, and I will discuss to what extent support is available when we analyze the various subsystems.