Concept

Traditionally, many resources are managed globally in Linux as well as other Unix derivatives. For instance, all processes in the system are conventionally identified by their PID, which implies that a global list of PIDs must be managed by the kernel. Likewise, the information about the system returned by the uname system call (which includes the system name and some information about the kernel) is the same for all callers. User IDs are managed in a similar fashion: Each user is identified by a UID number that is globally unique.

3In Section 2.4.1, you will see that Linux does use the copy-on-write mechanism to not copy memory pages of the forked process until the new process performs a write access to the pages — this is more efficient than blindly copying all memory pages immediately on execution of fork. The link between the memory pages of the parent and child process needed to do this is visible to the kernel only and is transparent to the applications.

Global identifiers allow the kernel to selectively grant or deny certain privileges. While the root user with UID 0 is essentially allowed to do anything, higher user IDs are more confined. A user with PID n may, for instance, not kill processes that belong to user m = n. However, this does not prevent users from seeing each other: User n can see that another user m is also active on the machine. This is no problem: As long as users can only fiddle with their own processes, there is no reason why they should not be allowed to observe that other users have processes as well.

There are cases, though, where this can be undesired. Consider that a web provider wants to give full access to Linux machines to customers, including root access. Traditionally, this would require setting up one machine per customer, which is a costly business. Using virtualized environments as provided by KVM or VMWare is one way to solve the problem, but does not distribute resources very well: One separate kernel is required for each customer on the machine, and also one complete installation of the surrounding userland.

A different solution that is less demanding on resources is provided by namespaces. Instead of using virtualized systems such that one physical machine can run multiple kernels — which may well be from different operating systems — in parallel, a single kernel operates on a physical machine, and all previously global resources are abstracted in namespaces. This allows for putting a group of processes into a container, and one container is separated from other containers. The separation can be such that members of one container have no connection whatsoever with other containers. Is is, however, also possible to loosen the separation of containers by allowing them to share certain aspects of their life. For instance, containers could be set up to use their own set of PIDs, but still share portions of filesystems with each other.

Namespaces essentially create different views of the system. Every formerly global resource must be wrapped up in a container data structure, and only tuples of the resource and the containing namespace are globally unique. While the resource alone is enough inside a given container, it does not provide a unique identity outside the container. An overview of the situation is given in Figure 2-3.

Consider a case in which three different namespaces are present on the system. Namespaces can be hierarchically related, and I consider this case here. One namespace is the parent namespace, which has spawned two child namespaces. Assume that the containers are used in a hosting setup where each container must look like a single Linux machine. Each of them therefore has its own init task with PID 0, and the PIDs of other tasks are assigned in increasing order. Both child namespaces have an init task with PID 0, and two processes with PIDs 2 and 3, respectively. Since PIDs with identical values appear multiple times on the system, the numbers are not globally unique.

While none of the child containers has any notion about other containers in the system, the parent is well informed about the children, and consequently sees all processes they execute. They are mapped to the PID range 4 to 9 in the parent process. Although there are 9 processes on the system, 15 PIDs are required to represent them because one process can be associated with more than one PID. The ''right'' one depends on the context in which the process is observed.

Namespaces can also be non-hierarchical if they wrap simpler quantities, for instance, like the UTS namespace discussed below. In this case, there is no connection between parent and child namespaces.

Notice that support for namespaces in a simple form has been available in Linux for quite a long time in the form of the chroot system call. This method allows for restricting processes to a certain part of

Child namespaces

Figure 2-3: Namespaces can be related in a hierarchical order. Each namespace has a parent from which it originates, and a parent can have multiple children.

Child namespaces

Figure 2-3: Namespaces can be related in a hierarchical order. Each namespace has a parent from which it originates, and a parent can have multiple children.

the filesystem and is thus a simple namespace mechanism. True namespaces do, however, allow for controlling much more than just the view on the filesystem.

New namespaces can be established in two ways:

1. When a new process is created with the fork or clone system call, specific options control if namespaces will be shared with the parent process, or if new namespaces are created.

2. The unshare system call dissociates parts of a process from the parent, and this also includes namespaces. See the manual page unshare(2) for more information.

Once a process has been disconnected from the parent namespace using any of the two mechanisms above, changing a — from its point of view — global property will not propagate into the parent namespace, and neither will a change on the parent side propagate into the child, at least for simple quantities. The situation is more involved for filesystems where the sharing mechanisms are very powerful and allow a plethora of possibilities, as discussed in Chapter 8.

Namespaces are currently still marked as experimental in the standard kernel, and development to make all parts of the kernel fully namespace-aware are still going on. As of kernel 2.6.24, the basic framework is, however, set up and in place.4 The file Documentation/namespaces/compatibility-list. txt provides information about some problems that are still present in the current state of the implementation.

4This, however, does not imply that the approach was only recently developed. In fact, the methods have been used in production systems over many years, but were only available as external kernel patches.

Continue reading here: Implementation

Was this article helpful?

0 0