Available System Calls
Before going into the technical details of system call implementation by the kernel (and by the userspace library), it is useful to take a brief look at the actual functions made available by the kernel in the form of system calls.
Each system call is identified by a symbolic constant whose platform-dependent definition is given in <asm-arch/unistd.h>. Since not all system calls are supported on all architectures (some combinations are meaningless), the number of available calls varies from platform to platform; roughly speaking, there are always upward of 200 calls. As a result of changes to the kernel implementation of system calls over time, some calls are now superfluous, and their numbers are no longer used. The SPARC port (on 32-bit processors), for example, has a large number of extinct calls that give rise to "gaps" in the list of calls.
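The relationship between a symbolic constant and a call can be made visible from userspace. The following is a minimal sketch, assuming Linux with the GNU C library, where <sys/syscall.h> exposes the numbers as SYS_* constants and syscall() invokes a call by number; the helper name raw_getpid is made up for this illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_* symbolic constants */
#include <sys/types.h>
#include <unistd.h>        /* syscall(), getpid() */

/* Invoke getpid through its symbolic number rather than the libc wrapper.
 * raw_getpid is a hypothetical helper name used only in this sketch. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should yield the same value as the ordinary getpid() wrapper, although the numeric value behind SYS_getpid differs between architectures.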
It is simpler for programmers to group system calls into functional categories as they are not interested in their individual numbers — they are concerned only with the symbolic names and the meaning of the calls. The following short list — which makes no claim to be complete — gives an overview of the various categories and their most important system calls.
Process Management
Processes are at the center of the system, so it's not surprising that a large number of system calls are devoted to process management. The functions provided by the calls range from querying simple information to starting new processes:
□ fork and vfork split an existing process into two new processes as described in Chapter 2. clone is an enhanced version of fork that supports, among other things, the generation of threads.
□ exit ends a process and frees its resources.
□ A whole host of system calls exists to query (and set) process properties such as PID, UID, and so on; most of these calls simply read or modify a field in the task structure. The following can be read: PID, GID, PPID, SID, UID, EUID, PGID, EGID, and PGRP. The following can be set: UID, GID, REUID, REGID, SID, SUID, and FSGID.
System calls are named in accordance with a logical scheme that uses designations such as setgid, setuid, and geteuid.
□ personality defines the execution environment of an application and is used, for instance, in the implementation of binary emulations.
□ ptrace enables system call tracing and is the platform on which the above strace tool builds.
□ nice sets the priority of normal processes by assigning a number between -20 and 19 in descending order of importance. Only root processes (or processes with the CAP_SYS_NICE capability) are allowed to specify negative values.
□ setrlimit is used to set certain resource limits, for example, CPU time or the maximum permitted number of child processes. getrlimit queries the current limits (i.e., maximum permitted values), and getrusage queries current resource usage to check whether the process is still within the defined resource limits.
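A few of the process management calls can be combined into a short sketch. It assumes the standard library wrappers fork, _exit, and getrlimit; waitpid is not mentioned in the list above and is brought in only so that the parent can collect the child's exit status. The function names run_child and nofile_limit_ok are hypothetical.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code`.
 * Returns the exit status seen by the parent, or -1 on error. */
int run_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: end the process */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Query the limit on open file descriptors.
 * Returns 0 if the soft limit does not exceed the hard limit,
 * 1 if it does, and -1 if the query itself fails. */
int nofile_limit_ok(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return rl.rlim_cur <= rl.rlim_max ? 0 : 1;
}
```

run_child(7) is expected to return 7, because the exit code passed to _exit in the child is what the parent reads back from the wait status.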
Time Operations
Time operations are critical, not only to query and set the current system time, but also to give processes the opportunity to perform time-based operations, as described in Chapter 15:
□ adjtimex reads and sets time-based kernel variables to control kernel time behavior.
□ alarm and setitimer set up alarms and interval timers to defer actions to a later time. getitimer reads settings.
□ gettimeofday and settimeofday get and set the current system time, respectively. Unlike time, they also take account of the current time zone and daylight saving time.
□ sleep and nanosleep suspend process execution for a defined interval; nanosleep defines high-precision intervals.
□ time returns the number of seconds since midnight on January 1, 1970 (this date is the classic time base for Unix systems). stime sets this value and therefore changes the current system date.
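As an illustration of two of these calls working together, the sketch below suspends the caller with nanosleep and measures the elapsed wall-clock time with gettimeofday. The function name sleep_and_measure_us is an assumption made for this example, and the measurement is only approximate because the wall clock can be adjusted while the process sleeps.

```c
#include <errno.h>
#include <sys/time.h>
#include <time.h>

/* Sleep for `ms` milliseconds and return the elapsed time in
 * microseconds as reported by gettimeofday. */
long sleep_and_measure_us(long ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    struct timespec rem;
    struct timeval before, after;

    gettimeofday(&before, NULL);
    /* If a signal interrupts the sleep, continue with the remaining time. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
    gettimeofday(&after, NULL);

    return (after.tv_sec - before.tv_sec) * 1000000L
         + (after.tv_usec - before.tv_usec);
}
```

The value returned is normally slightly larger than the requested interval, since nanosleep guarantees a minimum sleep time and not an exact one.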
Signal Handling
Signals are the simplest (and oldest) way of exchanging limited information between processes and of facilitating interprocess communication. Linux supports not only classic signals common to all Unix look-alikes but also real-time signals in line with the POSIX standard. Chapter 5 deals with the implementation of the signal mechanism.
□ signal installs signal handler functions. sigaction is a modern, enhanced version that supports additional options and provides greater flexibility.
□ sigpending checks whether signals are pending for the process but are currently blocked.
□ sigsuspend places the process on the wait queue until a specific signal (from a set of signals) arrives.
□ ssetmask enables signal blocking, while sgetmask returns a list of all currently blocked signals.
□ kill sends an arbitrary signal to a process.
□ The same system calls are available to handle real-time signals. However, their function names are prefixed with rt_. For example, rt_sigaction installs a real-time signal handler, and rt_sigsuspend puts the process in a wait state until a specific signal (from a set of signals) arrives.
In contrast to classic signals, 64 different real-time signals can be handled on all architectures — even on 32-bit CPUs. Additional information can be associated with real-time signals, and this makes the work of (application) programmers a little easier.
Scheduling
Scheduling-related system calls could be grouped into the process management category because all such calls logically relate to system tasks. However, they merit a category of their own due simply to the sheer number of manipulation options provided by Linux to parameterize process behavior.
□ setpriority and getpriority set and get the priority of a process and are therefore key system calls for scheduling purposes.
□ Linux is noted not only for supporting different process priorities, but also for providing a wide variety of scheduling classes to suit the specific time behavior and time requirements of applications. sched_setscheduler and sched_getscheduler set and query scheduling classes. sched_setparam and sched_getparam set and query additional scheduling parameters of processes (currently, only the parameter for real-time priority is used).
□ sched_yield voluntarily relinquishes control even when CPU time is still available to the process.
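The query side of these calls is easy to try out without special privileges. The sketch below assumes the glibc wrappers getpriority and sched_getscheduler; the helper names current_nice and scheduling_policy are invented for the example.

```c
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>

/* Store the nice value of the calling process in *nice_out.
 * Returns 0 on success and -1 on error. Since -1 is also a valid
 * nice value, errno must be cleared first and checked afterwards. */
int current_nice(int *nice_out)
{
    errno = 0;
    int prio = getpriority(PRIO_PROCESS, 0);   /* 0 = calling process */
    if (prio == -1 && errno != 0)
        return -1;
    *nice_out = prio;
    return 0;
}

/* Return the scheduling class of the calling process
 * (for example SCHED_OTHER), or -1 on error. */
int scheduling_policy(void)
{
    return sched_getscheduler(0);
}
```

For an ordinary process the policy is usually SCHED_OTHER, and the nice value lies between -20 and 19; sched_yield() can be called at any point to give up the CPU voluntarily.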
Modules
System calls are also used to add and remove modules to and from the kernel, as described in Chapter 7.
□ init_module adds a new module.
□ delete_module removes a module from the kernel.
Filesystem
All system calls relating to the filesystem apply to the routines of the VFS layer discussed in Chapter 8. From there, the individual calls are forwarded to the filesystem implementations that usually access the block layer. System calls of this kind are very costly in terms of resources and execution time.
□ Some system calls are used as a direct basis for userspace utilities of the same name that create and modify the directory structure: chdir, mkdir, rmdir, rename, symlink, getcwd, chroot, umask, and mknod.
□ File and directory attributes can be modified using chown and chmod.
□ The following utilities for processing file contents are implemented in the standard library and have the same names as the system calls: open, close, read and readv, write and writev, truncate and llseek.
□ readdir and getdents read directory structures.
□ link, symlink, and unlink create and delete links (or files if they are the last element in a hard link); readlink reads the contents of a link.
□ mount and umount are used to attach and detach filesystems.
□ poll and select are used to wait for some event.
□ execve loads a new process in place of an old process. It starts new programs when used in conjunction with fork.
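A typical sequence of file-related calls might look like the sketch below, which writes a short message to a new file, reads it back, and removes the file again. It uses the library function lseek for repositioning (the list above names llseek), and the function name roundtrip as well as any path passed to it are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `msg` to a new file at `path`, read it back into `buf`,
 * then delete the file. Returns the number of bytes read, or -1. */
ssize_t roundtrip(const char *path, const char *msg, char *buf, size_t buflen)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = -1;
    /* Write the data, rewind to the start, and read it back. */
    if (write(fd, msg, len) == (ssize_t)len && lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, buflen);

    close(fd);
    unlink(path);        /* removes the last link and therefore the file */
    return n;
}
```

Each step in this function crosses into the kernel and passes through the VFS layer, which is why such sequences are comparatively expensive.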
Memory Management
Under normal circumstances, user applications rarely or never come into contact with memory management system calls because this area is completely shielded by the standard library (in the case of C, by the malloc, realloc, and calloc functions). Implementation is usually programming language-specific because each language has different dynamic memory management needs and often provides features like garbage collection that require sophisticated allocation of the memory made available by the kernel.
□ In terms of dynamic memory management, the most important call is brk, which modifies the size of the process data segment. Programs that invoke malloc or similar functions (almost all nontrivial code) make frequent use of this system call.
□ mmap, mmap2, munmap, and mremap perform mapping, unmapping, and remapping operations, while mprotect and madvise control access to and give advice about specific regions of virtual memory.
mmap and mmap2 differ slightly in their parameters; refer to the manual pages for more details. The GNU C library uses mmap2 by default; mmap is by now just a userland wrapper function.
Depending on the malloc implementation, mmap or mmap2 may also be used internally. This works because anonymous mappings allow installing mappings that are not backed by a file. This approach is more flexible than using brk.
□ swapon and swapoff enable and disable (additional) swap space on external storage devices.
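An anonymous mapping of the kind mentioned above can be created directly from a program. The following sketch assumes the glibc mmap wrapper with the MAP_ANONYMOUS flag; the function name anon_mapping_works is hypothetical, and `len` is assumed to be greater than zero.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS on older systems */
#include <string.h>
#include <sys/mman.h>

/* Map `len` bytes of anonymous memory, write to it, restrict it to
 * read-only access, and unmap it again.
 * Returns 1 if every step behaved as expected, 0 if not, -1 if the
 * mapping could not be created. */
int anon_mapping_works(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Anonymous pages are not backed by a file and start zero-filled. */
    int ok = (p[0] == 0 && p[len - 1] == 0);

    memset(p, 0xAB, len);
    ok = ok && p[0] == 0xAB && p[len - 1] == 0xAB;

    if (mprotect(p, len, PROT_READ) != 0)   /* region is now read-only */
        ok = 0;
    if (munmap(p, len) != 0)
        ok = 0;
    return ok;
}
```

Unlike brk, which can only grow or shrink the data segment at its end, each such mapping can be returned to the kernel independently with munmap.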
Interprocess Communication and Network Functions
Because "IPC and networks" are complex issues, it would be easy to assume that a rich selection of system calls is available. As Chapters 5 and 12 show, however, the opposite is true. Only two system calls are provided to handle all possible tasks. However, a very large number of parameters is involved. The C standard library spreads them over many different functions with just a few parameters so that they are easier for programmers to handle. Ultimately, the functions are always based on the two system calls:
□ socketcall handles networking tasks and is used to implement the socket abstraction. It manages connections and protocols of all kinds and implements a total of 17 different functions differentiated by means of constants such as SYS_ACCEPT, SYS_SENDTO, and so on. The arguments themselves must be passed as a pointer that, depending on function type, points to a userspace structure holding the required data.
□ ipc is the counterpart to socketcall and is used for process connections local to the computer and not for connections established via networks. Because this system call need implement "only" 11 different functions, it uses a fixed number of arguments (five in all) to transfer data from userspace to kernel space.
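From the programmer's point of view, only the small library functions are visible. The sketch below sends a message over a local socket pair using socketpair, send, and recv; whether these wrappers are routed through socketcall or through separate system calls depends on the architecture and library version, so that part is an assumption. The function name socketpair_echo is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `msg` through one end of a local socket pair and receive it
 * on the other end. Returns the number of bytes received, or -1. */
int socketpair_echo(const char *msg, char *buf, size_t buflen)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = -1;
    if (send(sv[0], msg, len, 0) == (ssize_t)len)
        n = recv(sv[1], buf, buflen, 0);

    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

Each of the three library functions takes only a handful of arguments, which is the simplification the standard library provides over a single multiplexed entry point.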
System Information and Settings
It is often necessary to query information on the running kernel and its configuration and on the system configuration. Similarly, kernel parameters need to be set and information must be saved to system log files. The kernel provides three further system calls to perform such tasks:
□ syslog writes messages to the system logs and permits the assignment of different priorities (depending on message priority, userspace tools send the messages either to a permanent log file or directly to the console to inform users of critical situations).
□ sysinfo returns information on the state of the system, particularly statistics on memory usage (RAM, buffer, swap space).
□ sysctl is used to "fine-tune" kernel parameters. The kernel now supports an immense number of dynamically configurable options that can be read and modified using the proc filesystem, as described in Chapter 10.
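Of these calls, sysinfo is the simplest to use from a program. The following minimal sketch assumes the glibc wrapper declared in <sys/sysinfo.h>; the function name total_ram_bytes is made up for the example.

```c
#include <sys/sysinfo.h>

/* Return the total amount of RAM in bytes, or 0 if the query fails.
 * The kernel reports sizes in multiples of mem_unit bytes. */
unsigned long long total_ram_bytes(void)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return 0;
    return (unsigned long long)si.totalram * si.mem_unit;
}
```

The same structure also carries figures for free memory, buffers, and swap space, which can be scaled by mem_unit in the same way.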
System Security and Capabilities
The traditional Unix security model, based on users, groups, and an "omnipotent" root user, is not flexible enough for modern needs. This has led to the introduction of the capabilities system, which enables non-root processes to be furnished with additional privileges and capabilities according to a fine-grained scheme.
In addition, the Linux security modules subsystem (LSM) provides a general interface to support modules whose functions are invoked at various hooks in the kernel to perform security checks:
□ capset and capget are responsible for setting and querying process capabilities.
□ security is a system call multiplexer for implementing LSM.