System call

In computing, a system call is how a program requests a service from an operating system's kernel. Such services may include hardware-related operations (e.g. accessing the hard disk), creating and executing new processes, and communicating with integral kernel services (such as scheduling). System calls provide the interface between a process and the operating system.

Privileges

The microprocessor architecture of practically all modern systems (except some embedded systems) involves a security model, such as the rings model, which specifies multiple privilege levels under which software may be executed. For instance, a program is usually limited to its own address space so that it cannot access or modify other running programs or the operating system itself, and it is usually prevented from directly manipulating hardware devices (e.g. the frame buffer or network devices).

However, many normal applications need access to these components, so the operating system makes system calls available to provide well-defined, safe implementations of such operations. The operating system executes at the highest level of privilege and allows applications to request services via system calls, which are often initiated via interrupts. An interrupt automatically puts the CPU into a more privileged level and passes control to the kernel, which determines whether the calling program should be granted the requested service. If the service is granted, the kernel executes a specific set of instructions over which the calling program has no direct control, returns the privilege level to that of the calling program, and then returns control to the calling program.
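The kernel's role as gatekeeper can be observed from an ordinary program. The following is a minimal Python sketch, assuming a Unix-like system; the helper name write_to_readonly_descriptor and the scratch file are illustrative and not part of any standard API. It opens a file read-only and then asks the kernel to write through that descriptor. The kernel checks the request against the access mode it recorded when the file was opened and refuses it, which Python reports as an OSError carrying the kernel's error code.

```python
import errno
import os
import tempfile


def write_to_readonly_descriptor():
    """Ask the kernel to write through a read-only descriptor.

    Returns the errno value reported by the kernel, or None if the
    write unexpectedly succeeded.
    """
    # Create a scratch file; mkstemp returns an open descriptor and a path.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # Reopen the file read-only. The kernel records this access mode.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.write(fd, b"data")  # the kernel decides whether to grant this
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    code = write_to_readonly_descriptor()
    print(errno.errorcode.get(code, code))
```

The program never touches the disk itself: it only states what it wants, and the decision to grant or refuse the request is made entirely on the kernel side.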

The library as an intermediary

Generally, systems provide a library or API that sits between normal programs and the operating system. This is usually an implementation of the C library (libc), such as glibc, which provides wrapper functions for the system calls, often named the same as the system calls they invoke. The wrapper functions expose an ordinary function calling convention (a subroutine call at the assembly level) for using the system call, and make its use more modular. The primary job of a wrapper is to place the arguments to be passed to the system call where the kernel expects them (typically in processor registers) and to set the unique system call number that tells the kernel which call to perform. In this way the library, which sits between the OS and the application, increases portability.

The terms "system call" and "syscall" are often used, incorrectly, to refer to the aforementioned C standard library functions, particularly those that act as wrappers for system calls of the same name. The call to the library function itself does not cause a switch to kernel mode (if execution was not already in kernel mode) and is usually a normal subroutine call (using, for example, a "CALL" assembly instruction in some ISAs). The actual system call does transfer control to the kernel, and is more implementation-dependent and platform-dependent than the library call that abstracts it. For example, in Unix-like systems, "fork" and "execve" are C library functions that in turn execute instructions that invoke the "fork" and "execve" system calls. Making the system call directly in application code is more complicated and may require embedded assembly code (in C and C++) as well as knowledge of the application binary interface; the library functions are meant to abstract this away.
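The difference between a wrapper function and a call made by number can be sketched in Python with ctypes. This is an illustration under stated assumptions, not a definitive method: it assumes Linux, it assumes that the C library is reachable through ctypes.CDLL(None), and the table of call numbers for getpid (39 on x86-64, 20 on 32-bit x86, 172 on AArch64) is taken as given for those architectures only. The helper names are invented for the example.

```python
import ctypes
import os
import platform
import sys

# System call numbers belong to the kernel ABI and differ per architecture.
# Assumed Linux numbers for getpid; other architectures are not covered.
SYS_GETPID = {"x86_64": 39, "i386": 20, "i686": 20, "aarch64": 172}


def load_libc():
    # CDLL(None) exposes the symbols already loaded into this process,
    # which on Linux includes the C library.
    return ctypes.CDLL(None, use_errno=True)


def getpid_via_wrapper(libc):
    """Call the C library's getpid() wrapper: an ordinary function call."""
    return libc.getpid()


def getpid_via_number(libc):
    """Request getpid by number through the C library's generic syscall().

    Returns None when this platform is not covered by the table above.
    """
    if sys.platform != "linux":
        return None
    number = SYS_GETPID.get(platform.machine())
    if number is None:
        return None
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(number))


if __name__ == "__main__":
    libc = load_libc()
    print(getpid_via_wrapper(libc), getpid_via_number(libc), os.getpid())
```

Even the second helper still passes through a small library routine, syscall(), which loads the number and arguments and executes the trap. Bypassing the library completely would require assembly code and knowledge of the ABI.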

On exokernel-based systems, the library is especially important as an intermediary. On exokernels, libraries shield user applications from the very low-level kernel API, and provide abstractions and resource management.

Examples and tools

On Unix, Unix-like and other POSIX-compatible operating systems, commonly used system calls include open, read, write, close, wait, execve, fork, exit, and kill. Many of today's operating systems have hundreds of system calls. For example, Linux has over 300 different calls and FreeBSD has over 500,[1] while Plan 9 has 51.
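Several of these calls can be exercised through Python's os module, whose functions are thin layers over the C library wrappers of the same names. The following is a minimal sketch assuming a Unix-like system; the function names roundtrip_through_file and run_child and the file name example.txt are illustrative. Which kernel entry point ultimately runs is up to the library and may differ from the function name.

```python
import os
import tempfile


def roundtrip_through_file(payload):
    """Write payload with open/write/close, then read it back."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "example.txt")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)  # open
    try:
        os.write(fd, payload)  # write (small payloads go out in one call)
    finally:
        os.close(fd)  # close
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, len(payload))  # read
    finally:
        os.close(fd)
    os.unlink(path)
    os.rmdir(directory)
    return data


def run_child(exit_code):
    """Fork a child that exits at once; return the status the parent collects."""
    pid = os.fork()  # fork: returns 0 in the child, the child's PID in the parent
    if pid == 0:
        os._exit(exit_code)  # exit, skipping Python's cleanup handlers
    _, status = os.waitpid(pid, 0)  # wait for the child to terminate
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(roundtrip_through_file(b"hello"))
    print(run_child(7))
```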

Tools such as strace and truss can run a process from the start and report every system call it invokes, or can attach to an already running process and intercept any system call it makes, provided the operation does not violate the permissions of the user. This ability is itself usually implemented with a system call; for example, strace is implemented with ptrace or with system calls on files in procfs.

Typical implementations

Implementing system calls requires a transfer of control that involves some architecture-specific feature. A typical way to implement this is to use a software interrupt or trap. Because interrupts transfer control to the operating system kernel, software only needs to load a register with the number of the desired system call and then execute the software interrupt.
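The sequence of loading a register, trapping, and dispatching on the call number can be modelled in a few lines. The following Python sketch is a toy model only: a real kernel performs these steps in privileged code reached through a hardware mechanism, and every name here (ToyCPU, trap, the register names r0 and r1, the two sample calls and their numbers) is invented for illustration.

```python
USER_MODE, KERNEL_MODE = "user", "kernel"
NO_SUCH_CALL = -1  # arbitrary error value chosen for this model


class ToyCPU:
    def __init__(self):
        self.mode = USER_MODE
        self.registers = {"r0": 0, "r1": 0}
        self.mode_log = []  # records every mode the CPU enters


def sys_add_one(cpu):
    return cpu.registers["r1"] + 1


def sys_negate(cpu):
    return -cpu.registers["r1"]


# The "kernel" maps call numbers to handlers the caller cannot alter.
SYSCALL_TABLE = {0: sys_add_one, 1: sys_negate}


def trap(cpu):
    """Model of the software interrupt: raise privilege, dispatch, drop privilege."""
    cpu.mode = KERNEL_MODE
    cpu.mode_log.append(cpu.mode)
    handler = SYSCALL_TABLE.get(cpu.registers["r0"])
    result = handler(cpu) if handler else NO_SUCH_CALL
    cpu.registers["r0"] = result  # the result goes back in a register
    cpu.mode = USER_MODE
    cpu.mode_log.append(cpu.mode)


def user_syscall(cpu, number, argument):
    """What a user-space stub does: load the registers, then trap."""
    cpu.registers["r0"] = number
    cpu.registers["r1"] = argument
    trap(cpu)
    return cpu.registers["r0"]


if __name__ == "__main__":
    cpu = ToyCPU()
    print(user_syscall(cpu, 0, 41), cpu.mode_log)
```

The caller chooses only the number and the argument. The code that runs in the privileged mode is selected from a table the caller does not control, which is the property the trap mechanism exists to enforce.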

For many RISC processors this is the only technique provided, but CISC architectures such as x86 support additional techniques. One example is the instruction pairs SYSCALL/SYSRET and SYSENTER/SYSEXIT (the two mechanisms were created independently by AMD and Intel, respectively, but in essence do the same thing). These are "fast" control transfer instructions designed to transfer control to the OS for a system call without the overhead of an interrupt. Linux 2.5 began using them on x86 where available; formerly it used the INT instruction, with the system call number placed in the EAX register before interrupt 0x80 was executed.[2]

An older x86 mechanism is the call gate, which lets a program call a kernel function directly through a safe control transfer mechanism that the OS sets up in advance. This approach has been unpopular, presumably because it requires a far call that uses x86 memory segmentation, with the resulting lack of portability, and because of the existence of the faster instructions mentioned above.

On the IA-64 architecture, the EPC (Enter Privileged Code) instruction is used. The first eight system call arguments are passed in registers, and the rest are passed on the stack.

Categories of system calls

System calls can be roughly grouped into five major categories:

  1. Process control.
  2. File management.
    • create file, delete file
    • open, close
    • read, write, reposition
    • get/set file attributes
  3. Device management.
    • request device, release device
    • read, write, reposition
    • get/set device attributes
    • logically attach or detach devices
  4. Information maintenance.
    • get/set time or date
    • get/set system data
    • get/set process, file, or device attributes
  5. Communication.
    • create, delete communication connection
    • send, receive messages
    • transfer status information
    • attach or detach remote devices
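One request from each of the five categories above can be made from a short Python program. This is a sketch assuming a Unix-like system; the function name sample_each_category and the keys of the result dictionary are illustrative, and the assignment of each call to a category is a judgment made for the example rather than an official classification.

```python
import os
import stat


def sample_each_category():
    """Make one request from each category and collect the results."""
    results = {}

    # Process control: create a child process and wait for it to terminate.
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    results["child_status"] = os.WEXITSTATUS(status)

    # File management: get the attributes of the current directory.
    results["file_is_dir"] = stat.S_ISDIR(os.stat(".").st_mode)

    # Device management: open a device node and inspect its attributes.
    fd = os.open(os.devnull, os.O_RDONLY)
    try:
        results["device_is_char"] = stat.S_ISCHR(os.fstat(fd).st_mode)
    finally:
        os.close(fd)

    # Information maintenance: get system data.
    results["system"] = os.uname().sysname

    # Communication: create a connection, send a message, receive it.
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"ping")
        results["message"] = os.read(read_end, 4)
    finally:
        os.close(read_end)
        os.close(write_end)

    return results


if __name__ == "__main__":
    for key, value in sample_each_category().items():
        print(key, value)
```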

Processor mode and context switching

A syscall is processed in kernel mode, which is accomplished by changing the processor execution mode to a more privileged one, but no process context switch is necessary. The hardware sees the world in terms of the execution mode recorded in the processor status register, whereas processes are an abstraction provided by the operating system. A syscall does not require a context switch to another process; it is processed in the context of whichever process invoked it.[3][4]
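One observable consequence is that CPU time spent in kernel mode is accounted to the process that made the call. The following Python sketch, assuming a Unix-like system, issues many cheap read requests on the null device and compares the process's system CPU time before and after. The function name kernel_time_for_calls is illustrative, and the measured value depends on the machine and on the coarse resolution of the clock, so no particular figure should be expected.

```python
import os


def kernel_time_for_calls(count):
    """Issue `count` cheap requests; return the system CPU time they added."""
    before = os.times()
    fd = os.open(os.devnull, os.O_RDONLY)
    try:
        for _ in range(count):
            os.read(fd, 1)  # returns immediately with no data on the null device
    finally:
        os.close(fd)
    after = os.times()
    # The "system" field is CPU time this process spent executing in the kernel.
    return after.system - before.system


if __name__ == "__main__":
    print(kernel_time_for_calls(200000))
```

The kernel work done for each read shows up in this process's own accounting, consistent with the call being handled in the context of the invoking process rather than by a separate one.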

References

  1. "FreeBSD syscalls.c, the list of syscall names and IDs". http://fxr.watson.org/fxr/source/kern/syscalls.c.
  2. Anonymous (2002-12-19). "Linux 2.5 gets vsyscalls, sysenter support". KernelTrap. http://kerneltrap.org/node/531. Retrieved 2008-01-01.
  3. Bach, Maurice J. (1986). The Design of the UNIX Operating System. Prentice Hall. pp. 15-16.
  4. "Discussion of syscall implementation at ProgClub including quote from Bach 1986". http://www.progclub.org/pipermail/list/2011-October/000150.html.

External links

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.