cgroups
cgroups (control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, etc.) of groups of processes. Work on it was started by Rohit Seth in 2006 under the name "process containers";[1] in late 2007 it was renamed to cgroups and merged into kernel version 2.6.24.[2] Since then, many new features and controllers have been added.
Features
One of the design goals of cgroups was to provide a unified interface for many different use cases, from controlling single processes (as nice does) to full operating system-level virtualization (as provided by OpenVZ or Linux-VServer). Cgroups provides:
- Resource limiting: groups can be configured not to exceed a set memory limit, which also includes the file system cache.[3] The original paper, "Containers: Challenges with the memory resource controller and its performance", was presented at the Linux Symposium.[4]
- Prioritization: some groups may get a larger share of CPU[5] or disk I/O throughput.[6]
- Accounting: measuring how many resources certain groups use, e.g. for billing purposes.[7]
- Isolation: separate namespaces for groups, so that they do not see each other's processes, network connections or files.[2]
- Control: freezing groups of processes, checkpointing them and restarting them.[7]
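CPU prioritization is proportional rather than absolute. With the cgroup v1 cpu controller, each group's cpu.shares value (default 1024) acts as a weight, and when all groups are busy each one receives roughly its shares divided by the sum of the shares of the competing groups. The sketch below only performs that arithmetic; it assumes every listed group is runnable at the same time and ignores nested hierarchies and per-CPU effects. The group names are hypothetical.

```python
from fractions import Fraction


def cpu_fractions(shares: dict[str, int]) -> dict[str, Fraction]:
    """Return each group's fraction of CPU time when all groups are busy.

    Models cgroup v1 cpu.shares as a plain proportional weight:
    fraction = shares / sum(all shares). Idle groups are not modelled.
    """
    if not shares:
        return {}
    if any(s <= 0 for s in shares.values()):
        raise ValueError("cpu.shares values must be positive")
    total = sum(shares.values())
    return {name: Fraction(s, total) for name, s in shares.items()}


if __name__ == "__main__":
    # Hypothetical groups: "web" keeps the default weight, "batch" gets half.
    for name, frac in cpu_fractions({"web": 1024, "batch": 512}).items():
        print(f"{name}: {float(frac):.1%}")  # web: 66.7%, batch: 33.3%
```

Because the weights are relative, doubling every group's shares changes nothing; only the ratios between competing groups matter.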
Usage
A control group is a collection of processes that are bound by the same criteria. Groups can be organized hierarchically, with each group inheriting the limits of its parent group. The kernel provides access to multiple controllers (subsystems) through the cgroup interface.[2] For instance, the "memory" controller limits memory use, the "ns" controller separates processes into isolated namespaces, the "cpuacct" controller accounts for CPU usage, and so on.
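One way to read "inheriting the limits of its parent" is that a group can never use more than any of its ancestors allow, so its effective memory limit is the smallest limit set anywhere on the path from the root down to it. The sketch below models that rule with a hypothetical mapping from slash-separated group paths to byte limits, treating groups without an entry as unlimited; it is an illustration of the rule, not a kernel interface.

```python
import math
from typing import Mapping


def effective_limit(path: str, limits: Mapping[str, float]) -> float:
    """Smallest limit on the path from the root group down to ``path``.

    ``limits`` maps group paths such as "/", "/a", "/a/b" to a byte limit;
    groups that are absent are treated as unlimited (math.inf).
    """
    parts = [p for p in path.strip("/").split("/") if p]
    # "/", "/a", "/a/b", ... : every ancestor plus the group itself.
    ancestors = ["/"] + ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]
    return min(limits.get(a, math.inf) for a in ancestors)


if __name__ == "__main__":
    # Hypothetical hierarchy: the parent caps at 1 GiB, the child asks for 4 GiB.
    limits = {"/services": 1 << 30, "/services/db": 4 << 30}
    print(effective_limit("/services/db", limits))  # 1073741824: the parent's 1 GiB wins
```

A child may still set a tighter limit than its parent, in which case the child's own value is the effective one.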
Control groups can be used in multiple ways:
- By accessing the cgroup virtual file system manually
- By creating and managing groups on the fly with tools such as cgcreate, cgexec and cgclassify (from libcgroup)
- Through the "rules engine daemon", which can automatically move processes of certain users, groups or commands to cgroups as specified in its configuration
- Indirectly, through other software that uses cgroups, such as Linux Containers (LXC) virtualization[8] or libvirt
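Manual access through the virtual file system amounts to ordinary directory and file operations: creating a directory creates a group, writing to its interface files sets limits, and writing a PID moves a process into it. The sketch below assumes a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup with the memory controller enabled for child groups, which is newer than the v1 layout the rest of this article describes (v1 used a per-controller mount and memory.limit_in_bytes). The function name is hypothetical, and on a real system it needs root.

```python
from pathlib import Path

# Assumption: cgroup v2 unified hierarchy mounted here, memory controller
# enabled in the parent's cgroup.subtree_control. Requires root to modify.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def limit_process_memory(group: str, max_bytes: int, pid: int,
                         root: Path = CGROUP_ROOT) -> Path:
    """Create ``group`` under ``root``, cap its memory and move ``pid`` into it."""
    group_dir = root / group
    # In cgroupfs, mkdir creates a new control group and the kernel
    # populates its interface files automatically.
    group_dir.mkdir(parents=True, exist_ok=True)
    # memory.max holds the cgroup v2 hard memory limit in bytes.
    (group_dir / "memory.max").write_text(f"{max_bytes}\n")
    # Writing a PID to cgroup.procs migrates that whole process into the group.
    (group_dir / "cgroup.procs").write_text(f"{pid}\n")
    return group_dir
```

Run as root, a call such as `limit_process_memory("demo", 256 * 1024 * 1024, os.getpid())` would cap the calling process at 256 MiB. The `root` parameter exists so the same logic can be exercised against an ordinary temporary directory without touching the live hierarchy.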
Namespace isolation
While not technically part of the cgroups work, a related feature is namespace isolation, in which groups of processes are separated so that they cannot "see" resources in other groups. For example, a PID namespace provides a separate enumeration of process identifiers within each namespace. Mount, UTS, network and SysV IPC namespaces are also available. If the "ns" cgroup controller is mounted, each new namespace also creates a new group in the cgroup hierarchy.
- The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from its siblings, processes in its "parent" namespace still see all processes in child namespaces, albeit with different PID numbers.[9]
- The network namespace isolates network interface controllers (physical or virtual), iptables firewall rules, routing tables, etc. Network namespaces can be connected to each other using the "veth" virtual Ethernet device.[10]
- The UTS namespace allows changing the hostname.
- The mount namespace allows creating a different file system layout, or making certain mount points read-only.[11]
- The IPC namespace isolates System V inter-process communication between namespaces.
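On Linux, each namespace a process belongs to is exposed as a symbolic link under /proc/<pid>/ns/, whose target text looks like `uts:[4026531838]`; namespaces(7) documents that two processes are in the same namespace when these links refer to the same device and inode. The sketch below parses such link targets and compares two processes by following the links with os.stat. The helper names are hypothetical, and inspecting another user's processes needs the usual ptrace-style permission.

```python
import os
import re

# Target text of a /proc/<pid>/ns/<kind> link, e.g. "uts:[4026531838]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"not a namespace link target: {target!r}")
    return m.group("kind"), int(m.group("inode"))


def same_namespace(pid_a: int, pid_b: int, kind: str) -> bool:
    """True if both processes share the namespace of the given kind.

    Follows /proc/<pid>/ns/<kind> for each process and compares the
    (device, inode) pair of the namespace object it points to.
    """
    a = os.stat(f"/proc/{pid_a}/ns/{kind}")
    b = os.stat(f"/proc/{pid_b}/ns/{kind}")
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    if os.path.exists("/proc/self/ns/uts"):  # Linux with procfs only
        print(parse_ns_link(os.readlink("/proc/self/ns/uts")))
```

The same check works for any namespace kind listed above, for instance `same_namespace(pid, os.getpid(), "net")` to ask whether a process shares the caller's network namespace.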
Namespaces are created with the "unshare" command or syscall, or by passing new flags to the "clone" syscall.[12]
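The unshare syscall moves the calling process itself into new namespaces. As a hedged sketch, the helper below uses Python's os.unshare (available only in Python 3.12+ on Linux) with CLONE_NEWUSER | CLONE_NEWUTS: creating a user namespace together with the UTS namespace gives an unprivileged process the capability to change the hostname inside it, provided the kernel and any sandbox permit unprivileged user namespaces. The function name and its fall-back-to-None behaviour are illustrative choices, not part of any library.

```python
import os
import socket
from typing import Optional


def hostname_in_new_uts_namespace(name: str) -> Optional[str]:
    """Fork a child that unshares a UTS namespace and sets its own hostname.

    Returns the hostname the child observed, or None when this is not
    possible here (Python < 3.12, non-Linux, or user namespaces blocked).
    The parent's namespaces and hostname are never modified.
    """
    if not hasattr(os, "unshare"):
        return None
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        code = 1
        try:
            os.close(read_fd)
            # New user namespace first, so the process holds CAP_SYS_ADMIN
            # over the UTS namespace created in the same call.
            os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWUTS)
            socket.sethostname(name)
            os.write(write_fd, socket.gethostname().encode())
            code = 0
        finally:
            # Always exit here; never fall through into the caller's code.
            os._exit(code)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return data.decode()
    return None
```

The work happens in a forked child because unshare changes the namespaces of whichever process calls it; the parent keeps its original hostname throughout. From a shell, the util-linux `unshare --uts` command performs the equivalent step, usually as root.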
See also
References
External links