L4 microkernel family

From Wikipedia, the free encyclopedia

L4 is a family of related microkernels that are becoming well known in the computer industry for their excellent performance and small footprint. Originally L4 was a single product, a highly tuned Intel i386 kernel designed and implemented by Jochen Liedtke. Since then the system has developed in several directions, both toward platform-independent yet high-performance implementations and toward improved security, isolation, and robustness.

History

Recognition of the design and performance drawbacks of the Mach microkernel led a number of developers to re-examine the entire microkernel concept in the mid-1990s. Mach added considerable overhead to inter-process communication (IPC) in order to support concepts that were of little use outside a Unix context. The IPC system itself was a classic example of a distributed cost: every message paid for features that only some systems needed. For example, on a single-user system like a cell phone, permissions and rights checking are far less important than on a full Unix system. While Mach claimed to be a microkernel, it actually seemed to contain far more than it needed to.

The early years: L3

Jochen Liedtke set out to prove that a well-designed, thinner IPC layer, with careful attention to performance and a machine-specific (as opposed to platform-independent) design, could yield massive real-world performance improvements. Instead of Mach's complex IPC system, his L3 microkernel simply passed the message without any additional overhead. Defining and implementing the required security policies were considered duties of the user-space servers. The role of the kernel was only to provide the mechanism that enables the user-level servers to enforce those policies.

In addition to lowering the complexity of the IPC mechanism, L3 used a variety of techniques to optimize message passing. For large message transfers the kernel used the hardware's memory management unit (MMU) to avoid expensive in-kernel temporary buffers. For short message transfers the kernel used only the processor's hardware registers, avoiding memory accesses altogether. In contrast, Mach used a one-size-fits-all mechanism that sacrificed speed for portability. The result of these changes was a massive reduction in IPC overhead. On the same system where Mach required 114 microseconds for even the smallest of messages, L3 could send the same message in less than 10 microseconds. The overall time for a system call was less than half the time on Unix, whereas on Mach the same system call took five times as long as on Unix. L3 proved itself a safe and robust operating system through many years of use at TÜV SÜD. After some experience using L3, Liedtke came to realize that several other Mach concepts were also misplaced. By simplifying the microkernel concepts even further he developed the first L4 kernel.
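The two transfer paths described above can be illustrated with a toy model. This is a minimal sketch, not L3 code: the function names, the 8-byte register budget, and the copy counts are assumptions made for illustration. The text states only that short messages stay in registers and that the MMU lets long messages bypass an in-kernel buffer. The buffered variant is a generic baseline, not a description of Mach's implementation.

```python
# Toy model of message-transfer paths; not actual L3 code.
REGISTER_PAYLOAD_BYTES = 8  # hypothetical register budget, chosen for illustration


def tuned_transfer(message_len):
    """Return (path, memory_copies) for a kernel with L3-style fast paths."""
    if message_len < 0:
        raise ValueError("message length must be non-negative")
    if message_len <= REGISTER_PAYLOAD_BYTES:
        # Short message: the payload travels in CPU registers, no memory traffic.
        return ("registers", 0)
    # Long message: the MMU makes the receiver's buffer reachable, so the data
    # is copied once, directly from sender to receiver (assumed copy count).
    return ("mmu-assisted copy", 1)


def buffered_transfer(message_len):
    """Return (path, memory_copies) for a design that stages every message."""
    if message_len < 0:
        raise ValueError("message length must be non-negative")
    # Every message is copied into a kernel buffer and then out again.
    return ("kernel buffer", 2)


for length in (4, 4096):
    print(length, tuned_transfer(length), buffered_transfer(length))
```

The point of the model is that the tuned kernel picks the cheapest path per message, whereas the generic design pays the same two copies regardless of size.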

Design dogma: No policy, only mechanism

It appears in retrospect that the vast majority of Mach's performance problems could only be solved by resorting to a fresh design. For instance, another major bottleneck in Mach compared to monolithic kernels is that in a true "collection-of-servers" system the kernel had no real way to know how to effectively page memory. Developers using monolithic kernels could, and did, spend considerable time trying to understand the exact nature of memory use in the kernel, and then tuned their system to take advantage of this knowledge. On a microkernel the developer has no idea what makes up the system, and no way to closely monitor memory usage except in specific cases.

Liedtke decided that the solution to this problem was simply to remove policy-making decisions like paging from the kernel altogether, and to allow each application to apply the sort of tuning formerly applied only to monolithic kernels. Under an L4 system the operating system, implemented as user-level servers (as opposed to within the kernel), is expected to provide paging services, potentially in many varieties, allowing the developer to pick the one best suited to their workload. The kernel's role is reduced to knowing that such pagers exist and providing a mechanism for supporting them.
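The split between mechanism and policy can be sketched as follows. This is an illustrative simulation, not L4 code: the `Kernel` class, the two pager functions, and the task names are hypothetical. The kernel's only job here is to forward a page fault to whichever pager is registered for the task and to install the pages named in the reply; which pages to supply is decided entirely by the pager.

```python
class Kernel:
    """Mechanism only: forwards page faults; holds no paging policy."""

    def __init__(self):
        self.pager_of = {}     # task -> pager callable (conceptually in user space)
        self.mapped = {}       # task -> set of currently mapped pages
        self.fault_count = {}  # task -> number of faults taken

    def create_task(self, task, pager):
        self.pager_of[task] = pager
        self.mapped[task] = set()
        self.fault_count[task] = 0

    def access(self, task, page):
        if page in self.mapped[task]:
            return  # already mapped: no kernel involvement
        self.fault_count[task] += 1
        # The fault is handed to the task's pager; its reply names pages to map.
        self.mapped[task].update(self.pager_of[task](page))


def demand_pager(faulting_page):
    """Policy A: supply only the page that faulted."""
    return [faulting_page]


def make_prefetch_pager(lookahead):
    """Policy B: supply the faulting page plus the next `lookahead` pages."""
    def pager(faulting_page):
        return range(faulting_page, faulting_page + lookahead + 1)
    return pager


kernel = Kernel()
kernel.create_task("editor", demand_pager)
kernel.create_task("streamer", make_prefetch_pager(lookahead=3))
for page in range(8):  # both tasks touch pages 0..7 in order
    kernel.access("editor", page)
    kernel.access("streamer", page)
print(kernel.fault_count)  # {'editor': 8, 'streamer': 2}
```

For a sequential workload the prefetching pager takes far fewer faults, yet nothing in the kernel had to change to get that behaviour.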

The philosophy of the microkernel design is minimalist. The kernel should only contain a minimal set of concepts. The philosophy can be summarized as follows:

A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e., permitting competing implementations, would prevent the implementation of the system's required functionality. [1]

An operating system based on L4 has to provide services that the older generation of microkernels included internally. For example, in order to implement a secure Unix-like system, servers have to provide the rights management that Mach included in the kernel. Additionally, messages will, in most cases, still have to be checked for validity. It remains unclear whether a "real world" operating system based on L4 will run significantly faster end to end than one built on Mach, although the available evidence suggests so. Tests of a Linux ported to run on top of L4 (L4Linux), another ported to run on Mach (MkLinux), and the basic Linux system itself showed that L4 clearly performed better than Mach. In high-level benchmarks, MkLinux was 15–30% slower than the monolithic kernel even in the best case, whereas L4 was about 5–10% slower.[2]

The pre-Pistachio years

Liedtke's original version of L4 was designed primarily with high performance in mind. In order to wring out every bit of performance the whole kernel was written in assembly language. His work caused a minor revolution in operating system design circles. Soon it was being studied by a number of universities and research institutes, including IBM, where Liedtke started to work in 1996. At IBM's T.J. Watson Research Center Liedtke and his colleagues continued research on L4 and microkernel based systems in general.

In 1999, Liedtke took over the Systems Architecture Group at the University of Karlsruhe, where he continued the research into microkernel systems. As a proof of concept that a high-performance microkernel could also be constructed in a higher-level language, the group developed L4Ka::Hazelnut, a C++ version of the kernel that ran on IA-32 and ARM-based machines. The effort was a success: performance was still excellent, and with its release the pure assembly language versions of the kernels were effectively discontinued.

Up to and including the release of L4Ka::Hazelnut, all L4 microkernels had been inherently tied closely to the underlying CPU architecture. The next big shift in L4 development was a platform-independent API that retained the high performance characteristics despite its higher level of portability. Although the underlying concepts of the kernel were the same, the new API made many radical changes relative to previous L4 versions, including better support for multi-processor systems, looser ties between threads and address spaces, and the introduction of user-level thread control blocks (UTCBs) and virtual registers. After releasing the new L4 API in early 2001, the System Architecture Group at the University of Karlsruhe implemented a new kernel, L4Ka::Pistachio, completely from scratch, now with a focus on both high performance and portability.

Development also took place at the University of New South Wales (UNSW), where developers implemented L4 on several 64-bit platforms. Their work produced L4/MIPS and L4/Alpha, and Liedtke's original version was retroactively named L4/x86. Like Liedtke's original kernels, the UNSW kernels were unportable and were each implemented from scratch. With the release of the highly portable L4Ka::Pistachio, the UNSW group abandoned their own kernels in favour of producing highly tuned ports of L4Ka::Pistachio.

Current research and development

Recently the UNSW group, at their new home at National ICT Australia (NICTA), created a new version of L4 called NICTA::L4-embedded. As the name implies, it is aimed at commercial embedded systems, and consequently the implementation tradeoffs favour a small memory footprint and reduced complexity. There is also ongoing work on a formalisation of the L4 API, on a formal proof of the correctness of the implementation, and on frameworks for developing well-structured systems on top of L4.

Fiasco is a further development of the original L4 that includes hard real-time support, which is used as the basis of the DROPS operating system and also of TUD:OS. For real-time use "fast" is not enough, so the Fiasco kernel is entirely reentrant, allowing it to be interrupted at any time. Like other developments of the original L4, Fiasco is also written in C++ for readability and portability reasons.

Most development today appears to be on the Pistachio kernel (both the L4Ka and the NICTA brands). The University of New South Wales uses Pistachio to continue its experiments in portability, and the NICTA::Pistachio-embedded kernel is now offered on a wide variety of hardware. Other teams explore real-time support, adding Fiasco-like concepts. Development of the basic kernel architecture also continues at the University of Karlsruhe, which is working on improving the API's support for security and virtualization.

The GNU Hurd project was considering adopting the L4 microkernel (GNU Hurd/L4) to replace Mach, but has currently decided on the Coyotos kernel.

Osker, an OS written in Haskell, is being written to match the L4 specification, although the project focuses on the use of a functional programming language for OS development rather than strictly on microkernel research.

Architecture design

  • Dynamic memory mapping: each process has a private address space that can be dynamically extended or reduced by mapping and unmapping pages[2]. This provides a capability-like approach and makes it possible to come close to the principle of least privilege.
  • Memory-mapped I/O: external devices are mapped into the process address space and can be accessed as regular memory pages; map and unmap operations can also be applied to external devices[2].
  • Interrupts as IPC: hardware interrupts are handled as IPC[2], so external devices are abstracted as processes.
  • Processor-dependent implementation: various implementations are provided, each optimized for a specific CPU, even for CPUs as similar as the 486 and the Pentium[3].
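The dynamic memory mapping item above can be sketched with a small mapping database. This is a simplified model under stated assumptions, not L4's actual data structure: the class name, method names, task names, and page number are hypothetical. The recursive revocation in `unmap` follows the flush operation described in Liedtke's paper [1] rather than anything stated in the list above. The capability-like property is that a task can pass on only pages it already holds.

```python
class MappingDatabase:
    """Tracks which task may access which page, and who mapped it to whom."""

    def __init__(self):
        self.pages = {}     # task -> set of pages the task can access
        self.children = {}  # (task, page) -> tasks that received the page from it

    def create_task(self, task, initial_pages=()):
        self.pages[task] = set(initial_pages)

    def map(self, src, dst, page):
        """Extend dst's address space with a page that src already holds."""
        if page not in self.pages[src]:
            raise PermissionError(f"{src} does not hold page {page}")
        if page in self.pages[dst]:
            raise ValueError(f"{dst} already holds page {page}")
        self.pages[dst].add(page)
        self.children.setdefault((src, page), []).append(dst)

    def unmap(self, src, page):
        """Revoke the page from every task that got it from src, directly or not.

        src itself keeps the page.
        """
        for dst in self.children.pop((src, page), []):
            self.unmap(dst, page)
            self.pages[dst].discard(page)


db = MappingDatabase()
db.create_task("root", initial_pages={0x10})  # 0x10 could equally be a device page
db.create_task("driver")
db.create_task("app")
db.map("root", "driver", 0x10)   # root extends the driver's address space
db.map("driver", "app", 0x10)    # the driver passes the page on
db.unmap("root", 0x10)           # root revokes it from driver and app
print(db.pages)
```

Because memory-mapped I/O uses the same operations, a device page would be handed out and revoked in exactly the same way as ordinary memory in this model.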

Performance

IPC performance is crucial in a microkernel system. Expressed in the form T_setup + MsgLength × T_transf, the cost of an L4 IPC on a 486-DX50 is about 4.8 μs + L × 0.025 μs, where L is the message length[3]; Mach IPC on the same machine has a T_setup about 20 times higher, around 115 μs[3].
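As a worked example, the cost formula above can be evaluated for a few message lengths. This is a small sketch using only the figures quoted in the text; the function and constant names are made up for illustration, and the unit of the message length L is whatever the cited source uses (the text does not say). Only Mach's setup time is given, so the comparison is against that setup cost alone.

```python
L4_SETUP_US = 4.8        # T_setup for L4 on a 486-DX50 (from the text)
L4_TRANSFER_US = 0.025   # T_transf per unit of message length (from the text)
MACH_SETUP_US = 115.0    # approximate Mach T_setup on the same machine


def ipc_cost_us(msg_length, t_setup=L4_SETUP_US, t_transfer=L4_TRANSFER_US):
    """Evaluate T_setup + MsgLength * T_transf, in microseconds."""
    if msg_length < 0:
        raise ValueError("message length must be non-negative")
    return t_setup + msg_length * t_transfer


def break_even_length(other_setup=MACH_SETUP_US):
    """Message length at which L4's total cost equals the other setup cost alone."""
    return (other_setup - L4_SETUP_US) / L4_TRANSFER_US


for length in (0, 100, 1000):
    print(f"L = {length:4d}: {ipc_cost_us(length):6.1f} us")
print(f"break-even vs. Mach setup: L = {break_even_length():.0f}")
```

Under these figures an L4 message would have to reach a length of about 4408 before its total cost matched what Mach spends on setup alone.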

References

  1. Jochen Liedtke (December 1995). "On µ-Kernel Construction". Proc. 15th ACM Symposium on Operating Systems Principles (SOSP): 237–250.
  2. Hermann Härtig, Michael Hohmuth, Jochen Liedtke, Sebastian Schönberg, Jean Wolter (October 1997). "The Performance of µ-Kernel-Based Systems". Proc. 16th ACM Symposium on Operating Systems Principles (SOSP): 66–77.
  3. T. P. Scheuermann (2002). "Evolution in Microkernel Design". Computer Science Department, University of North Carolina, Chapel Hill, NC: 7.
