Paging

In computer operating systems, paging is one of the memory-management schemes by which a computer can store and retrieve data from secondary storage for use in main memory. In the paging memory-management scheme, the operating system retrieves data from secondary storage in blocks of the same size, called pages. The main advantage of paging over memory segmentation is that it allows the physical address space of a process to be noncontiguous. Before paging came into use, systems had to fit whole programs into storage contiguously, which caused various storage and fragmentation problems.[1]

Paging is an important part of virtual memory implementation in most contemporary general-purpose operating systems, allowing them to use disk storage for data that does not fit into physical random-access memory (RAM).

Overview

The main functions of paging are performed when a program tries to access pages that are not currently mapped to physical memory (RAM). This situation is known as a page fault. The operating system must then take control and handle the page fault, in a manner invisible to the program. Therefore, the operating system must carry out the following steps (a code sketch of the sequence follows the list):

  1. Determine the location of the data in auxiliary storage.
  2. Obtain an empty page frame in RAM to use as a container for the data.
  3. Load the requested data into the available page frame.
  4. Update the page table to map the faulting virtual page to the new page frame.
  5. Return control to the program, transparently retrying the instruction that caused the page fault.
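
The five steps can be illustrated with a toy, user-space simulation written in C. Everything in it is hypothetical scaffolding invented for this sketch: the "disk" and "RAM" are plain arrays, the page table is a small lookup array, and eviction uses a deliberately naive first-in-first-out rule; it describes no real kernel interface.

    /* Toy, user-space simulation of the five steps above. The "disk" and
     * "RAM" are plain arrays; nothing here is a real kernel interface. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE  64    /* tiny pages, for illustration only */
    #define NUM_PAGES  8     /* virtual pages backed by "disk"    */
    #define NUM_FRAMES 4     /* physical page frames in "RAM"     */

    static char disk[NUM_PAGES][PAGE_SIZE];  /* backing store                    */
    static char ram[NUM_FRAMES][PAGE_SIZE];  /* physical memory                  */
    static int  page_table[NUM_PAGES];       /* page -> frame, -1 = not resident */
    static int  frame_owner[NUM_FRAMES];     /* frame -> page, -1 = free         */
    static int  next_victim;                 /* trivial FIFO replacement         */

    static int obtain_free_frame(void)
    {
        for (int f = 0; f < NUM_FRAMES; f++)
            if (frame_owner[f] == -1)
                return f;                    /* step 2: a frame was already free */
        /* No free frame: evict one (naive FIFO here; later sections cover
         * better policies and the dirty-page write-back rule). */
        int f = next_victim;
        next_victim = (next_victim + 1) % NUM_FRAMES;
        memcpy(disk[frame_owner[f]], ram[f], PAGE_SIZE);  /* write back */
        page_table[frame_owner[f]] = -1;
        frame_owner[f] = -1;
        return f;
    }

    /* Handle a fault on 'page' and return the frame now holding it. */
    static int handle_page_fault(int page)
    {
        /* step 1: locate the data in auxiliary storage, here simply disk[page] */
        int frame = obtain_free_frame();           /* step 2: get an empty frame */
        memcpy(ram[frame], disk[page], PAGE_SIZE); /* step 3: load the page      */
        page_table[page] = frame;                  /* step 4: update page table  */
        frame_owner[frame] = page;
        return frame;                              /* step 5: caller retries     */
    }

    int main(void)
    {
        memset(page_table, -1, sizeof page_table);
        memset(frame_owner, -1, sizeof frame_owner);
        for (int p = 0; p < NUM_PAGES; p++)
            snprintf(disk[p], PAGE_SIZE, "contents of page %d", p);

        int accesses[] = {0, 1, 2, 3, 4, 0};   /* page 0 is evicted, then refaulted */
        for (size_t i = 0; i < sizeof accesses / sizeof accesses[0]; i++) {
            int p = accesses[i];
            if (page_table[p] == -1)           /* not resident: page fault */
                handle_page_fault(p);
            printf("page %d -> frame %d: %s\n", p, page_table[p], ram[page_table[p]]);
        }
        return 0;
    }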

As long as there is enough RAM to hold all the data needed, obtaining an empty page frame does not require removing another page from RAM. If all page frames are non-empty, obtaining an empty page frame requires choosing a page frame containing data to empty. If the data in that page frame has been modified since it was read into RAM (i.e., if it has become "dirty"), it must be written back to its location in secondary storage before the frame is freed; otherwise, the contents of the page frame in RAM are identical to the contents of the page in secondary storage, and it does not need to be written back. If a reference is later made to that page, a page fault occurs, an empty page frame must again be obtained, and the contents of the page must be read back in from secondary storage.
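
The dirty-page rule can be shown as a small refinement of the toy simulation above: a frame is written back to the backing store only if it was modified while resident. The structure and function names are again invented purely for illustration.

    /* Write a frame back only if it has been modified ("dirtied") while
     * resident; clean frames are simply dropped. Illustrative toy code. */
    #include <stdbool.h>
    #include <string.h>

    #define PAGE_SIZE 64
    #define NUM_PAGES 8

    static char backing_store[NUM_PAGES][PAGE_SIZE];

    struct frame {
        char data[PAGE_SIZE];
        int  page;    /* which backing-store page this frame holds */
        bool dirty;   /* set by (simulated) hardware on any store  */
    };

    /* Empty the frame so it can be reused for another page. */
    void evict_frame(struct frame *f)
    {
        if (f->dirty) {
            /* Modified since it was read in: write it back first. */
            memcpy(backing_store[f->page], f->data, PAGE_SIZE);
            f->dirty = false;
        }
        /* A clean frame is simply dropped: the copy in secondary storage
         * is already identical, so no write-back is needed. */
        f->page = -1;
    }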

Efficient paging systems must determine the page frame to empty by choosing one that is least likely to be needed within a short time. There are various page replacement algorithms that try to do this. Most operating systems use some approximation of the least recently used (LRU) page replacement algorithm (exact LRU is impractical to implement on current hardware) or a working-set-based algorithm.
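
One widely used approximation of LRU is the "clock" (second-chance) algorithm, sketched below against a toy frame table. The referenced flag stands in for the bit that paging hardware typically sets on each access to a page; the structures are illustrative and not taken from any particular operating system.

    /* Clock (second-chance) victim selection: scan frames in a circle and
     * evict the first one whose referenced bit is clear, clearing set bits
     * along the way so recently used pages get a second chance. */
    #include <stdbool.h>

    #define NUM_FRAMES 4

    struct frame {
        int  page;        /* which virtual page the frame holds       */
        bool referenced;  /* set by the MMU when the page is accessed */
    };

    static struct frame frames[NUM_FRAMES];
    static int hand;      /* position of the clock hand */

    /* Return the index of the frame to evict next. */
    int choose_victim(void)
    {
        for (;;) {
            struct frame *f = &frames[hand];
            int current = hand;
            hand = (hand + 1) % NUM_FRAMES;   /* advance the clock hand    */
            if (!f->referenced)
                return current;               /* not recently used: evict  */
            f->referenced = false;            /* clear bit: second chance  */
        }
    }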

To further increase responsiveness, paging systems may employ various strategies to predict which pages will be needed soon. Such systems will attempt to load pages into main memory preemptively, before a program references them.

Page replacement algorithms

Demand paging

When pure demand paging is used, page loading only occurs at the time of the data request, and not before. In particular, when a demand pager is used, a program usually begins execution with none of its pages pre-loaded in RAM. Pages are copied from the executable file into RAM the first time the executing code references them, usually in response to page faults. As a consequence, pages of the executable file containing code not executed during a particular run will never be loaded into memory.

Anticipatory paging

This technique, sometimes called "swap prefetch", preloads a process's non-resident pages that are likely to be referenced in the near future (taking advantage of locality of reference). Such strategies attempt to reduce the number of page faults a process experiences. Examples of such heuristics are "if a program references one virtual address which causes a page fault, perhaps the next few pages' worth of virtual address space will soon be used" and "if one big program just finished execution, leaving lots of free RAM, perhaps the user will return to using some of the programs that were recently paged out".
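
The first heuristic, sequential readahead on a page fault, might be sketched as follows. The num_pages, resident and load_page names are hypothetical placeholders standing in for the machinery of the earlier sketches.

    /* On a fault for page 'p', also bring in the next few pages of the
     * address space, on the guess that they will be referenced soon. */
    #define READAHEAD 3   /* how many extra pages to prefetch */

    extern int  num_pages;            /* size of the virtual address space */
    extern int  resident(int page);   /* nonzero if the page is in RAM     */
    extern void load_page(int page);  /* read a page into a free frame     */

    void fault_with_readahead(int p)
    {
        load_page(p);                             /* the demanded page     */
        for (int i = 1; i <= READAHEAD; i++) {    /* speculative prefetch  */
            int q = p + i;
            if (q < num_pages && !resident(q))
                load_page(q);
        }
    }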

Free page queue

The free page queue is a list of page frames that are available for assignment after a page fault. Some operating systems[NB 1] support page reclamation: if a page fault occurs for a page that had been stolen and its page frame was never reassigned, the operating system avoids the need to read the page back in by simply reassigning that page frame, whose contents are still intact.
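
Reclamation can be sketched as a scan of a toy free queue whose entries remember which page they last held; the structures and the try_reclaim function are invented for illustration and do not reflect any specific system.

    /* If a frame on the free queue still holds the faulting page and has
     * not been reassigned, reattach it instead of rereading the page from
     * secondary storage. (Node bookkeeping is omitted for brevity.) */
    #include <stddef.h>

    struct free_frame {
        int frame;        /* physical frame number                     */
        int last_page;    /* page whose contents the frame still holds */
        struct free_frame *next;
    };

    static struct free_frame *free_queue;   /* head of the free page queue */

    /* Returns the reclaimed frame number, or -1 if the page must be read
     * back in from secondary storage. */
    int try_reclaim(int page)
    {
        for (struct free_frame **pp = &free_queue; *pp != NULL; pp = &(*pp)->next) {
            struct free_frame *f = *pp;
            if (f->last_page == page) {
                *pp = f->next;     /* remove the frame from the free queue */
                return f->frame;   /* contents intact: no disk read needed */
            }
        }
        return -1;
    }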

Page stealing

Some operating systems periodically look for pages that have not been recently referenced and add them to the free page queue, after paging them out if they have been modified.

Pre-cleaning

Unix operating systems periodically use sync to pre-clean all dirty pages, that is, to save all modified pages to hard disk. Windows operating systems do the same thing via "modified page writer" threads.

Pre-cleaning makes starting a new program or opening a new data file much faster. The hard drive can immediately seek to that file and consecutively read the whole file into pre-cleaned page frames. Without pre-cleaning, the hard drive is forced to seek back and forth between writing a dirty page frame to disk, and then reading the next page of the file into that frame.
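
The idea can be illustrated from user space with standard POSIX calls: after dirtying a file-backed mapping, msync() with MS_ASYNC asks the kernel to schedule the dirty pages for write-back without waiting for the writes to finish, which is roughly what a system-wide pre-cleaning pass does. The file name below is arbitrary, and this is only a user-level analogue of the kernel mechanism described above.

    /* Dirty a mapped file, then schedule (but do not wait for) write-back. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 4096;
        int fd = open("example.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, len) != 0) {
            perror("open/ftruncate");
            return 1;
        }

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memset(p, 'x', len);             /* dirty the mapped page */

        /* Ask the kernel to start writing the dirty page back now, so the
         * frame is clean by the time it has to be reused. */
        if (msync(p, len, MS_ASYNC) != 0)
            perror("msync");

        munmap(p, len);
        close(fd);
        return 0;
    }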

Thrashing

Most programs reach a steady state in their demand for memory, in terms of both the instructions they fetch and the data they access; this steady-state demand is usually much smaller than the total memory required by the program. The steady state is sometimes referred to as the working set: the set of memory pages that are most frequently accessed.

Virtual memory systems work most efficiently when the ratio of the working set to the total number of pages that can be stored in RAM is low enough that the time spent resolving page faults is not a dominant factor in the workload's performance. A program that works with huge data structures will sometimes require a working set that is too large to be managed efficiently by the paging system, resulting in constant page faults that drastically slow down the system. This condition is referred to as thrashing: pages are swapped out and then accessed again soon afterwards, causing frequent faults.

An interesting characteristic of thrashing is that as the working set grows, there is very little increase in the number of faults until the critical point (when faults go up dramatically and the majority of the system's processing power is spent on handling them).

An extreme example of this sort of situation occurred on the IBM System/360 Model 67 and IBM System/370 series mainframe computers, in which a particular instruction could consist of an execute instruction that crosses a page boundary and points to a move instruction that itself also crosses a page boundary, with the move copying data from a source that crosses a page boundary to a target that also crosses a page boundary. The total number of pages used by this single instruction is thus eight, and all eight pages must be present in memory at the same time. If the operating system allocates fewer than eight page frames in this example, then whenever it swaps out some part of the instruction or data to bring in the remainder, the instruction will page fault again, and it will thrash on every attempt to restart the failing instruction.

To decrease excessive paging, and thus possibly resolve a thrashing problem, a user can increase the amount of RAM, reduce the number of programs running at the same time, or adjust the amount of memory demanded by the offending programs.

The term thrashing is also used in contexts other than virtual memory systems, for example to describe cache issues in computing or silly window syndrome in networking.

Terminology

Historically, paging sometimes referred to a memory allocation scheme that used fixed-length pages as opposed to variable-length segments, without implying that virtual memory techniques were employed at all or that those pages were transferred to disk.[2][3] Such usage is rare today.

Some modern systems use the term swapping along with paging. Historically, swapping referred to moving a whole program at a time to or from secondary storage, in a scheme known as roll-in/roll-out.[4][5] In the 1960s, after the concept of virtual memory was introduced in two variants, using either segments or pages, the term swapping was applied to moving either segments or pages, respectively, between disk and memory. Today, with virtual memory mostly based on pages rather than segments, swapping has become a fairly close synonym of paging, although with one difference.

Many popular systems have a concept known as the page cache, which uses the same single mechanism for both virtual memory and disk caching. A page may then be transferred to or from any ordinary disk file, not necessarily a dedicated space. Page in means transferring a page from disk to RAM; page out means transferring a page from RAM to disk. Swap in and swap out, by contrast, refer only to transferring pages between RAM and dedicated swap space, a swap file, or scratch disk space, and not any other place on disk.

On Windows NT-based systems, dedicated swap space is known as a page file, and the terms paging and swapping are often used interchangeably.

Implementations

Ferranti Atlas

The first computer to support paging was the Atlas,[6][7][8] jointly developed by Ferranti, the University of Manchester and Plessey. The machine had an associative (content-addressable) memory with one entry for each 512-word page. The Supervisor[9] handled non-equivalence interruptions[NB 2] and managed the transfer of pages between core and drum in order to provide a one-level store[10] to programs.

Windows 3.x and Windows 9x

Virtual memory has been a feature of Microsoft Windows since Windows 3.0 in 1990. Microsoft introduced virtual memory in response to the failures of Windows 1.0 and Windows 2.0, attempting to slash resource requirements for the operating system.

Confusion abounds about Microsoft's decision to refer to the swap file as "virtual memory". Novices unfamiliar with the concept accept this definition without question and speak of adjusting Windows' virtual memory size. In fact, every process has a fixed, unchangeable virtual memory size, usually 2 GB; the user can only change the amount of disk capacity dedicated to paging.

Windows 3.x creates a hidden file named 386SPART.PAR or WIN386.SWP for use as a swap file. It is generally found in the root directory, but it may appear elsewhere (typically in the WINDOWS directory). Its size depends on how much swap space the system has (a setting selected by the user under Control Panel → Enhanced, under "Virtual Memory"). If the user moves or deletes this file, a blue screen will appear the next time Windows is started, with the error message "The permanent swap file is corrupt". The user will be prompted to choose whether or not to delete the file (whether or not it exists).

Windows 95, Windows 98 and Windows Me use a similar file, and the settings for it are located under Control Panel → System → Performance tab → Virtual Memory. Windows automatically sets the size of the page file to start at 1.5× the size of physical memory, and expand up to 3× physical memory if necessary. If a user runs memory-intensive applications on a system with low physical memory, it is preferable to manually set these sizes to a value higher than default.

Windows NT

In NT-based versions of Windows (such as Windows XP, Windows Vista, and Windows 7), the file used for paging is named pagefile.sys. The default location of the page file is in the root directory of the partition where Windows is installed. Windows can be configured to use free space on any available drives for pagefiles. However, the boot partition (i.e., the partition containing the Windows directory) is required to have a pagefile on it if the system is configured to write kernel or full memory dumps after a crash. Windows uses the paging file as temporary storage for the memory dump. When the system is rebooted, Windows copies the memory dump from the pagefile to a separate file and frees the space that was used in the pagefile.[11]

Fragmentation

In Windows' default configuration the pagefile is allowed to expand beyond its initial allocation when necessary. If this happens gradually, it can become heavily fragmented, which can potentially cause performance problems.[12] The common advice given to avoid this is to set a single "locked" pagefile size so that Windows will not expand it. However, the pagefile only expands when it has been filled, and its initial size in the default configuration is 150% of the total amount of physical memory.[13] Thus the total demand for pagefile-backed virtual memory must exceed 250% of the computer's physical memory (the 100% that physical memory itself can back plus the pagefile's 150%) before the pagefile will expand.

The fragmentation of the pagefile that occurs when it expands is temporary. As soon as the expanded regions are no longer in use (at the next reboot, if not sooner) the additional disk space allocations are freed and the pagefile is back to its original state.

Locking a page file's size can be problematic if a Windows application requests more memory than the total size of physical memory and the page file: in this case, requests to allocate memory fail, which may cause applications and system processes to fail. Supporters of an expandable page file note that the page file is rarely read or written in sequential order, so the performance advantage of having a completely sequential page file is minimal. However, it is generally agreed that a large page file will allow use of memory-heavy applications, and there is no penalty except that more disk space is used.

Defragmenting the page file is also occasionally recommended to improve performance when a Windows system is chronically using much more memory than its total physical memory. This view ignores the fact that, aside from the temporary results of expansion, the pagefile does not become fragmented over time. In general, performance concerns related to pagefile access are much more effectively dealt with by adding more physical memory.

Unix and Unix-like systems

Unix systems, and other Unix-like operating systems, use the term "swap" to describe both the act of moving memory pages between RAM and disk, and the region of a disk the pages are stored on. In some of those systems, it is common to dedicate an entire partition of a hard disk to swapping. These partitions are called swap partitions. Many systems have an entire hard drive dedicated to swapping, separate from the data drive(s), containing only a swap partition. A hard drive dedicated to swapping is called a "swap drive" or a "scratch drive" or a "scratch disk". Some of those systems only support swapping to a swap partition; others also support swapping to files.

Linux

From a software point of view, with the 2.6 Linux kernel, swap files are just as fast[14][15] as swap partitions. The kernel keeps a map of where the swap file resides on disk and accesses the disk directly, bypassing caching and filesystem overhead.[15] Red Hat recommends using a swap partition.[16] With a swap partition, one can choose where on the disk it resides and place it where the disk throughput is highest. The administrative flexibility of swap files can outweigh these advantages of swap partitions.

Linux supports a virtually unlimited number of swap devices, each of which can be assigned a priority. When the operating system needs to swap pages out of physical memory, it uses the highest-priority device with free space. If multiple devices are assigned the same priority, they are used in a fashion similar to RAID 0 arrangements, which improves performance as long as the devices can be accessed efficiently in parallel. Care should therefore be taken when assigning priorities: for example, swap areas located on the same physical disk should not be used in parallel, but in order from the fastest to the slowest (i.e., with the fastest having the highest priority).
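
On Linux, priorities can be set with the pri= option in /etc/fstab, the swapon command, or the swapon(2) system call. The sketch below assigns the same priority to two example devices so that pages are striped across them; the device paths are placeholders, and the call requires root privileges.

    /* Enable two swap devices at equal priority using Linux swapon(2). */
    #include <stdio.h>
    #include <sys/swap.h>

    int main(void)
    {
        int prio  = 5;   /* equal priorities => round-robin, RAID 0-like use */
        int flags = SWAP_FLAG_PREFER |
                    ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);

        /* Example device paths; replace with real swap areas. */
        if (swapon("/dev/sdb1", flags) != 0)
            perror("swapon /dev/sdb1");
        if (swapon("/dev/sdc1", flags) != 0)
            perror("swapon /dev/sdc1");
        return 0;
    }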

Mac OS X

Mac OS X supports both swap partitions and the use of swap files, but the default and recommended configuration is to use multiple swap files.[17]

Solaris

Solaris allows swapping to raw disk slices as well as to files. The traditional method is to use slice 1 (i.e., the second slice) on the OS disk to house swap. Swap setup is managed by the system boot process if there are entries in the "vfstab" file, but it can also be managed manually through the use of the "swap" command. While it is possible to remove, at runtime, all swap from a lightly loaded system, Sun does not recommend it. Recent additions to the ZFS file system allow the creation of ZFS volumes that can be used as swap devices. Swapping to normal files on ZFS file systems is not supported.

AmigaOS 4

AmigaOS 4.0 introduced a new system for allocating RAM and defragmenting physical memory. It still uses a flat shared address space that cannot be defragmented. The system is based on the slab allocation method and on paged memory that allows swapping.[18][19] Paging was implemented in AmigaOS 4.1. Swap memory can be activated and deactivated at any moment, allowing the user to choose to use only physical RAM.

Performance

The backing store for a virtual memory operating system is typically many orders of magnitude slower than RAM. Additionally, using mechanical storage devices introduces delay, several milliseconds for a hard disk. Therefore, it is desirable to reduce or eliminate swapping where practical. Some operating systems offer settings to influence the kernel's decisions.

  1. Linux offers the /proc/sys/vm/swappiness parameter, which changes the balance between swapping out runtime memory and dropping pages from the system page cache (see the sketch after this list).
  2. Windows 2000, XP, and Vista offer the DisablePagingExecutive registry setting, which controls whether kernel-mode code and data can be eligible for paging out.
  3. Mainframe computers frequently used head-per-track disk drives or drums for page and swap storage to eliminate seek time, and employed several technologies[20] to allow multiple concurrent requests to the same device, in order to reduce rotational latency.
  4. Flash memory has a finite number of erase-write cycles (see Limitations of flash memory), and the smallest amount of data that can be erased at once might be very large (128 KiB for an Intel X25-M SSD [21]), seldom coinciding with the page size. Therefore, flash memory may wear out quickly if used as swap space under tight memory conditions. On the other hand, flash memory has practically no seek delay compared to hard disks and, unlike RAM chips, is non-volatile. Schemes like ReadyBoost and Intel Turbo Memory are designed to exploit these characteristics.
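
As a small illustration of item 1, the sketch below reads the current value of /proc/sys/vm/swappiness and writes a new one (equivalent to "sysctl vm.swappiness=N"); changing it requires root privileges, and the value 10 is only an example.

    /* Read and adjust the Linux vm.swappiness setting via procfs. */
    #include <stdio.h>

    int main(void)
    {
        int value;
        FILE *f = fopen("/proc/sys/vm/swappiness", "r");
        if (f == NULL || fscanf(f, "%d", &value) != 1) {
            perror("read swappiness");
            return 1;
        }
        fclose(f);
        printf("current vm.swappiness = %d\n", value);

        /* Lower values make the kernel prefer dropping page-cache pages;
         * higher values make it more willing to swap out runtime memory. */
        f = fopen("/proc/sys/vm/swappiness", "w");
        if (f == NULL) {
            perror("write swappiness");
            return 1;
        }
        fprintf(f, "%d\n", 10);   /* example value */
        fclose(f);
        return 0;
    }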

Many Unix-like operating systems (for example AIX, Linux and Solaris) allow using multiple storage devices for swap space in parallel, to increase performance.

Tuning swap space size

In some older virtual memory operating systems, space in swap backing store is reserved when programs allocate memory for runtime data. OS vendors typically issue guidelines about how much swap space should be allocated.

Reliability

Swapping can decrease system reliability by some amount. If swapped data gets corrupted on the disk (or at any other location, or during transfer), the memory will also have incorrect contents after the data has later been returned.

Addressing limits on 32-bit hardware

Paging is one way of allowing the size of the addresses used by a process (the process's "virtual address space" or "logical address space") to be different from the amount of main memory actually installed on a particular computer (the physical address space).

Main memory smaller than virtual memory

In most systems, the size of a process's virtual address space is much larger than the available main memory.[22]

In these systems, the amount of main memory used by a process is, at most, the amount of physical main memory available. The amount of physical main memory available is limited by the number of address bits on the address bus that connects the CPU to main memory. For example, the Motorola 68000 and the Intel i386SX both use 32-bit addresses internally, but each drives only 24 address bits on its external address bus, limiting addressing to at most 16 MB (2^24 bytes) of physical main memory.

Even on systems that have at least as many physical address bits as virtual address bits, the amount of physical main memory actually installed is often much less than the size that could potentially be addressed, either for cost reasons or because the hardware address map reserves large regions for I/O or other hardware features, so main memory cannot be placed in those regions.

Main memory the same size as virtual memory

It is not uncommon to find 32-bit computers with 4 GB of RAM, the maximum amount addressable without the use of, e.g., PAE. For some machines, e.g., the IBM S/370 in XA mode, the upper bit was not part of the address and only 2 GB could be addressed.

Paging and swap space can be used beyond this 4 GB limit because they are addressed in terms of disk locations rather than memory addresses.

While 32-bit programs on machines with linear address spaces will continue to be limited to the 4 GB they are capable of addressing, a group of such programs can together grow beyond this limit, because each exists in its own virtual address space.

The size of the cumulative total of virtual address spaces is still limited by the number of "process ID bits" supported in the page table.

On machines with segment registers, e.g., the access registers on an IBM System/370 in ESA mode,[23] the address space size is limited only by OS constraints, e.g., the need to fit the mapping tables into the available storage.

Main memory larger than virtual address space

A few computers have a main memory larger than the virtual address space of a process, such as the Magic-1, some PDP-11 machines, and some 32-bit processors with Physical Address Extension.[22]

This nullifies the main advantage of virtual memory, since a single process cannot use more main memory than the amount of its virtual address space. Such systems often use paging techniques to obtain secondary benefits, such as letting the extra memory serve as a disk cache and allowing several processes, each with its own virtual address space, to remain resident in memory at once.

The size of the cumulative total of virtual address spaces is still limited by the number of "process ID bits" supported in the page table, or by the amount of secondary storage available.

Notes

  1. ^ E.g., MVS
  2. ^ A non-equivalence interruption occurs when the high order bits of an address do not match any entry in the associative memory.

References

  1. ^ Belzer, Jack; Holzman, Albert G.; Kent, Allen, eds (1981). "Virtual memory systems". Encyclopedia of computer science and technology. 14. CRC Press. p. 32. ISBN 0824722140. http://books.google.com/?id=KUgNGCJB4agC&printsec=frontcover 
  2. ^ Deitel, Harvey M. (1983). An Introduction to Operating Systems. Addison-Wesley. pp. 181, 187. ISBN 0201144735 
  3. ^ Belzer, Jack; Holzman, Albert G.; Kent, Allen, eds (1981). "Operating systems". Encyclopedia of computer science and technology. 11. CRC Press. p. 433. ISBN 0824722612. http://books.google.com/?id=uTFirmDlSL8C&printsec=frontcover 
  4. ^ Belzer, Jack; Holzman, Albert G.; Kent, Allen, eds (1981). "Operating systems". Encyclopedia of computer science and technology. 11. CRC Press. p. 442. ISBN 0824722612. http://books.google.com/?id=uTFirmDlSL8C&printsec=frontcover 
  5. ^ Cragon, Harvey G. (1996). Memory Systems and Pipelined Processors. Jones and Bartlett Publishers. p. 109. ISBN 0867204745. http://books.google.com/?id=q2w3JSFD7l4C 
  6. ^ Sumner, F. H.; Haley, G.; Chenh, E. C. Y. (1962), "The Central Control Unit of the 'Atlas' Computer", Information Processing 1962, IFIP Congress Proceedings, Proceedings of IFIP Congress 62, Spartan 
  7. ^ "The Atlas", University of Manchester: Department of Computer Science, http://www.computer50.org/kgill/atlas/atlas.html 
  8. ^ "Atlas Architecture", Atlas Computer, Chilton: Atlas Computer Laboratory, http://www.chilton-computing.org.uk/acl/technology/atlas/p005.htm 
  9. ^ Kilburn, T.; Payne, R. B.; Howarth, D. J. (December 1961), "The Atlas Supervisor", Computers - Key to Total Systems Control, Conferences Proceedings, Volume 20, Proceedings of the Eastern Joint Computer Conference Washington, D.C., Macmillan, pp. 279–294, http://www.chilton-computing.org.uk/acl/technology/atlas/p019.htm 
  10. ^ Kilburn, T.; Edwards, D. B. G.; Lanigan, M. J.; Sumner, F. H. (April 1962), "One-Level Storage System", IRE Transactions on Electronic Computers (Institute of Radio Engineers) 
  11. ^ Tsigkogiannis, Ilias (December 11, 2006). "Crash Dump Analysis". Ilias Tsigkogiannis' Introduction to Windows Device Drivers. MSDN Blogs. http://blogs.msdn.com/iliast/archive/2006/12/11/crash-dump-analysis.aspx. Retrieved 2008-07-22. 
  12. ^ "Windows Sysinternals PageDefrag". Sysinternals. Microsoft. November 1, 2006. http://technet.microsoft.com/en-us/sysinternals/bb897426. Retrieved 2010-12-20. 
  13. ^ "How to determine the appropriate page file size for 64-bit versions of Windows Server 2003 or Windows XP (MSKB 889654)". Knowledge Base. Microsoft. November 7, 2007. http://support.microsoft.com/kb/889654. Retrieved 2007-12-26. 
  14. ^ ""Jesper Juhl": Re: How to send a break? - dump from frozen 64bit linux". LKML. 2006-05-29. http://lkml.org/lkml/2006/5/29/3. Retrieved 2010-10-28. 
  15. ^ a b "Andrew Morton: Re: Swap partition vs swap file". LKML. http://lkml.org/lkml/2005/7/7/326. Retrieved 2010-10-28. 
  16. ^ http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/ch-swapspace.html "Swap space can be a dedicated swap partition (recommended), a swap file, or a combination of swap partitions and swap files."
  17. ^ John Siracusa (October 15, 2001). "Mac OS X 10.1". Ars Technica. http://arstechnica.com/reviews/os/macosx-10-1.ars/7. Retrieved 2008-07-23. 
  18. ^ Frieden brothers (2007). "AmigaOS4.0 Memory Allocation". Hyperion Entertainment. http://os4.hyperion-entertainment.biz/index.php%3Foption=content&task=view&id=22&Itemid=.html. Retrieved 2008-11-02. 
  19. ^ Frieden brothers (2007). "AmigaOS 4.0 new memory system revisited". Hyperion Entertainment. http://os4.hyperion-entertainment.biz/index.php%3Foption=content&task=view&id=23&Itemid=.html. Retrieved 2008-11-02. 
  20. ^ E.g., Rotational Position Sensing on a Block Multiplexor channel
  21. ^ "Aligning filesystems to an SSD’s erase block size | Thoughts by Ted". Thunk.org. 2009-02-20. http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size. Retrieved 2010-10-28. 
  22. ^ a b Bill Buzbee. "Magic-1 Minix Demand Paging Design". [1]
  23. ^ IBM (January 1987), IBM System/370 Extended Architecture Principles of Operation, Second Edition, SA22-7085-1. 
