Temporal isolation among virtual machines

Temporal isolation, or performance isolation, among virtual machines (VMs) refers to the capability of isolating the temporal behavior of multiple VMs from one another (or of limiting their temporal interference), even though they run on the same physical host and share a set of physical resources such as processors, memory, and disks.

Introduction to the problem

One of the key advantages of using virtualization in server consolidation is the possibility to seamlessly "pack" multiple under-utilized systems into a single physical host, thus achieving a better overall utilization of the available hardware resources. In fact, an entire operating system (OS), along with the applications running within it, can be run in a virtual machine (VM). However, when multiple VMs run concurrently on the same physical host, they share the available physical resources, including CPU(s), network adapter(s), disk(s) and memory. This adds a degree of unpredictability to the performance of each individual VM, relative to what is expected. For example, a VM with a temporary compute-intensive peak might disturb the other running VMs, causing a significant and undesirable temporary drop in their performance. As computing shifts towards cloud computing paradigms, in which resources (computing, storage, networking) may be rented remotely in virtualized form under precise service-level agreements, it is highly desirable that the performance of the virtualized resources be as stable and predictable as possible.

Possible solutions

Multiple techniques may be used to address the aforementioned problem. They aim to achieve some degree of temporal isolation across the concurrently running VMs at the various critical levels of scheduling: CPU scheduling, network scheduling and disk scheduling.

For the CPU, the hypervisor can use scheduling techniques that bound the amount of computing each VM may impose on a shared physical CPU or core. For example, on the Xen hypervisor, the BVT, Credit-based and SEDF schedulers have been proposed for controlling how the computing power is distributed among competing VMs.[1] To obtain stable performance for virtualized applications, these schedulers must be used in a configuration that is not work-conserving, so that a VM cannot consume more than its reserved share even when the CPU would otherwise be idle, and its performance therefore does not depend on how busy the other VMs are. Also, on the KVM hypervisor, scheduling strategies based on earliest deadline first (EDF)[2] have been proposed to keep the performance of virtualized applications stable and predictable.[3][4] Finally, on a multi-core or multi-processor physical host, each VM can be deployed on a separate processor or core in order to temporally isolate the performance of the various VMs.
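
The difference between a work-conserving and a non-work-conserving configuration can be illustrated with a toy simulation. The following is a minimal sketch, not the algorithm of any of the schedulers named above: the function `simulate`, its parameters, and the tick-based model are all assumptions made for illustration. Each VM receives a budget of CPU ticks per period; in the non-work-conserving mode, unused CPU time is left idle.

```python
def simulate(reservations, demand, ticks, work_conserving=False):
    """Simulate one CPU shared by VMs under periodic budget reservations.

    reservations: {vm: (budget, period)} in ticks (hypothetical parameters).
    demand: {vm: bool}, True if the VM always has work to run.
    Returns {vm: fraction of the CPU time the VM received}.
    """
    budget_left = {vm: 0 for vm in reservations}
    used = {vm: 0 for vm in reservations}

    def deadline(vm, t):
        # End of the VM's current period, used as its deadline.
        period = reservations[vm][1]
        return (t // period + 1) * period

    for t in range(ticks):
        # Replenish each VM's budget at the start of its period.
        for vm, (budget, period) in reservations.items():
            if t % period == 0:
                budget_left[vm] = budget
        # Among VMs with work and remaining budget, run the earliest deadline.
        ready = [vm for vm in reservations if demand[vm] and budget_left[vm] > 0]
        if ready:
            vm = min(ready, key=lambda v: deadline(v, t))
            budget_left[vm] -= 1
            used[vm] += 1
        elif work_conserving:
            # Hand otherwise idle time to the first VM that still wants the CPU.
            hungry = [vm for vm in reservations if demand[vm]]
            if hungry:
                used[hungry[0]] += 1
        # Otherwise the CPU stays idle for this tick.
    return {vm: used[vm] / ticks for vm in reservations}


res = {"A": (3, 10), "B": (5, 10)}
busy = {"A": True, "B": True}
b_idle = {"A": True, "B": False}
print(simulate(res, busy, 1000), simulate(res, b_idle, 1000))
print(simulate(res, busy, 1000, True), simulate(res, b_idle, 1000, True))
```

In this model, VM A receives the same 30% share in the non-work-conserving mode whether or not VM B is busy, while in the work-conserving mode its share changes with B's load. That variation is the kind of instability a non-work-conserving configuration is meant to avoid, at the cost of leaving some CPU time unused.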

For the network, traffic shaping techniques can be used to limit the amount of traffic that each VM may impose on the host. It is also possible to install multiple network adapters on the same physical host and configure the virtualization layer so that each VM is granted exclusive access to one of them. For example, this is possible with the driver domains of the Xen hypervisor. There are also multi-queue network adapters that support multiple VMs at the hardware level, with separate packet queues associated with the different hosted VMs (by means of the IP addresses of the VMs), such as the Virtual Machine Device Queue (VMDq) devices by Intel.[5] Finally, real-time scheduling of the CPU may also be used to enhance the temporal isolation of network traffic from multiple VMs deployed on the same CPU.[6]
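
Traffic shaping is commonly built on a token bucket. The following is a minimal sketch of that general technique, not the mechanism of any specific hypervisor: the class `TokenBucket`, its method `reserve`, and the per-VM rates are hypothetical names and values chosen for illustration. Time is passed in explicitly and is assumed to be non-decreasing across calls.

```python
class TokenBucket:
    """Token-bucket shaper: sustained `rate` bytes/s, bursts up to `burst` bytes."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate        # refill rate in bytes per second
        self.burst = burst      # bucket capacity in bytes
        self.tokens = burst     # start with a full bucket
        self.last = now         # time of the last refill, in seconds

    def reserve(self, size, now):
        """Book a packet of `size` bytes at time `now`.

        Returns the delay in seconds that the packet must wait before it
        may leave (0.0 if it may be sent immediately).
        """
        # Refill tokens for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        # Consume tokens; a negative balance is a debt repaid by future refill.
        self.tokens -= size
        return max(0.0, -self.tokens / self.rate)


# One bucket per VM (hypothetical VM names and limits).
shapers = {"vm1": TokenBucket(rate=1000, burst=1500),
           "vm2": TokenBucket(rate=4000, burst=1500)}
print(shapers["vm1"].reserve(1500, now=0.0))  # fits within the initial burst
print(shapers["vm1"].reserve(500, now=0.0))   # must wait for tokens to refill
```

Because each VM has its own bucket, a burst from one VM only delays that VM's own packets; the long-run rate of each VM stays bounded by its `rate` regardless of what the others send.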

When real-time scheduling is used to control the amount of CPU resources reserved for each VM, one challenging problem is properly accounting for the CPU time attributable to system-wide activities. For example, in the case of Xen, the services provided by Dom0 and by the driver domains might be shared across the multiple VMs accessing them. Similarly, in the case of the KVM hypervisor, the workload imposed on the host OS by serving network traffic for each individual guest OS might not be easily distinguishable, because it mainly involves kernel-level device drivers and the networking infrastructure of the host OS. Some techniques for mitigating such problems have been proposed for the Xen case.[7]
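
To make the accounting problem concrete, one simple heuristic is to charge the CPU time spent in shared components back to the VMs in proportion to the I/O activity each one generated. The sketch below is only an illustration of that idea under stated assumptions; it is not claimed to be the technique of the cited work, and the function `attribute_shared_cpu` and its inputs are hypothetical.

```python
def attribute_shared_cpu(own_cpu, shared_cpu, io_events):
    """Charge shared CPU time to VMs in proportion to their I/O events.

    own_cpu: {vm: CPU seconds consumed directly by the VM}.
    shared_cpu: CPU seconds consumed by shared services on behalf of all VMs.
    io_events: {vm: number of I/O events (e.g. packets) the VM generated}.
    Returns {vm: CPU seconds charged to the VM}.
    """
    total_events = sum(io_events.values())
    charged = {}
    for vm, cpu in own_cpu.items():
        # With no I/O activity at all, nothing can be attributed to any VM.
        share = io_events.get(vm, 0) / total_events if total_events else 0.0
        charged[vm] = cpu + shared_cpu * share
    return charged


usage = attribute_shared_cpu(
    own_cpu={"vm1": 20.0, "vm2": 30.0},
    shared_cpu=10.0,
    io_events={"vm1": 300, "vm2": 100},
)
print(usage)
```

The proportional rule assumes that every I/O event costs about the same CPU time in the shared component, which is only an approximation; the charged totals could then be compared against each VM's reservation.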

Along the lines of adaptive reservations, feedback-control strategies can be applied to dynamically adapt the amount of resources reserved for each virtual machine, so as to keep a stable performance level for the virtualized application(s).[8] Following the same trend of adaptiveness, when a virtualized system is not achieving the expected performance levels (either due to unforeseen interference from other concurrently running VMs, or due to a bad deployment strategy that simply picked a machine with insufficient hardware resources), it is possible to live-migrate virtual machines while they are running, so as to host them on a more capable (or less loaded) physical host.
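
The feedback-control idea can be sketched as a loop that compares a measured performance metric with its target and adjusts the reservation accordingly. The following is a minimal sketch using a simple proportional rule; it is not the controller of the cited work. The function `adapt_share`, the gain and the bounds are assumptions, as is the toy model in which response time equals the CPU work per request divided by the CPU share.

```python
def adapt_share(share, measured, target, gain=0.5, lo=0.05, hi=1.0):
    """Return the next CPU share given the measured and target response times.

    share: current fraction of a CPU reserved for the VM.
    measured, target: response times in seconds.
    gain, lo, hi: hypothetical tuning parameters and bounds on the share.
    """
    # Relative error: positive when the VM is slower than its target.
    error = (measured - target) / target
    new_share = share * (1.0 + gain * error)
    # Keep the reservation within its allowed range.
    return min(hi, max(lo, new_share))


# Toy model: each request needs 0.03 s of CPU, so the response time is
# 0.03 / share and a target of 0.1 s requires a share of 0.3.
work, target = 0.03, 0.1
share = 0.1
for _ in range(30):
    measured = work / share
    share = adapt_share(share, measured, target)
print(round(share, 4))
```

In this toy model the share moves halfway towards the required value at each step. A real controller would need to cope with measurement noise and with interference that the toy model ignores, and when the required share exceeds what the host can provide (the `hi` bound), migrating the VM to another host becomes the remaining option.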

References

  1. Ludmila Cherkasova, Diwaker Gupta, Amin Vahdat, Comparison of the Three CPU Schedulers in Xen, Performance Evaluation Review, Volume 35, Number 2, 3 September 2007, retrieved 30 June 2010
  2. Fabio Checconi, Tommaso Cucinotta, Dario Faggioli, Giuseppe Lipari, Hierarchical Multiprocessor CPU Reservations for the Linux Kernel, Proceedings of the 5th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2009), Dublin, Ireland, June 2009
  3. Tommaso Cucinotta, Gaetano Anastasi, Luca Abeni, Respecting temporal constraints in virtualised services, Proceedings of the 2nd IEEE International Workshop on Real-Time Service-Oriented Architecture and Applications (RTSOAA 2009), Seattle, Washington, July 2009
  4. Tommaso Cucinotta, Gaetano Anastasi, Luca Abeni, Real-Time Virtual Machines, Proceedings of the 29th Real-Time Systems Symposium (RTSS 2008), Work in Progress Session, Barcelona, December 2008
  5. Shefali Chinni, Radhakrishna Hiremane, Virtual Machine Device Queues, Intel Virtualization Technology White Paper, 2007
  6. Tommaso Cucinotta, Dhaval Giani, Dario Faggioli, Fabio Checconi, Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling, Proceedings of the 5th Workshop on Virtualization and High-Performance Cloud Computing (VHPC 2010), Ischia (Naples), Italy, August 2010
  7. Diwaker Gupta, Lucy Cherkasova, Robert Gardner, Amin Vahdat, Enforcing Performance Isolation Across Virtual Machines in Xen, Proceedings of the 7th International Middleware Conference (Middleware 2006), Lecture Notes in Computer Science, Volume 4290/2006, pp.342-362, Melbourne, Australia, November 2006
  8. Ripal Nathuji, Aman Kansal, Alireza Ghaffarkhah, Q-Clouds: Managing Performance Interference Effects for QoS-Aware Clouds, Proceedings of the 5th European Conference on Computer Systems (EuroSys 2010), Paris, France, April 2010