SimGrid

SimGrid
Initial release 1998[1]
Stable release 3.10[2] / November 17, 2013[2]
Written in Core: C; Bindings: Java, Lua, Ruby.[3]
Platform Unix, Mac OS X, Microsoft Windows
Type Distributed system simulator, Network simulator
License GNU Lesser General Public License[3]
Website simgrid.gforge.inria.fr

SimGrid is a toolkit that provides core functionalities for the simulation of distributed applications in heterogeneous distributed environments. The specific goal of the project is to facilitate research in the area of parallel and distributed large scale systems, such as Grids, P2P systems and Cloud. Its use cases encompass heuristic evaluation, application prototyping or even real application development and tuning.

History

SimGrid v1

In 1999 Henri Casanova joined the AppLeS research group in the Computer Science and Engineering Department at the University of California at San Diego, as a post-doc. The AppLeS group, led by Francine Berman, focused mostly on the study of practical scheduling algorithms for parallel scientific applications on heterogeneous, distributed computing platforms. Shortly after Henri joined the group he faced the need to run simulations instead of, or in addition to, running real-world experiments. At that time Arnaud Legrand, a first-year graduate student at Ecole Normale Superieure de Lyon, France, spent two months in the summer in the AppLeS group as a visiting student. He worked with Henri that summer on a research project, as part of which he implemented an ad-hoc simulator.

After Arnaud left UCSD, Henri realized that most likely every researcher in the AppLeS group would eventually need to run simulations, and that they would most likely all end up rewriting the same code at one point or another. He took apart the simulator that Arnaud had developed, packaged it as a more generic simulation framework with a simple API, and called it SimGrid v1.0 (a.k.a. SG). This version was simple, and in retrospect a bit naive. However, it was surprisingly useful for studying "centralized" scheduling (e.g., off-line scheduling of a DAG on a heterogeneous set of distributed compute nodes). SimGrid v1.0 was described in "SimGrid: A Toolkit for the Simulation of Application Scheduling", by Henri Casanova, in Proceedings of CCGrid 2001. Henri became the first user of SimGrid and used it for several research projects from then on.

SimGrid v2

By 2001 Arnaud was engaged in his Ph.D. thesis work and had started studying "decentralized" scheduling heuristics, that is, ones in which scheduling decisions are made by more or less autonomous agents that typically have only partial knowledge of the applications and/or computing platform. Although simulating decentralized scheduling with SimGrid v1.0 was actually possible (and done by one Ph.D. student at UCSD in fact!), it was extremely cumbersome and limited in scope. So Arnaud built a layer on top of SG, which he called MSG (for Meta-SimGrid). MSG added threads and introduced the concept of independently running simulated processes that performed computations and communication tasks in a possibly asynchronous fashion. MSG was described in "MetaSimGrid: Towards realistic scheduling simulation of distributed applications", by Arnaud Legrand and Julien Lerouge, LIP Research Report. This resulted in the following layered architecture:

   (user code)
   -----------
   | MSG |   |
   -------   |
   |    SG   |
   -----------
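
The idea of independently running simulated processes can be illustrated with a minimal sketch. It is not MSG code: every name below (Simulation, spawn, worker) is hypothetical, and it uses Python generators in place of the threads that MSG relied on. Each process yields the simulated duration of its next task, and a central loop advances a simulated clock to whichever process wakes up first.

```python
import heapq
from typing import Generator, List, Tuple

# Hypothetical names throughout; this is not the MSG API.
# A process is a generator that yields how long (in simulated seconds)
# its next computation or communication task takes.
Process = Generator[float, None, None]


class Simulation:
    def __init__(self) -> None:
        self.now = 0.0                      # simulated clock
        self.log: List[Tuple[float, str]] = []
        self._queue: List[Tuple[float, int, Process]] = []
        self._seq = 0                       # tie-breaker for equal wake times

    def spawn(self, proc: Process) -> None:
        """Register a process so that it starts at the current simulated time."""
        self._push(self.now, proc)

    def _push(self, when: float, proc: Process) -> None:
        heapq.heappush(self._queue, (when, self._seq, proc))
        self._seq += 1

    def run(self) -> None:
        """Resume processes in order of wake-up time until all have finished."""
        while self._queue:
            when, _, proc = heapq.heappop(self._queue)
            self.now = when
            try:
                delay = next(proc)          # run the process until its next task
            except StopIteration:
                continue                    # the process has terminated
            self._push(self.now + delay, proc)


def worker(sim: Simulation, name: str, task_durations: List[float]) -> Process:
    for duration in task_durations:
        yield duration                      # "execute" the task in simulated time
        sim.log.append((sim.now, f"{name} finished a task"))


sim = Simulation()
sim.spawn(worker(sim, "A", [2.0, 2.0]))
sim.spawn(worker(sim, "B", [3.0]))
sim.run()
print(sim.log)
```

The two workers interleave on the simulated clock even though only one of them executes at any instant, which is the property that makes agent-style, decentralized heuristics convenient to express.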

With Henri and some of his students using SG and Arnaud using MSG, the project started having a (tiny) user base. It was time to be more ambitious and to address one of the key limitations of SG: its inability to simulate multi-hop network communications realistically. In the summer of 2003 Loris Marchal, a first-year graduate student at Ecole Normale Superieure, came to UCSD to work with Henri. During that summer, based on results in the TCP modeling literature, he implemented a macroscopic network model as part of SG. This model dramatically increased the level of realism of SimGrid simulations and was initially described in "A Network Model for Simulation of Grid Applications", by Loris Marchal and Henri Casanova, LIP research report. By the end of 2003 the work at UCSD and at Ecole Normale was merged into what became SimGrid v2, as described in "Scheduling Distributed Applications: the SimGrid Simulation Framework", by Henri Casanova, Arnaud Legrand, and Loris Marchal, in Proceedings of CCGrid 2003.

SimGrid v3

SimGrid v2, with its much improved features and capabilities, garnered a larger user base, and many friends and collaborators of Arnaud and Henri started using it for their research. One of these friends was Martin Quinson, then a Ph.D. student at Ecole Normale Superieure, who was working in the area of distributed resource monitoring systems. As part of his Ph.D. Martin attempted to develop a network topology discovery tool and quickly found out that it was difficult and required prototyping in simulation. Faced with the prospect of first implementing a throw-away prototype in simulation and then reimplementing the whole thing for production, Martin started working on a framework that would easily compile the same code in "simulation mode" or in "real-world mode". He found this ability to be invaluable when developing distributed systems and built his framework, called GRAS, on top of MSG (for the simulation mode) and on top of the socket layer (for the real-world mode). GRAS is described in "GRAS: A Research & Development Framework for Grid and P2P Infrastructures", by Martin Quinson, in Proceedings of PDCS 2006. This led to the following layered software architecture:

	(user code for either SG, MSG or GRAS)
   	-----------------------------
   	|   |     |    GRAS API     |
   	|   |     -------------------
   	|   |     |GRAS S | |GRAS R |
   	|   |     --------- ---------
   	|   |    MSG      | |sockets|
   	|   --------------| ---------
   	|        SG       |
   	-------------------
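
The "same code, two modes" idea can be sketched as follows. This is an illustration under stated assumptions, not the GRAS API: the names Channel, SimChannel, SocketChannel, sim_pair, socket_pair and ping_pong are all hypothetical, and the backend is chosen at run time here, whereas GRAS compiled the same code for one mode or the other. The application logic in ping_pong is written once against a small interface and runs unchanged over an in-memory backend or over real sockets.

```python
import socket
from collections import deque
from typing import Deque, Protocol, Tuple


class Channel(Protocol):
    """Hypothetical interface the application code is written against."""
    def send(self, data: bytes) -> None: ...
    def recv(self) -> bytes: ...


class SimChannel:
    """Simulation-mode backend: messages travel through in-memory queues."""
    def __init__(self, outbox: Deque[bytes], inbox: Deque[bytes]) -> None:
        self._outbox, self._inbox = outbox, inbox

    def send(self, data: bytes) -> None:
        self._outbox.append(data)

    def recv(self) -> bytes:
        return self._inbox.popleft()


class SocketChannel:
    """Real-world backend: length-prefixed messages over a connected socket."""
    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def send(self, data: bytes) -> None:
        self._sock.sendall(len(data).to_bytes(4, "big") + data)

    def recv(self) -> bytes:
        size = int.from_bytes(self._read(4), "big")
        return self._read(size)

    def _read(self, count: int) -> bytes:
        buf = b""
        while len(buf) < count:
            chunk = self._sock.recv(count - len(buf))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            buf += chunk
        return buf

    def close(self) -> None:
        self._sock.close()


def sim_pair() -> Tuple[SimChannel, SimChannel]:
    a_to_b: Deque[bytes] = deque()
    b_to_a: Deque[bytes] = deque()
    return SimChannel(a_to_b, b_to_a), SimChannel(b_to_a, a_to_b)


def socket_pair() -> Tuple[SocketChannel, SocketChannel]:
    left, right = socket.socketpair()
    return SocketChannel(left), SocketChannel(right)


def ping_pong(client: Channel, server: Channel) -> bytes:
    """Application logic, written once and unaware of the backend in use."""
    client.send(b"ping")
    request = server.recv()
    server.send(request.upper())
    return client.recv()


sim_result = ping_pong(*sim_pair())          # "simulation mode"
left, right = socket_pair()
real_result = ping_pong(left, right)         # "real-world mode"
left.close()
right.close()
print(sim_result, real_result)
```

Keeping the application function free of any backend-specific call is what avoids the throw-away prototype: only the two small channel classes differ between modes.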

At this point, with more users running more complex simulations, it became clear that the initial SG foundation inherited from SimGrid v1 was too limiting in terms of scalability and performance. In 2005 Arnaud took the bull by the horns and replaced SG with a new simulation engine called SURF, thus removing the SG API. Users reported acceleration factors of up to 3 orders of magnitude when going from SG to SURF. Furthermore, SURF is much more extensible than SG ever was and has enabled the evolution of simulation models used by SimGrid. Although it made sense at the time to re-implement GRAS on top of SURF, it was never accomplished due to the "too many things to do not enough time" syndrome. Martin added a layer on top of GRAS called AMOK, to implement high-level services needed by many distributed applications, thus leading to the new overall layered architecture:

  (user code for either MSG or GRAS—using AMOK or not)
                         -------
                         | AMOK|
       -------------------------
       |     |    GRAS API     |
       |     -------------------
       |     |GRAS S | |GRAS R |
       |     --------- ---------
       |    MSG      | |sockets|
       --------------| ---------
       |   SURF      |
       ---------------

This architecture culminated in SimGrid v3. One development worth mentioning is that of SimDAG, written by Christophe Thiery during an internship with Martin Quinson. Many users had asked for functionality similar to what the SG API provided in SimGrid v1 and v2, to study centralized scheduling without all the power of the MSG API. SimDAG provides an API especially for this purpose and was integrated in SimGrid v3.1, leading to the following layered architecture:

(user code for either SimDag, MSG or GRAS)
                            -------
                            | AMOK|
   --------------------------------
   |      |     |    GRAS API     |
   |      |     -------------------
   |      |     |GRAS SG| |GRAS RL|
   |      |     --------- ---------
   |SimDag|    MSG      | |sockets|
   |--------------------| ---------
   |        SURF        |
   ----------------------

SimGrid 3.2, the current publicly available version as this document is being written, implements the above architecture and also provides a (partial) port to the Windows operating system.

Ongoing Work

As the project advances, it becomes increasingly clear that there is a need for an intermediate layer between the base simulation engine, SURF, and higher-level APIs. In the previously shown software architecture MSG plays the role of an intermediate layer between SURF and GRAS, but is itself a high-level API, which is not a very good design. Bruno Donassolo, during an internship with Arnaud, has developed an intermediate layer called SIMiX, and both GRAS and MSG are being rewritten on top of it.

Another development is that of SMPI, a framework to run unmodified MPI applications in either simulation mode or in real-world mode (sort of GRAS for MPI). The development of SMPI, by Mark Stillwell who works with Henri, is being greatly simplified thanks to the aforementioned SIMiX layer. Finally, somewhat unrelated, is the development of Java bindings for the MSG API by Malek Cherier who works with Martin. The current software architecture thus looks as follows:

(user code for either SimDAG, MSG, GRAS, or MPI)
   ----------------------------------
   |      |   |jMSG|    |AMOK|      |
   |      |   -----|    ------      |
   |SimDag| MSG    | GRAS    | SMPI |     (Note that GRAS and SMPI also run on top of
   |      ---------------------------	    sockets and MPI, not shown on the figure)
   |      |           SIMiX         |
   ----------------------------------
   |              SURF              |
   ----------------------------------

While the above developments are about adding simulation functionality, a large part of the research effort in the SimGrid project relates to simulation models. These models are implemented in SURF, and Arnaud has refactored SURF to make it more easily extensible so that one can experiment with different models, in particular different network models. Pedro Velho, who works with Arnaud, is currently experimenting with several new network models. Also, Kayo Fujiwara, who works with Henri, has interfaced SURF with (a patched version of) the GTNetS packet-level simulator.

The current architecture in the CVS tree at the time this document is being written is as follows:

   ----------------------------------
   |      |   |jMSG|    |AMOK|      |
   |      |   ------    ------      |
   |SimDag| MSG    | GRAS    | SMPI |  (Note that GRAS and SMPI also run on top of
   |      |        |     -------    |   sockets and MPI, not shown on the figure)
   |      |        |     |SMURF|    |  
   |      ---------------------------  
   |      |          SIMiX          |
   ----------------------------------
   |         SURF interface         |
   ----------------------------------
   |    SURF kernel   |    | GTNetS |
   | (several models) |    |        |
   --------------------    ----------

Future Directions

The primary short-term future direction is to develop a distributed version of SIMiX to increase the scalability of simulations in terms of memory. This can be done using the GRAS "real world" functionality to run SIMiX in a distributed fashion across multiple hosts, thus allowing simulations that are not limited by the amount of memory on a single host. The simulation itself would still be centralized and sequential, meaning that a single simulated process would run at a time. Bruno Donassolo is working on this idea, currently called SMURF.

Longer-term plans include:

One of the constant challenges in this project is its duality: it is a useful tool for scientists (hence our efforts on APIs, portability, documentation, etc.), but it is also a scientific project in its own right (so that we can publish papers).

References

  1. Casanova, Henri (May 2001). "A Toolkit for the Simulation of Application Scheduling". First IEEE International Symposium on Cluster Computing and the Grid (CCGrid'01). Brisbane, Australia. pp. 430–441. doi:10.1109/CCGRID.2001.923223.
  2. "SimGrid download page". Retrieved December 11, 2013.
  3. "Official SimGrid Page". Retrieved November 27, 2012.

External links

SimGrid, Casanova, H., Legrand, A. and Quinson, M., SimGrid: a Generic Framework for Large-Scale Distributed Experiments, 10th IEEE International Conference on Computer Modeling and Simulation, 2008.

Refereed Papers about SimGrid