Adaptive Parallel And Distributed Simulation (A-PADS)

Project goal

Designing, implementing and evaluating a Parallel And Distributed Simulation (PADS) middleware capable of working seamlessly and efficiently across multiple execution architectures (e.g. multi-core smartphones, multi-core desktop CPUs, LAN/WAN clusters, HPC systems such as the IBM Blue Gene and public Cloud infrastructures such as Amazon EC2).

Description

Over the years, many simulation paradigms have been proposed, each with its own benefits and drawbacks. Among them, Discrete Event Simulation (DES) is powerful in terms of expressiveness and easy to understand for the developer of simulation models. DES, alone or combined with other paradigms (such as agent-based simulation), has become popular in many fields (e.g. engineering, economics), not only for research but also for design, production support and what-if analysis.

The common approach to implementing simulators is based on a single execution unit (e.g. a CPU and some random access memory), which results in a sequential (i.e. monolithic) simulator. The main advantage of this approach is its simplicity, but it also introduces some severe limitations: the memory resources of a single execution unit can be insufficient for modeling complex systems, and the time needed to complete the simulation runs can be excessive.
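As an illustration of the sequential approach, the following is a minimal sketch of a discrete event simulator running on a single execution unit. It is not taken from ARTÌS/GAIA: the names SequentialSimulator, schedule, run and make_ticker are hypothetical, and the model (a process that reschedules itself at a fixed period) is only a toy example.

```python
import heapq
from typing import Callable, List, Tuple


class SequentialSimulator:
    """Toy sequential DES kernel (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.now: float = 0.0  # current simulated time
        # Pending events as (timestamp, sequence number, handler).
        self._queue: List[Tuple[float, int, Callable]] = []
        self._seq: int = 0  # tie-breaker for events with equal timestamps

    def schedule(self, delay: float, handler: Callable) -> None:
        """Schedule handler to run `delay` time units after the current time."""
        if delay < 0:
            raise ValueError("events cannot be scheduled in the past")
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler))
        self._seq += 1

    def run(self, until: float) -> int:
        """Process events in timestamp order up to `until`; return the count."""
        processed = 0
        while self._queue and self._queue[0][0] <= until:
            timestamp, _, handler = heapq.heappop(self._queue)
            self.now = timestamp  # advance simulated time to the event
            handler(self)         # the handler may schedule new events
            processed += 1
        return processed


def make_ticker(log: List[float], period: float) -> Callable:
    """Build a handler that records the time and reschedules itself."""
    def tick(sim: SequentialSimulator) -> None:
        log.append(sim.now)
        sim.schedule(period, tick)
    return tick


def demo() -> List[float]:
    sim = SequentialSimulator()
    log: List[float] = []
    sim.schedule(0.0, make_ticker(log, 1.0))
    sim.run(until=3.0)
    return log
```

In this sketch the whole model state and the single event queue live in one process, which is why both memory and execution time are bounded by one execution unit.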

A more modern approach, called Parallel Discrete Event Simulation (PDES), relies on multiple interconnected execution units (e.g. CPUs or hosts). By building a so-called Parallel And Distributed Simulation (PADS) in this way, it is possible to represent very large and complex models using the aggregated resources of many execution units and, in some cases, to obtain a speedup with respect to sequential simulation.

Unfortunately, in the current state of the art, many problems still limit the efficiency and the adoption of PADS. A couple of them are:

Past and ongoing activity

The A-PADS research project will further extend the work done on the ARTÌS/GAIA [1, 2] middleware. In past years we have obtained very good performance results on multi-CPU, multi-core hosts and on LAN/WAN-based clusters. We are now porting and adapting ARTÌS/GAIA+ to the IBM Blue Gene/Q system, and we plan to extend our work to support some public Cloud infrastructures (e.g. Amazon EC2). In more detail, supporting Cloud infrastructures will require the design, implementation and evaluation of a brand new set of features specific to the characteristics and idiosyncrasies of the Cloud (e.g. jitter, fault tolerance, security).

In the HPCS 2011 tutorial [3] we described why a new approach is necessary for building simulators that can fulfill the requirements described above. In [4], the authors demonstrated that the Amazon EC2 infrastructure can be used for running distributed simulations with acceptable results in terms of performance and cost. In our view, the approach introduced in [4] is a good starting point for the development of new, specifically tailored mechanisms that will be able to speed up the execution of PADS on public Clouds.

References

[1] ARTÌS: Advanced RTI System

[2] GAIA: Generic Adaptive Interaction Architecture

[3] Gabriele D'Angelo. Parallel and Distributed Simulation from Many Cores to the Public Cloud. Proceedings of the 2011 International Conference on High Performance Computing and Simulation (HPCS 2011).

[4] Kurt Vanmechelen, Silas De Munck, Jan Broeckhove. Conservative Distributed Discrete Event Simulation on Amazon EC2. CCGRID 2012.