PArallel Graph Algorithms (PAGA)

University of Bologna. Department of Computer Science and Engineering (DISI)

Welcome to the PAGA homepage.

The goal of this project is to design, implement and evaluate parallel algorithms for large-scale graph analysis. In particular, we are interested in scalable approaches for computing centrality metrics on distributed memory architectures. Centrality metrics such as betweenness centrality, clustering coefficient, closeness centrality and others, are widely used in social network analysis.

The PAGA project is supported by an Italian Supercomputing Research Allocation (ISCRA) project HP10CR15I1 from 2012/10/02 to 2013/07/02. PAGA has been granted the use of FERMI, an IBM BlueGene/Q supercomputer installed at CINECA. In June 2012, FERMI ranked at the 7th position in the Top500 list.

Background

A social network can be analyzed by computing appropriate metrics on the underlying graph. Unfortunately, the size of real-world social network graphs makes the application of conventional, sequential graph algorithms impractical due to both space and time constraints. First, it is not possible to store the whole graph in main memory on a single workstation; then, complex metrics require a prohibitive amount of time to be computed on large graphs on a single workstation.

Computing centrality metrics in large social graphs is challenging and is subject of active research. However, most of the existing solutions rely on shared memory architectures, since graph algorithms proved to be particularly difficult to parallelize on distributed memory machines. Our focus on distributed memory systems is motivated by the fact that distributed memory supercomputers are currently more powerful than shared memory machines; furthermore, using MPI we will hopefully achieve portability portability across a wide range of hardware platforms, from supercomputers to clusters of workstations. Of course, performance will vary accordingly.

State of the art

We carried out a fairly complete review of the state of the art in a forthcoming paper: M. Lambertini, M. Magnani, M. Marzolla, D. Montesi, C. Paolino, Large-Scale Social Network Analysis, to appear in Large Scale Data Analytics, Aris Gkoulalas-Divanis and Abderrahim Labbi (Editors), Springer 2012

Expected Outcomes

Our primary goal is to understand if and how centrality metrics can be effectively computed on distributed memory architectures. A positive answer will be a significant contribution to the research community. Our secondary goal is to lay down the foundations for the development of a software package for social network analysis on distributed memory architectures based on MPI. This package should run both on general-purpose, inexpensive clusters of PCs, and also on high performance parallel machines. The use of MPI will increase the (potential) user base for our package, and consequently the impact of this project within the scientific community.