HAC SPECIS: High-performance Application and Computers, Studying PErformance and Correctness In Simulation

The goal of the HAC SPECIS (High-performance Application and Computers: Studying PErformance and Correctness In Simulation) project is to answer the methodological needs of HPC application and runtime developers and to enable the study of real HPC systems from both the correctness and performance points of view. To this end, we gather experts from the HPC, formal verification, and performance evaluation communities.

Context

Over the last decades, computer hardware and software have both become increasingly complex. Multi-core architectures comprising several accelerators (GPUs or the Intel Xeon Phi) and interconnected by high-speed networks have become mainstream in the field of High Performance Computing (HPC). Obtaining the maximum performance from such heterogeneous machines requires breaking with the traditional uniform programming paradigm. To scale, application developers have to make their code as adaptive as possible and relax synchronizations as much as possible. They also have to resort to sophisticated and dynamic data management, load balancing, and scheduling strategies. This evolution has several consequences:

  • First, the increasing complexity and the relaxation of synchronizations make development even more error-prone. The resulting bugs may almost never occur at small scale, yet occur systematically at large scale and in a non-deterministic way, which makes them particularly difficult to identify and eliminate.
  • Second, the dozens of software stacks and their interactions have become so complex that predicting the performance of the system as a whole (in terms of time, resource usage, and energy) is extremely difficult. Understanding and configuring such systems has therefore become a key challenge.

We believe these two challenges related to correctness and performance can be answered by gathering the skills from experts in formal verification, performance evaluation, and high performance computing. The goal of the HAC SPECIS Inria Project Laboratory (IPL) is to address the methodological requirements raised by the recent evolution of HPC architectures, by allowing application and runtime developers to study such systems both from the correctness and performance points of view.

All the resulting research developments will be integrated in the open source SimGrid framework so that they can benefit the community and beyond as quickly as possible.

Members and Inria Teams

Arnaud Legrand (POLARIS) is the leader of the HAC SPECIS project.

Rhône Alpes
Bretagne Atlantique
Sud Ouest
Île de France
Grand Est

PhD students, postdocs, and engineers who worked within the scope of HAC SPECIS

2015

  • Christian Heinrich (Grenoble, PhD CORDI)
  • Luka Stanisic (Bordeaux, PostDoc CORDI, now Max Planck Computing and Data Facility, Munich)

2016

  • The Anh Pham (Rennes, PhD IPL)
  • Ian Masliah (Bordeaux, PostDoc, now LIP6)

2017

  • Toufik Boubehziz (Rennes, Ing. ADT)
  • Tom Cornebize (Grenoble, PhD MENRT)
  • Dorra Boughzala (Lyon, PhD IPL)
  • Millian Poquet (Rennes, PostDoc CORDI)

2018

  • Augustin Degomme (Grenoble, CEA)
  • US PhD student (Gene Cooperman)

Related Links

Meetings

Plenary Meetings

  • Kickoff: Jun. 23-24, 2016 @ Rennes
  • Plenary: Apr. 10-13, 2017 @ Bordeaux
  • Plenary: Sep. 16-19, 2017 @ Lyon
  • Plenary: May 28-31, 2018 @ Paris (+Mid-term review)

Tripartite Meetings (3-4 days)

  • Grenoble, Rennes @ Lyon (July 2017)
  • Lyon, Grenoble @ Rennes (July 2018)

SimGrid User Days

  • Lyon (June 2016)
  • Rennes (November 2017)

Point-to-point visits (> 4 days)

  • L. Stanisic: Bordeaux → Grenoble (2016)
  • C. Heinrich: Grenoble → Rennes (3 weeks, 2017)
  • A. Guermouche: Paris → Rennes (04/2017)
  • I. Masliah: Bordeaux → Grenoble (05/2017)
  • M. Quinson: Rennes → Nancy (05/2017)
  • E. Saillard: Bordeaux → Rennes (04/2018)
  • A.-C. Orgerie: Rennes → Lyon and Grenoble (2017-2018)
  • Samuel: Bordeaux → Lyon (delegation 2018)

Software Releases

SimGrid and StarPU are the main software packages developed within the IPL. Their compatibility is checked nightly on the Inria Continuous Integration platform.

SimGrid:

  • SimGrid (3.14) 12/2016
  • SimGrid (3.15) 03/2017
  • SimGrid (3.16) 06/2017
  • SimGrid (3.17) 10/2017
  • SimGrid (3.18) 12/2017
  • SimGrid (3.19) 03/2018

StarPU:

  • StarPU (1.2.0) 08/2016
  • StarPU (1.2.1) 03/2017
  • StarPU (1.2.2) 05/2017
  • StarPU (1.2.3) 11/2017
  • StarPU (1.2.4) 04/2018

Publications

[1] Augustin Degomme, Arnaud Legrand, Georges Markomanolis, Martin Quinson, Mark Lee Stillwell, and Frédéric Suter. Simulating MPI applications: the SMPI approach. IEEE Transactions on Parallel and Distributed Systems, page 14, February 2017.
Keywords: Performance prediction and extrapolation ; Simulation ; MPI runtime and applications
[2] Franz C. Heinrich, Tom Cornebize, Augustin Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, Sascha Hunold, Anne-Cécile Orgerie, and Martin Quinson. Predicting the Energy Consumption of MPI Applications at Scale Using a Single Node. In Cluster 2017, Hawaii, United States, September 2017. IEEE.
Keywords: simulation ; HPC ; energy ; platform modeling
[3] Rafael Keller Tesser, Lucas Mello Schnorr, Arnaud Legrand, Fabrice Dupros, and Philippe O A Navaux. Using Simulation to Evaluate and Tune the Performance of Dynamic Load Balancing of an Over-decomposed Geophysics Application. In Euro-Par 2017: 23rd International European Conference on Parallel and Distributed Computing, page 15, Santiago de Compostela, Spain, August 2017. An extended version is under review in the Concurrency and Computation: Practice and Experience journal.
Keywords: Load balancing and over-decomposition ; Performance prediction and extrapolation ; Simulation ; Geophysics FDM application
[4] Tom Cornebize, Franz C Heinrich, Arnaud Legrand, and Jérôme Vienne. Emulating High Performance Linpack on a Commodity Server at the Scale of a Supercomputer. Submitted at Grid 2018, December 2017.
[5] Vinicius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault, and Vincent Danjean. A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters. Under review (minor revision) in the Concurrency and Computation: Practice and Experience journal, October 2017.
Keywords: Heterogeneous platforms ; Cholesky ; High-Performance Computing ; Trace Visualization ; Task-based applications
[6] Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault, and Vincent Danjean. Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach. In 3rd Workshop on Visual Performance Analysis (VPA), Salt Lake City, United States, November 2016. Held in conjunction with SC16.
[7] Luka Stanisic, Samuel Thibault, Arnaud Legrand, Brice Videau, and Jean-François Méhaut. Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures. Concurrency and Computation: Practice and Experience, page 16, May 2015.
Keywords: Starpu-simgrid ; HPC ; simgrid ; simulations ; runtimes
[8] Luka Stanisic, Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Arnaud Legrand, Florent Lopez, and Brice Videau. Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers. In The 21st IEEE International Conference on Parallel and Distributed Systems, Melbourne, Australia, December 2015.
Keywords: Sparse Linear Algebra ; Mumps ; Starpu-simgrid ; HPC ; Simgrid ; Runtime
[9] Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Martin Khannouz, and Luka Stanisic. Task-based fast multipole method for clusters of multicore processors. Research Report RR-8970, Inria Bordeaux Sud-Ouest, March 2017.
Keywords: multicore processor ; high performance computing (HPC) ; fast multipole method ; hybrid parallelization ; runtime system ; task-based programming ; cluster ; FMM ; méthode multipôles rapide ; Calcul haute performance ; architecture multicœur ; moteur d'exécution ; parallélisation hybride ; programmation à base de tâches ; MPI ; OpenMP
[10] The Anh Pham, Thierry Jéron, and Martin Quinson. Verifying MPI Applications with SimGridMC. In Correctness 2017 - First International Workshop on Software Correctness for HPC Applications, Denver, United States, November 2017.
Keywords: Software and its engineering -> Model checking ; Software verification ; Dynamic analysis ; Formal software verification ; Ultra-large-scale systems ; Theory of computation ; Parallel algorithms
[11] Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Luka Stanisic, and Samuel Thibault. Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method. Research Report RR-9036, INRIA Bordeaux, February 2017.
Keywords: Mathematical Software ; Modeling and simulation ; Parallel computing methodologies ; fast multipole method ; runtime system ; task-based programming

Created: 2018-05-31 Thu 18:43
