HAC SPECIS: High-performance Application and Computers, Studying PErformance and Correctness In Simulation

The goal of the HAC SPECIS (High-performance Application and Computers: Studying PErformance and Correctness In Simulation) project is to answer the methodological needs of HPC application and runtime developers and to allow them to study real HPC systems from both the correctness and performance points of view. To this end, we gather experts from the HPC, formal verification, and performance evaluation communities.

Context

Over the last decades, both computer hardware and software have become increasingly complex. Multi-core architectures comprising several accelerators (GPUs or the Intel Xeon Phi) and interconnected by high-speed networks have become mainstream in the field of High Performance Computing (HPC). Obtaining the maximum performance from such heterogeneous machines requires breaking the traditional uniform programming paradigm. To scale, application developers have to make their code as adaptive as possible and relax synchronizations as much as possible. They also have to resort to sophisticated and dynamic data management, load balancing, and scheduling strategies. This evolution has several consequences:

  • First, the increasing complexity and the relaxation of synchronizations make such code even more error-prone. The resulting bugs may almost never occur at small scale, yet occur systematically at large scale and in a non-deterministic way, which makes them particularly difficult to identify and eliminate.
  • Second, the dozens of software stacks and their interactions have become so complex that predicting the performance (in terms of time, resource usage, and energy) of the system as a whole is extremely difficult. Understanding and configuring such systems has therefore become a key challenge.

We believe these two challenges related to correctness and performance can be addressed by gathering the skills of experts in formal verification, performance evaluation, and high performance computing. The goal of the HAC SPECIS Inria Project Laboratory (IPL) is to address the methodological requirements raised by the recent evolution of HPC architectures, by allowing application and runtime developers to study such systems from both the correctness and performance points of view.

All the resulting research developments will be integrated in the open source SimGrid framework so that they can benefit the community and beyond as quickly as possible.

Members and Inria Teams

Arnaud Legrand (POLARIS) is the leader of the HAC SPECIS project.

Rhône Alpes
Bretagne Atlantique
Sud Ouest
Île de France
Grand Est

PhD Students, PostDocs, and Engineers who worked within the scope of HAC SPECIS

Underlined names have been directly funded by Inria through the IPL.

2015
 
  • Christian Heinrich (Grenoble, PhD CORDI)
  • Luka Stanisic (Bordeaux, PostDoc CORDI, now Max Planck Computing and Data Facility, Munich)
2016
 
  • The Anh Pham (Rennes, PhD IPL)
  • Ian Masliah (Bordeaux, PostDoc, now LIP6)
2017
 
  • Toufik Boubehziz (Rennes, Ing. ADT)
  • Tom Cornebize (Grenoble, PhD MENRT)
  • Dorra Boughzala (Lyon, PhD IPL)
  • Millian Poquet (Rennes, PostDoc CORDI)
2018
 
  • Augustin Degomme (Grenoble, CEA)
  • PhD student from the US (Gene Cooperman)
  • Idriss Daoudi (Bordeaux, PhD)
  • Yann Duplouy (Nancy, PostDoc)
2019
  • Kameswar Rao Vaddina (Rennes, PostDoc)
  • Lucas Nesi (Grenoble/Porto Alegre, PhD)

Meetings

Plenary Meetings

  • Kickoff: Jun. 23-24, 2016 @ Rennes
  • Plenary: Apr. 10-13, 2017 @ Bordeaux
  • Plenary: Sep. 16-19, 2017 @ Lyon
  • Plenary: May 28-31, 2018 @ Paris (+Mid-term review)
  • Plenary: July 8-12, 2019 @ St Martin d'Uriage
  • Plenary: Oct. 12, 2020 Virtual
  • Plenary: Oct. 23, 2020 Virtual
  • Plenary: Nov. 09, 2020 Virtual (Review)

SimGrid User Days

  • Lyon (June 2016)
  • Rennes (November 2017)

Point-to-point visits (> 4 days)

  • L. Stanisic: Bordeaux → Grenoble (2016)
  • C. Heinrich: Grenoble → Rennes (3 weeks 2017)
  • A. Guermouche: Paris → Rennes (04/2017)
  • I. Masliah: Bordeaux → Grenoble (05/2017)
  • M. Quinson: Rennes → Nancy (05/2017)
  • E. Saillard: Bordeaux → Rennes (04/2018)
  • A.-C. Orgerie: Rennes → Lyon and Grenoble (2017-2018)
  • S. Thibault: Bordeaux → Lyon (delegation 2018)
  • A. Faure: Grenoble → Rennes (M. Quinson and M. Poquet, on RSG and simwrap) (2019)
  • Y. Duplouy: Nancy → Rennes (2019)
  • P.A. Rouby: Lyon → Nancy (2020)

Software Releases

SimGrid and StarPU are the main software packages developed within the IPL. Their compatibility is checked nightly on the Inria Continuous Integration platform.

SimGrid:

  • SimGrid (3.14) Dec. 2016
  • SimGrid (3.15) Mar. 2017
  • SimGrid (3.16) June 2017
  • SimGrid (3.17) Oct. 2017
  • SimGrid (3.18) Dec. 2017
  • SimGrid (3.19) Mar. 2018
  • SimGrid (3.20) June 2018
  • SimGrid (3.21) Oct. 2018
  • SimGrid (3.22) Apr. 2019
  • SimGrid (3.23) June 2019
  • SimGrid (3.24) Oct. 2019
  • SimGrid (3.25) Feb. 2020

StarPU:

  • StarPU (1.2.0) Aug. 2016
  • StarPU (1.2.1) Mar. 2017
  • StarPU (1.2.2) May 2017
  • StarPU (1.2.3) Nov. 2017
  • StarPU (1.2.4) Apr. 2018
  • StarPU (1.2.5) Aug. 2018
  • StarPU (1.2.6) Sep. 2018
  • StarPU (1.3.0) Mar. 2019
  • StarPU (1.3.1) Apr. 2019
  • StarPU (1.3.2) June 2019
  • StarPU (1.3.3) Oct. 2019
  • StarPU (1.2.8) Feb. 2019
  • StarPU (1.2.9) Jan. 2020
  • StarPU (1.3.4) June 2020
  • StarPU (1.2.10) June 2020
  • StarPU (1.3.5) Aug. 2020
  • StarPU (1.3.7) Oct. 2020

Publications

[1] Vinicius Garcia Pinto, Luka Stanisic, Arnaud Legrand, Lucas Mello Schnorr, Samuel Thibault, and Vincent Danjean. Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach. In 3rd Workshop on Visual Performance Analysis (VPA), Salt Lake City, United States, November 2016. Held in conjunction with SC16.
[2] Augustin Degomme, Arnaud Legrand, Georges Markomanolis, Martin Quinson, Mark Lee Stillwell, and Frédéric Suter. Simulating MPI applications: the SMPI approach. IEEE Transactions on Parallel and Distributed Systems, page 14, February 2017.
Keywords: Performance prediction and extrapolation ; Simulation ; MPI runtime and applications
[3] Franz C. Heinrich, Tom Cornebize, Augustin Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, Sascha Hunold, Anne-Cécile Orgerie, and Martin Quinson. Predicting the Energy Consumption of MPI Applications at Scale Using a Single Node. In Cluster 2017, Hawaii, United States, September 2017. IEEE.
Keywords: simulation ; HPC ; energy ; platform modeling
[4] Rafael Keller Tesser, Lucas Mello Schnorr, Arnaud Legrand, Fabrice Dupros, and Philippe O A Navaux. Using Simulation to Evaluate and Tune the Performance of Dynamic Load Balancing of an Over-decomposed Geophysics Application. In Euro-Par 2017: 23rd International European Conference on Parallel and Distributed Computing, page 15, Santiago de Compostela, Spain, August 2017.
Keywords: Load balancing and over-decomposition ; Performance prediction and extrapolation ; Simulation ; Geophysics FDM application
[5] Rafael Keller Tesser, Lucas Mello Schnorr, Arnaud Legrand, Christian Heinrich, Fabrice Dupros, and Philippe Olivier Alexandre Navaux. Performance Modeling of a Geophysics Application to Accelerate the Tuning of Over-decomposition Parameters through Simulation. Concurrency and Computation: Practice and Experience, pages 1--21, 2018.
Keywords: Computer System Simulation ; Geophysics FDM application ; Performance prediction ; High-Performance Computing ; Load balancing and over-decomposition
[6] Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Martin Khannouz, and Luka Stanisic. Task-based fast multipole method for clusters of multicore processors. Research Report RR-8970, Inria Bordeaux Sud-Ouest, March 2017.
Keywords: multicore processor ; high performance computing (HPC) ; fast multipole method ; hybrid parallelization ; runtime system ; task-based programming ; cluster ; FMM ; MPI ; OpenMP
[7] Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Luka Stanisic, and Samuel Thibault. Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method. Research Report RR-9036, INRIA Bordeaux, February 2017.
Keywords: Mathematical Software ; Modeling and simulation ; Parallel computing methodologies ; fast multipole method ; runtime system ; task-based programming
[8] The Anh Pham, Thierry Jéron, and Martin Quinson. Verifying MPI Applications with SimGridMC. In Correctness 2017 - First International Workshop on Software Correctness for HPC Applications, Denver, United States, November 2017.
Keywords: Model checking ; Software verification ; Dynamic analysis ; Formal software verification ; Ultra-large-scale systems ; Parallel algorithms
[9] Vinicius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault, and Vincent Danjean. A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters. Concurrency and Computation: Practice and Experience, 30(18):1--31, April 2018.
Keywords: Heterogeneous platforms ; Cholesky ; High-Performance Computing ; Trace Visualization ; Task-based applications
[10] Samuel Thibault. On Runtime Systems for Task-based Programming on Heterogeneous Platforms. Habilitation à diriger des recherches, Université de Bordeaux, December 2018.
Keywords: Runtime Systems ; Task graphs ; Task graph scheduling ; Distributed Computing
[11] Emmanuel Agullo, Luc Giraud, Stéphane Lanteri, Gilles Marait, Anne-Cécile Orgerie, and Louis Poirel. Energy analysis of a solver stack for frequency-domain electromagnetics. Research Report RR-9240, Inria Bordeaux Sud-Ouest, December 2018.
Keywords: Energy consumption ; HPC
[12] Henri Casanova, Suraj Pandey, James Oeth, Ryan Tanaka, Frédéric Suter, and Rafael Ferreira Da Silva. WRENCH: A Framework for Simulating Workflow Management Systems. In WORKS 2018 - 13th Workshop on Workflows in Support of Large-Scale Science, pages 1--12, Dallas, United States, November 2018.
Keywords: Scientific Workflows ; Workflow Management Systems ; Simulation ; Distributed Computing
[13] Henri Casanova, Arnaud Legrand, Martin Quinson, and Frédéric Suter. SMPI Courseware: Teaching Distributed-Memory Computing with MPI in Simulation. In EduHPC-18 - Workshop on Education for High-Performance Computing, pages 1--10, Dallas, United States, November 2018.
Keywords: High Performance Computing Education ; Parallel Computing Education ; Message Passing Interface ; Simulation
[14] Issam Raïs, Anne-Cécile Orgerie, Martin Quinson, and Laurent Lefèvre. Quantifying the Impact of Shutdown Techniques for Energy-Efficient Data Centers. Concurrency and Computation: Practice and Experience, 30(17):1--13, 2018.
Keywords: shutdown techniques ; sleep modes ; Energy efficiency ; energy-aware hardware ; data centers
[15] Anchen Chai, Sorina Camarasu-Pop, Tristan Glatard, Hugues Benoit-Cattin, and Frédéric Suter. Evaluation through Realistic Simulations of File Replication Strategies for Large Heterogeneous Distributed Systems. In Euro-Par 2018 - 24th International European Conference on Parallel and Distributed Computing ; Workshop HeteroPar 2018, Lecture Notes in Computer Science (LNCS), in press, Turin, Italy, August 2018.
Keywords: file replication ; platform model ; realistic simulation ; evaluation
[16] Marion Guthmuller, Gabriel Corona, and Martin Quinson. System-level state equality detection for the formal dynamic verification of legacy distributed applications. Journal of Logical and Algebraic Methods in Programming, 96:1--11, April 2018.
[17] Pierre Huchant, Emmanuelle Saillard, Denis Barthou, Hugo Brunie, and Patrick Carribault. PARCOACH Extension for a Full-Interprocedural Collectives Verification. In Second International Workshop on Software Correctness for HPC Applications, Dallas, United States, November 2018.
Keywords: MPI ; OpenMP ; Collectives ; Static analysis ; Verification
[18] Emmanuelle Saillard, Koushik Sen, Wim Lavrijsen, and Costin Iancu. Maximizing Communication Overlap with Dynamic Program Analysis. In International Conference on High Performance Computing in Asia-Pacific Region, Tokyo, Japan, January 2018.
Keywords: Dynamic Analysis ; Optimization ; One-sided communication ; UPC
[19] The Anh Pham, Thierry Jéron, and Martin Quinson. Unfolding-based Dynamic Partial Order Reduction of Asynchronous Distributed Programs. In Jorge A. Pérez and Nobuko Yoshida, editors, FORTE 2019 - 39th International Conference on Formal Techniques for Distributed Objects, Components, and Systems, volume LNCS-11535 of Formal Techniques for Distributed Objects, Components, and Systems, pages 224--241, Copenhagen, Denmark, 2019. Springer International Publishing. Part 1: Full Papers.
Keywords: Asynchronous ; Distributed program ; Partial order ; Unfolding
[20] The Anh Pham. Efficient state-space exploration for asynchronous distributed programs: adapting Unfolding-based Dynamic Partial Order Reduction to MPI programs. PhD thesis, Université de Rennes, December 2019.
[21] Pierre Huchant, Emmanuelle Saillard, Denis Barthou, and Patrick Carribault. Multi-Valued Expression Analysis for Collective Checking. In Euro-Par, Göttingen, Germany, August 2019.
[22] Franz Heinrich. Modeling, Prediction and Optimization of Energy Consumption of MPI Applications using SimGrid. PhD thesis, Université Grenoble Alpes, May 2019.
[23] Ehsan Ahvar, Anne-Cécile Orgerie, and Adrien Lebre. Estimating Energy Consumption of Cloud, Fog and Edge Computing Infrastructures. IEEE Transactions on Sustainable Computing, pages 1--12, April 2019.
Keywords: Peer-to-peer ; distributed Clouds ; energy consumption ; Cloud computing ; Edge computing ; Fog computing
[24] Loic Guegan and Anne-Cécile Orgerie. Estimating the end-to-end energy consumption of low-bandwidth IoT applications for WiFi devices. In CloudCom 2019 - 11th IEEE International Conference on Cloud Computing Technology and Science, Sydney, Australia, December 2019. IEEE.
Keywords: IoT devices ; energy consumption ; clouds ; end-to-end model
[25] Rafael Ferreira Da Silva, Anne-Cécile Orgerie, Henri Casanova, Ryan Tanaka, Ewa Deelman, and Frédéric Suter. Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows. In ICCS 2019 - International Conference on Computational Science, pages 138--152, Faro, Portugal, June 2019. Springer.
[26] Loic Guegan, Betsegaw Lemma Amersho, Anne-Cécile Orgerie, and Martin Quinson. A Large-Scale Wired Network Energy Model for Flow-Level Simulations. In AINA 2019 - 33rd International Conference on Advanced Information Networking and Applications, volume 926 of Advances in Intelligent Systems and Computing, pages 1047--1058, Matsue, Japan, March 2019. Springer.
[27] Emmanuel Agullo, Luc Giraud, Stéphane Lanteri, Gilles Marait, Anne-Cécile Orgerie, and Louis Poirel. Energy Analysis of a Solver Stack for Frequency-Domain Electromagnetics. In PDP 2019 - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pages 385--391, Pavia, Italy, February 2019. IEEE.
[28] Amina Guermouche and Anne-Cécile Orgerie. Experimental analysis of vectorized instructions impact on energy and power consumption under thermal design power constraints. Working paper or preprint, June 2019.
Keywords: TDP ; SIMD instructions ; Power consumption ; Memory ; Energy efficiency
[29] Tom Cornebize and Arnaud Legrand. DGEMM performance is data-dependent. Research Report RR-9310, Université Grenoble Alpes ; Inria ; CNRS, December 2019.
Keywords: Performance ; BLAS ; DVFS ; DGEMM
[30] Lucas Leandro Nesi, Samuel Thibault, Luka Stanisic, and Lucas Mello Schnorr. Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms. In CCGrid 2019 - 19th Annual IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 142--151, Larnaca, Cyprus, May 2019. IEEE.
[31] Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Arnaud Legrand, Ian Masliah, and Luka Stanisic. Simulation of a Sparse Direct Solver on Heterogeneous Systems using StarPU and SimGrid. In CSE 2019 - SIAM Conference on Computational Science and Engineering, Spokane, United States, February 2019. SIAM.
[32] Tom Cornebize, Arnaud Legrand, and Franz C Heinrich. Fast and Faithful Performance Prediction of MPI Applications: the HPL Case Study. In 2019 IEEE International Conference on Cluster Computing (CLUSTER), Albuquerque, United States, September 2019.
[33] Arnaud Legrand, Denis Trystram, and Salah Zrigui. Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning? In IPDPS 2019 - 33rd IEEE International Parallel & Distributed Processing Symposium, pages 686--695, Rio de Janeiro, Brazil, May 2019. IEEE.
[34] Salah Zrigui, Raphael Y De Camargo, Denis Trystram, and Arnaud Legrand. Improving the Performance of Batch Schedulers Using Online Job Size Classification. Working paper or preprint, October 2019.
[35] Rafael Ferreira Da Silva, Henri Casanova, Ryan Tanaka, and Frédéric Suter. Bridging Concepts and Practice in eScience via Simulation-driven Engineering. In BC2DC 2019 - Workshop on Bridging from Concepts to Data and Computation for eScience, pages 1--6, San Diego, CA, United States, September 2019.
Keywords: Reproducible Research ; Distributed Computing ; CyberInfrastructure Development ; Simulation Accuracy
[36] Idriss Daoudi, Philippe Virouleau, Thierry Gautier, Samuel Thibault, and Olivier Aumage. sOMP: Simulating OpenMP Task-Based Applications with NUMA Effects. In IWOMP 2020 - 16th International Workshop on OpenMP, volume 12295 of LNCS, Austin / Virtual, United States, September 2020. Springer.
Keywords: OpenMP tasks ; NUMA architecture ; Performance modeling ; Simulation
[37] Dorra Boughzala, Laurent Lefèvre, and Anne-Cécile Orgerie. Predicting the energy consumption of CUDA kernels using SimGrid. In SBAC-PAD 2020 - 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, pages 1--8, Porto, Portugal, September 2020. IEEE.
Keywords: GPGPU computing ; CUDA kernels ; Energy modeling ; Simulation
[38] Lucas Leandro Nesi, Lucas Mello Schnorr, and Arnaud Legrand. Communication-Aware Load Balancing of the LU Factorization over Heterogeneous Clusters. In IEEE International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, December 2020.
Keywords: Data Partitioning ; LU Factorization ; Load Balancing ; Task-Based Applications ; Heterogeneous Clusters
[39] Marie Duflot and Yann Duplouy. Statistical Model Checking of Distributed Programs within SimGrid. In SIMULTECH 2020 - 10th International Conference on Simulation and Modeling Methodologies, Technologies and Applications, Lieusaint, France, July 2020.
Keywords: Simulation ; SimGrid ; Statistical Model Checking ; Distributed Programs ; Stochastic Distributed Systems
[40] Van Man Nguyen, Emmanuelle Saillard, Julien Jaeger, Denis Barthou, and Patrick Carribault. PARCOACH Extension for a Full-Interprocedural Collectives Verification. In Fourth International Workshop on Software Correctness for HPC Applications, 2020.
Keywords: MPI ; OpenMP ; Collectives ; Static analysis ; Verification
[41] Rafael Ferreira Da Silva, Henri Casanova, Anne-Cécile Orgerie, Ryan Tanaka, Ewa Deelman, and Frédéric Suter. Characterizing, Modeling, and Accurately Simulating Power and Energy Consumption of I/O-intensive Scientific Workflows. Journal of Computational Science, 44:101157, June 2020.
Keywords: Workflow profiling ; Workflow scheduling ; Energy-aware computing ; Scientific workflows

Created: 2020-11-08 Sun. 10:52
