REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems

The RECIPE project provides:

– a hierarchical runtime resource management infrastructure optimizing energy efficiency and ensuring reliability for both time-critical and throughput-oriented computation;
– a predictive reliability methodology to support the enforcement of QoS guarantees in the face of both transient and long-term hardware failures, including thermal, timing and reliability models;
– and a set of integration layers allowing the resource manager to interact with both the application and the underlying deeply heterogeneous architecture, addressing them in a disaggregated way.

Quantitative goals for RECIPE include a 25% increase in energy efficiency (performance/watt), with a 15% MTTF improvement due to proactive thermal management; an energy-delay product improved by up to 25%; and a 20% reduction in faulty executions.
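
For readers unfamiliar with these metrics, the sketch below shows how performance/watt and the energy-delay product (EDP) are commonly computed, and how a relative improvement could be checked against a target. The function names and all numeric values are hypothetical and chosen only for illustration; they are not RECIPE measurements or part of the project's tooling.

```python
def perf_per_watt(operations: float, seconds: float, avg_power_watts: float) -> float:
    """Performance per watt: throughput (operations/s) divided by average power."""
    return (operations / seconds) / avg_power_watts


def energy_delay_product(avg_power_watts: float, seconds: float) -> float:
    """EDP = energy * delay = (power * time) * time, in joule-seconds."""
    return avg_power_watts * seconds * seconds


def relative_change(baseline: float, value: float) -> float:
    """Signed relative change of `value` with respect to `baseline`."""
    return (value - baseline) / baseline


# Hypothetical runs of the same workload (illustrative numbers only).
OPS = 1e12                              # operations executed by the workload
base_time, base_power = 100.0, 200.0    # seconds, watts
opt_time, opt_power = 100.0, 160.0      # same runtime, lower average power

efficiency_gain = relative_change(
    perf_per_watt(OPS, base_time, base_power),
    perf_per_watt(OPS, opt_time, opt_power),
)
edp_change = relative_change(
    energy_delay_product(base_power, base_time),
    energy_delay_product(opt_power, opt_time),
)

print(f"performance/watt change: {efficiency_gain:+.0%}")
print(f"EDP change: {edp_change:+.0%}")
```

In this made-up scenario, lowering average power at constant runtime raises performance/watt while lowering EDP; a lower EDP is the improvement, so an "improved" EDP corresponds to a negative relative change.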

The project aims to assess its results against a set of real-world use cases covering key application domains, ranging from well-established HPC applications such as geophysical exploration and meteorology, through renewable energy sources, to emerging domains such as biomedical machine learning and data analytics.

The RECIPE project relies on a consortium composed of four leading academic partners (POLIMI, UPV, EPFL, CeRICT); two supercomputing centers, BSC and PSNC; a research hospital, CHUV; and an SME, IBTS. Together they provide effective exploitation avenues through industry-based use cases, including biomedicine, climate and renewable energy sources, and geophysical exploration.

Start date: 2018-05-01
End date: 2021-04-30
Role: Partner
Origin: Foreign project
Funding: H2020