INRIA recruitment

PhD Position F/M Modelization of HPC Jobs and Resources to Minimize Energy Waste, INRIA

Grenoble - 38
Fixed-term contract (CDD)
Job summary
  • 36 months
  • Master's level (Bac +5)
  • Local government public service

Job responsibilities

PhD Position F/M Modelization of HPC Jobs and Resources to Minimize Energy Waste
Contract type: fixed-term contract (CDD)

Required degree level: Master's level (Bac +5) or equivalent

Role: PhD student

About the centre or functional department

The Centre Inria de l'Université Grenoble Alpes brings together almost 600 people in 24 research teams and 9 research support departments.

Staff is present on three campuses in Grenoble, in close collaboration with other research and higher education institutions (Université Grenoble Alpes, CNRS, CEA, INRAE...), but also with key economic players in the area.

The Centre Inria de l'Université Grenoble Alpes is active in the fields of high-performance computing, verification and embedded systems, modeling of the environment at multiple levels, and data science and artificial intelligence. The center is a top-level scientific institute with an extensive network of international collaborations in Europe and the rest of the world.

Context and advantages of the position

Co-advised by Raphaël Bleuse and Eric Rutten (Ctrl-A), LIG/INRIA, and Franck Corset (LJK, ASAR), within the framework of the Taranis project in the PEPR Cloud.

Assigned mission

Sobriety, in terms of electrical power, of data centers and High-Performance Computing (HPC) systems is becoming an important design issue, as the global energy consumption of information technologies is rising to considerable levels. Large-scale computing infrastructures process ever larger amounts of data and solve problems requiring ever more computing power. The behavior of large-scale infrastructures has become more variable and difficult to model, especially with respect to power consumption and application performance. Dealing with time variations and unpredictable disturbances therefore requires automating the management (i.e., configuration) of the infrastructures. This automatic management can be done by periodically monitoring the state of the system and updating the configuration to activate relevant mechanisms.

This work is rooted in the field of autonomic computing [6] and aims at designing efficient feedback loops to automatically manage the resources of an HPC infrastructure. Feedback loops are widespread in various fields of engineering, but their use in computer science is recent.
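To make the monitor-then-update cycle described above concrete, here is a minimal sketch of a feedback loop. It is not the project's controller: the integral update rule, the toy cluster model (a fixed background load plus a fixed load per best-effort job), and every name and number in it are assumptions chosen for illustration.

```python
from typing import Callable, List


def integral_controller(setpoint: float, gain: float) -> Callable[[float, float], float]:
    """Build an update rule: new_action = action + gain * (setpoint - measurement)."""
    def update(action: float, measurement: float) -> float:
        # The action (number of best-effort jobs) cannot be negative.
        return max(0.0, action + gain * (setpoint - measurement))
    return update


def simulate(steps: int = 30) -> List[float]:
    """Run the loop against a hypothetical cluster and return measured utilizations."""
    background = 0.6   # assumed utilization due to regular jobs
    per_job = 0.01     # assumed utilization added by each best-effort job
    update = integral_controller(setpoint=0.95, gain=50.0)
    jobs = 0.0
    history = []
    for _ in range(steps):
        # Monitor: observe the current state of the system.
        utilization = min(1.0, background + per_job * jobs)
        history.append(utilization)
        # Update: adjust the configuration from the observed error.
        jobs = update(jobs, utilization)
    return history


if __name__ == "__main__":
    for step, value in enumerate(simulate(10)):
        print(f"step {step}: utilization {value:.4f}")
```

With these assumed values the error is halved at every period, so the measured utilization climbs from 0.6 towards the 0.95 setpoint. A real infrastructure would be noisy and time-varying, which is precisely why the loop keeps measuring instead of computing the answer once.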

Main activities

Resource harvesting

The Resource and Job Management System (RJMS) is a key component for operating an HPC system [5, 7]. Users submit jobs: descriptions of their computation, data, and resource requirements. Based on the resource status reported by the resource manager, the scheduler decides which resources to allocate to a job and assigns a time slot for the job's execution.

The RJMS is, however, unable to fully exploit the resources of an HPC cluster: unused resources result from the limits of job scheduling. Allocating resources to jobs while respecting all constraints leaves some resources idle. Such inefficiency is sometimes referred to as fragmentation. The computing power lost to fragmentation represents an exploitable pool of resources.
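As a rough illustration of the size of this pool, the sketch below computes the idle node-seconds left by a given schedule. The job representation (start, end, number of nodes) and the function name are assumptions made for this example; they are not taken from any RJMS.

```python
from typing import List, Tuple

# Hypothetical job record: (start time, end time, number of nodes), times in seconds.
Job = Tuple[float, float, int]


def idle_node_seconds(jobs: List[Job], total_nodes: int, horizon: float) -> float:
    """Return the node-seconds left unused on [0, horizon] by the given schedule."""
    events = []
    for start, end, nodes in jobs:
        # Clip each job to the observation window.
        s, e = max(0.0, start), min(horizon, end)
        if s < e:
            events.append((s, nodes))    # nodes become busy
            events.append((e, -nodes))   # nodes are released
    events.sort()

    idle = 0.0
    busy = 0
    now = 0.0
    for time, delta in events:
        # Accumulate idle capacity over the interval since the previous event.
        idle += (total_nodes - busy) * (time - now)
        busy += delta
        now = time
    # Account for the tail of the window after the last event.
    idle += (total_nodes - busy) * (horizon - now)
    return idle


if __name__ == "__main__":
    schedule = [(0, 10, 3), (5, 15, 1)]
    print(idle_node_seconds(schedule, total_nodes=4, horizon=20))
```

On this toy schedule, a 4-node cluster observed for 20 seconds offers 80 node-seconds, of which the two jobs use 40; the remaining 40 are what a best-effort system could try to harvest.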

CiGri [1] is a lightweight, scalable and fault-tolerant grid system that plugs into the RJMS. CiGri aims at minimizing the waste of computing resources by leveraging the pool of unused resources. Yet, some computations (best-effort computations) can still lead to wasted resources if they are stopped before completion. In particular, by integrating information from the RJMS, one could avoid starting best-effort computations when there is not enough time to execute them.
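One possible form of such a decision rule is sketched below. It is not CiGri's logic: it assumes the execution time of a best-effort job follows a log-Normal distribution with known parameters, and it starts the job only if the probability of finishing within the idle window exceeds a threshold. The function names, parameters and threshold are all hypothetical.

```python
import math
from statistics import NormalDist


def completion_probability(window_s: float, log_mean: float, log_std: float) -> float:
    """P(T <= window_s) for T log-Normal, i.e. log(T) ~ Normal(log_mean, log_std)."""
    if window_s <= 0:
        return 0.0
    return NormalDist(log_mean, log_std).cdf(math.log(window_s))


def should_start(window_s: float, log_mean: float, log_std: float,
                 min_probability: float = 0.9) -> bool:
    """Start the job only if it is likely enough to finish before the window closes."""
    return completion_probability(window_s, log_mean, log_std) >= min_probability


if __name__ == "__main__":
    # Assumed job profile: median execution time of 600 s, log-scale spread of 0.5.
    mu, sigma = math.log(600), 0.5
    for window in (600, 1200, 3600):
        p = completion_probability(window, mu, sigma)
        print(f"window {window:>4} s: P(finish) = {p:.3f}, start = {should_start(window, mu, sigma)}")
```

The threshold makes the trade-off explicit: a lower value harvests more idle time at the risk of more killed jobs, while a higher value wastes less work but leaves more resources unused.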

Modeling jobs and resources

The authors of [4] show that job execution times are not necessarily deterministic and can be modeled by Exponential, Weibull, log-Normal or Normal distributions. Moreover, the high variance of execution times and the diversity of the jobs, whether in terms of the nature of the data, the application domain or the size of the problem, have to be taken into account, for instance by considering a mixture of distributions (see [3]). We therefore propose to apply Machine Learning techniques, e.g., a clustering of all jobs submitted during the last decade, to account for this heterogeneity. In a second step, we propose to use this information as a prior distribution in a Bayesian setting to improve the accuracy of execution time estimates (see [8]). Furthermore, prior work mostly focuses on models that are job-centric and based on post-execution data [2]: they neglect to model the resources used for the computation.
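The second step can be illustrated with the simplest Bayesian setting. The sketch below is an assumption-laden toy, not the method of the thesis: it supposes that log execution times are Normal with a known observation variance, that the cluster a job belongs to supplies a Normal prior on the mean, and it applies the standard conjugate Normal-Normal update. All names and values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def posterior_log_runtime(prior_mean: float, prior_var: float,
                          log_times: Sequence[float], obs_var: float) -> Tuple[float, float]:
    """Conjugate update for the mean of log execution times.

    Returns the posterior (mean, variance) given a Normal(prior_mean, prior_var)
    prior and observations with known variance obs_var.
    """
    n = len(log_times)
    # Precisions (inverse variances) add up in the Normal-Normal model.
    precision = 1.0 / prior_var + n / obs_var
    mean = (prior_mean / prior_var + sum(log_times) / obs_var) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Assumed prior from a job cluster: median around 600 s, fairly uncertain.
    prior_mean, prior_var = math.log(600), 1.0
    # Assumed observed execution times (seconds) of recent jobs from the same user.
    observed = [math.log(t) for t in (900, 1100, 1000)]
    mean, var = posterior_log_runtime(prior_mean, prior_var, observed, obs_var=0.25)
    print(f"posterior median estimate: {math.exp(mean):.0f} s, posterior variance: {var:.3f}")
```

The posterior mean is a precision-weighted average of the prior and the data, so the cluster-level information dominates when few observations are available and fades as observations accumulate.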

In this work, we want to model the availability of the platform in order to insert best-effort jobs in a frugal way. The design of the models will have to consider that allocation decisions are taken online, with only partial information revealed over the platform life cycle.

References
[1] Bruno Bzeznik and Ghislain Charrier. CiGri. lic: GPL-3.0-or-later. url: http://cigri.imag.fr/, vcs: https://github.com/oar-team/cigri.
[2] Dror G. Feitelson, Dan Tsafrir, and David Krakov. Experience with using the Parallel Workloads Archive. In: J. Parallel Distributed Comput. 74.10 (Oct. 2014), pp. 2967-2982. doi: 10.1016/J.JPDC.2014.06.013.
[3] Sylvia Frühwirth-Schnatter, Gilles Celeux, and Christian P. Robert. Handbook of Mixture Analysis. CRC Press, 2019.
[4] Ana Gainaru, Hongyang Sun, Guillaume Aupy, Yuankai Huo, Bennett A. Landman, and Padma Raghavan. On-the-fly scheduling versus reservation-based scheduling for unpredictable workflows. In: Int. J. High Perform. Comput. Appl. 33.6 (2019). doi: 10.1177/1094342019841681.
[5] Yiannis Georgiou. Contributions for Resource and Job Management in High Performance Computing. PhD thesis. LIG, Univ. Grenoble Alpes, France, Nov. 2010. url: https://tel.archives-ouvertes.fr/tel-01499598 (visited on 2023-10-11).
[6] Jeffrey O. Kephart and David M. Chess. The Vision of Autonomic Computing. In: IEEE Computer 36.1 (Jan. 2003), pp. 41-50. doi: 10.1109/MC.2003.1160055.
[7] Albert Reuther et al. Scalable system scheduling for HPC and big data. In: Journal of Parallel and Distributed Computing 111 (Jan. 2018), pp. 76-92. doi: 10.1016/j.jpdc.2017.06.009.
[8] Christian P. Robert et al. The Bayesian choice: from decision-theoretic foundations to computational implementation. Vol. 2. Springer.
Skills

The PhD candidate must have:
- An MSc degree in Computer Science or Statistics
- Skills in programming languages and software engineering
- Knowledge of the HPC domain

Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage

Remuneration

Base of 2,200 euros gross per month

Welcome to INRIA

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects that have an impact on the world to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.

Hellowork salary estimate for this occupation in Grenoble

The recruiter did not communicate the salary for this offer, but Hellowork provides an estimate (the range varies with experience).

Estimate based on INSEE data and similar job offers.

Low estimate: 36,600 € / year, 3,050 € / month, 20.11 € / hour

Estimated gross salary: 43,800 € / year, 3,650 € / month, 24.07 € / hour

High estimate: 52,500 € / year, 4,375 € / month, 28.85 € / hour


Published on 16/06/2025 - Ref: c4f12bdd96def8ad7d53d45a1f4cca8f
