INRIA recruitment

PhD Position F/M: Cost and Performance-Efficient Caching for Massively Distributed Systems (INRIA)

  • Rennes - 35
  • Fixed-term contract (CDD)
  • 36 months
  • Bac +5 (Master's degree)
  • Public service (local and regional authorities)

Job details

PhD Position F/M Cost and Performance-Efficient Caching for Massively Distributed Systems
The job description below is in English.
Contract type: fixed-term (CDD)

Required degree level: Bac +5 or equivalent

Position: PhD student

About the centre or functional department

The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and hosts more than thirty research teams. The centre is a major, recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher-education institutions, laboratories of excellence, a technological research institute, etc.

Context and assets of the position

Financial and working environment.

This PhD will take place in the context of the IPCEI-CIS (Important Project of Common European Interest - Next Generation Cloud Infrastructure and Services) DXP (Data Exchange Platform) project, which involves Amadeus and three Inria research teams (COAST, CEDAR, and MAGELLAN). The project aims to design and develop an open-source management solution for a federated and distributed data exchange platform (DXP), operating in an open, scalable, and massively distributed environment (the cloud-edge continuum).

The PhD student will be recruited and hosted at the Inria Centre at Rennes University, and the work will be carried out within the MAGELLAN team in collaboration with the other partners.

The PhD student will be supervised by:

- Shadi Ibrahim, MAGELLAN team in Rennes
- Cedric Tedeschi, MAGELLAN team in Rennes

Assignment

Context

The ever-growing number of services and Internet of Things (IoT) devices has resulted in data being distributed across different locations (regions and countries). Additionally, data exhibits different usage patterns, including cold data (written once and never read), stream data (produced once and consumed by many), and hot data (written once and consumed by many). Furthermore, these data types have different performance and dependability requirements (e.g., low latency for data streams).
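The usage patterns above can be made concrete with a small illustrative sketch (the function name and thresholds are ours, chosen purely for illustration, not part of the project):

```python
def classify_usage(writes: int, reads: int, streaming: bool = False) -> str:
    """Map observed access counts to the usage patterns described above.

    The labels mirror the taxonomy in the text; the rules are illustrative,
    not prescriptive.
    """
    if streaming:
        return "stream"  # produced once, consumed by many as it flows
    if reads == 0:
        return "cold"    # written once and never read back
    return "hot"         # written once and consumed by many
```

A cache placement policy could, for instance, keep "hot" objects on fast local storage, forward "stream" objects to their consumers without long-term retention, and skip caching "cold" objects entirely.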

Data caching is a widely used technique that improves application performance by storing data on high-speed devices close to end users. Most research on data caching has focused on the benefits of different data placement strategies (i.e., which data to place in the cache), data movement, cache partitioning, cache eviction [1, 2, 3, 4, 5, 6, 7, 8], and on realizing cost-efficient data redundancy techniques in caching systems [9]. However, few efforts have studied data management when caches are distributed across different platforms (Edge-to-Cloud), utilize heterogeneous storage devices (in terms of performance and cost), and serve multiple, diverse applications, including traditional data services, serverless workflows and data streaming.
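As a point of reference for the eviction policies surveyed above, LRU [1, 2] can be captured in a few lines; this is a minimal single-node sketch (class and method names are ours), far from the distributed, heterogeneous setting the thesis targets:

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: evicts the least-recently-used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```

FIFO [1], SIEVE [7], or a learned policy [8] would change only the bookkeeping around which entry `popitem` discards; the hard questions in this thesis arise when many such caches span heterogeneous devices and sites.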

References:

[1] Asit Dan and Don Towsley. An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes. SIGMETRICS Perform. Eval. Rev. 18(1):143-152, 1990. https://doi.org/10.1145/98460.98525

[2] Marek Chrobak and John Noga. LRU is better than FIFO. Algorithmica 23:180-185, 1999. https://doi.org/10.1007/PL00009255

[3] Aaron Blankstein, Siddhartha Sen, and Michael J. Freedman. Hyperbolic Caching: Flexible Caching for Web Applications. In 2017 USENIX Annual Technical Conference (USENIX ATC 17), 2017.

[4] Cristian Ungureanu, Biplob Debnath, Stephen Rago, and Akshat Aranya. TBF: A Memory-Efficient Replacement Policy for Flash-Based Caches. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 1117-1128, 2013. https://doi.org/10.1109/ICDE.2013.6544902

[5] Orcun Yildiz, Amelie Chi Zhou, and Shadi Ibrahim. Improving the Effectiveness of Burst Buffers for Big Data Processing in HPC Systems with Eley. Future Generation Computer Systems 86:308-318, 2018. ISSN 0167-739X.

[6] G. Aupy, O. Beaumont, and L. Eyraud-Dubois. Sizing and Partitioning Strategies for Burst-Buffers to Reduce IO Contention. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, 2019.

[7] Yazhuo Zhang, Juncheng Yang, Yao Yue, et al. SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 1229-1246, 2024.

[8] Juncheng Yang, Ziming Mao, Yao Yue, and K. V. Rashmi. GL-Cache: Group-Level Learning for Efficient and High-Performance Caching. In 21st USENIX Conference on File and Storage Technologies (FAST 23), pages 115-134, 2023.

[9] K. V. Rashmi, Mosharaf Chowdhury, Jack Kosaian, et al. EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 401-417, 2016.

Main activities

The goal is to design cost- and performance-efficient distributed smart caching middleware that facilitates data exchange within and across data providers (and producers) and consumers (users), while taking into account the temperature of the data, its access frequency, and the heterogeneity and dynamics of the infrastructure. Specifically, we aim to address the following research questions:

- How can we seamlessly aggregate the caches of many distributed, heterogeneous machines?
- Where should data be placed across different sites and organizations (across the IoT-to-cloud continuum)?
- Which data should be cached, and for how long? How should caches be resized between applications and users, and when should they be emptied?
- How can data caches be exploited for data streams, and how can caches be shared efficiently with hot data?
- How many replicas are needed to meet users' demands?

Research Methodology: This research addresses the challenges of data management in distributed, heterogeneous caches by designing and implementing novel middleware, models, algorithms, and a framework to answer the questions above. All solutions will be validated through simulation or on real distributed infrastructures such as Grid'5000 and Amazon Web Services.
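For the first question, aggregating the caches of many machines into one logical cache, consistent hashing is one standard starting point: each key maps to a node such that adding or removing a node remaps only the keys that node owned. The sketch below is a hedged illustration under our own naming, not a description of the middleware to be built:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Aggregate many cache nodes into one logical cache via consistent hashing.

    Each node is placed at several virtual points on a hash ring; a key is
    owned by the first node point at or after the key's hash.
    """

    def __init__(self, nodes=(), vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(item: str) -> int:
        return int(hashlib.md5(item.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))  # first point at or after h
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

Removing a node remaps only the keys it owned, which matters in a dynamic edge-to-cloud setting where machines join and leave; the open problems here are placing data across heterogeneous devices and sites, not just spreading keys uniformly.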

Skills

- An excellent Master's degree in computer science or equivalent
- Strong knowledge of distributed systems
- Knowledge of storage and distributed file systems
- Ability to conduct experimental systems research
- Strong programming skills (C/C++, Python)
- Working experience in the areas of Big Data management, Cloud Computing, and Data Analytics is advantageous
- Very good oral and written communication skills in English

Benefits

- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage

Remuneration

Monthly gross salary: €2,200

About Inria

Inria is the French national research institute for digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents across more than forty different professions. 900 research and innovation support staff help launch and grow scientific and entrepreneurial projects with worldwide impact. Inria works with numerous companies and has supported the creation of more than 200 start-ups. In this way, the institute strives to meet the challenges of the digital transformation of science, society, and the economy.

Published on 17/09/2025 - Ref: 35a829816138f0556ae82014d9cd19fe
