PhD Position F/M: Cost and Performance-Efficient Caching for Massively Distributed Systems
INRIA
Rennes - 35
Fixed-term contract (CDD)
Estimated salary: 2,592 - 4,392 € / month
36 months
Master's degree (Bac +5)
Job details
The offer description below is in English.
Contract type: Fixed-term (CDD)
Required degree level: Bac +5 (Master's) or equivalent
Role: Doctoral student
About the research centre or functional department
The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and hosts more than thirty research teams. The centre is a major, recognized player in the field of digital sciences. It sits at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher-education institutions, laboratories of excellence, a technology research institute, etc.
Context and assets of the position
Financial and working environment.
This PhD takes place in the context of the IPCEI-CIS (Important Project of Common European Interest - Next Generation Cloud Infrastructure and Services) DXP (Data Exchange Platform) project, involving Amadeus and three Inria research teams (COAST, CEDAR, and MAGELLAN). The project aims to design and develop an open-source management solution for a federated and distributed data exchange platform (DXP) operating in an open, scalable, and massively distributed environment (the cloud-edge continuum).
The PhD student will be recruited and hosted at the Inria Centre at Rennes University; the work will be carried out within the MAGELLAN team in collaboration with the other partners.
The PhD student will be supervised by:
- Shadi Ibrahim, MAGELLAN team in Rennes
- Cedric Tedeschi, MAGELLAN team in Rennes
Assigned mission
Context
The ever-growing number of services and Internet of Things (IoT) devices has resulted in data being distributed across different locations (regions and countries). Additionally, data exhibits different usage patterns, including cold data (written once and never read), stream data (produced once and consumed by many), and hot data (written once and consumed by many). Furthermore, these data types have different performance and dependability requirements (e.g., low latency for data streams).
Data caching is a widely used technique that improves application performance by storing data on high-speed devices close to end users. Most research on data caching has focused on the benefits of different data placement strategies (i.e., which data to place in the cache), data movement, cache partitioning, cache eviction [1, 2, 3, 4, 5, 6, 7, 8], and on realizing cost-efficient data redundancy techniques in caching systems [9]. However, few efforts have studied data management when caches are distributed across different platforms (Edge-to-Cloud), utilize heterogeneous storage devices (in terms of performance and cost), and serve multiple, diverse applications, including traditional data services, serverless workflows and data streaming.
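To make the eviction policies cited above concrete, the classic LRU scheme [1, 2] can be sketched in a few lines of Python using the standard library's OrderedDict. This is purely an illustrative sketch, not part of the project's codebase:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least-recently-used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recently used
cache.put("c", 3)  # capacity exceeded: "b", now least recently used, is evicted
```

Policies such as FIFO [1], hyperbolic caching [3], or SIEVE [7] differ only in which entry this last step selects for eviction.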
References:
[1] Asit Dan and Don Towsley. An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes. SIGMETRICS Perform. Eval. Rev. 18, 1 (1990), 143-152. https://doi.org/10.1145/98460.98525
[2] Marek Chrobak and John Noga. LRU is Better than FIFO. Algorithmica 23 (1999), 180-185. https://doi.org/10.1007/PL00009255
[3] Aaron Blankstein, Siddhartha Sen, and Michael J. Freedman. Hyperbolic Caching: Flexible Caching for Web Applications. In 2017 USENIX Annual Technical Conference (USENIX ATC 17), 2017.
[4] Cristian Ungureanu, Biplob Debnath, Stephen Rago, and Akshat Aranya. TBF: A Memory-Efficient Replacement Policy for Flash-Based Caches. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013, 1117-1128. https://doi.org/10.1109/ICDE.2013.6544902
[5] Orcun Yildiz, Amelie Chi Zhou, and Shadi Ibrahim. Improving the Effectiveness of Burst Buffers for Big Data Processing in HPC Systems with Eley. Future Generation Computer Systems 86 (2018), 308-318.
[6] G. Aupy, O. Beaumont, and L. Eyraud-Dubois. Sizing and Partitioning Strategies for Burst-Buffers to Reduce IO Contention. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, 2019.
[7] Yazhuo Zhang, Juncheng Yang, Yao Yue, et al. SIEVE is Simpler than LRU: An Efficient Turn-Key Eviction Algorithm for Web Caches. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), 2024, 1229-1246.
[8] Juncheng Yang, Ziming Mao, Yao Yue, and K. V. Rashmi. GL-Cache: Group-Level Learning for Efficient and High-Performance Caching. In 21st USENIX Conference on File and Storage Technologies (FAST '23), 2023, 115-134.
[9] K. V. Rashmi, Mosharaf Chowdhury, Jack Kosaian, et al. EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, 401-417.
Main activities
The goal is to design a cost- and performance-efficient distributed smart caching middleware that facilitates data exchange within and across data providers (and producers) and consumers (users), while accounting for data temperature, access frequency, and the heterogeneity and dynamics of the infrastructure. Specifically, we aim to address the following research questions:
- How can we seamlessly aggregate the caches of many distributed, heterogeneous machines?
- Where should data be placed across different sites/organizations (across the IoT-to-cloud continuum)?
- Which data should be cached, and for how long? How should caches be resized between different applications/users, and when should they be emptied?
- How can data caches be exploited for data streams, and how can caches holding hot data be shared efficiently?
- In addition, it is important to determine the right number of replicas to meet users' demands.
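One classical building block for the first question, aggregating many distributed caches into a single logical cache, is consistent hashing: each key is deterministically routed to one cache node, and adding or removing a node remaps only the keys it owned. The sketch below is illustrative only; the node names are hypothetical and this is not the project's design:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Routes keys to cache nodes; node churn remaps only a small key fraction."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` points on the ring for load balance.
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point at or after its hash (wrapping).
        idx = bisect_right(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

# Hypothetical cache nodes spanning the edge-to-cloud continuum.
ring = ConsistentHashRing(["edge-1", "edge-2", "cloud-1"])
node = ring.node_for("sensor/42/reading")
```

The useful property for a dynamic cloud-edge infrastructure is locality of change: if "cloud-1" leaves, only the keys it owned move to their next ring successor, while every other key keeps its node.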
Research Methodology: This research addresses the challenges of data management in distributed, heterogeneous caches by designing and implementing novel middleware, models, algorithms, and a framework to answer the above questions. All solutions will be validated through simulations or on a real distributed infrastructure, such as Grid'5000 and Amazon Web Services.
Skills
- An excellent Master's degree in computer science or equivalent
- Strong knowledge of distributed systems
- Knowledge of storage and distributed file systems
- Ability to conduct experimental systems research
- Strong programming skills (C/C++, Python)
- Working experience in Big Data management, Cloud Computing, and Data Analytics is advantageous
- Very good communication skills in oral and written English
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
Monthly gross salary: 2,200 euros
Welcome to INRIA
About Inria
Inria is the French national research institute for digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in tackling the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talents across more than forty professions. Its 900 research and innovation support staff help scientific and entrepreneurial projects emerge, grow, and make an impact on the world. Inria works with many companies and has supported the creation of more than 200 start-ups. In this way, the institute strives to meet the challenges of the digital transformation of science, society, and the economy.
Published on 17/09/2025 - Ref: 35a829816138f0556ae82014d9cd19fe