PhD Position F/M Data Selection Techniques for LLM Reasoning Improvement, INRIA

Villeneuve-d'Ascq - 59

Fixed-term contract (CDD)

Offer summary

€2,100 / month
36 months
Bac +5 (Master's level)
Sector: local government public service

Job description

PhD Position F/M Data selection techniques for LLMs reasoning improvement
The job description below is in English.
Contract type: fixed-term (CDD)

Required degree level: Bac +5 (Master's level) or equivalent

Position: PhD student

About the centre or functional department

The Inria University of Lille centre, created in 2008, employs 360 people, including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-de-France region, the Inria University of Lille centre maintains close relationships with large companies and SMEs. By promoting synergies between researchers and industrialists, Inria contributes to the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region.

For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of French Tech, with a technology showroom on Avenue de Bretagne in Lille, at the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).

Context and advantages of the position

Large Language Models (LLMs) have demonstrated remarkable capabilities, and reasoning models in particular highlight the critical role of high-quality training data. While procedural generation offers unlimited training data in domains such as logical reasoning, games, and retrieval, not all synthetic data contributes equally. Generated examples often suffer from redundancy or inappropriate difficulty, or lack a meaningful learning signal: for instance, large-number arithmetic may appear challenging but provides minimal educational value.

This PhD research addresses optimal data selection from infinite procedural sources, moving beyond ad-hoc metrics such as diversity and difficulty. The work will develop principled methodologies for assessing the impact profiles of training data, using influence techniques (influence functions, Shapley values) to quantify how individual examples contribute to model capabilities, with connections to curriculum learning principles.
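To make "quantifying how individual examples contribute" concrete, here is a minimal sketch of the standard permutation-sampling (Monte Carlo) estimator of Shapley values over training points. It is an illustration only, not the project's method: the `coverage` utility and the `skills` data are hypothetical stand-ins for the expensive real utility (train on a subset, then measure validation performance). In this toy game the exact values are 1/3, 1/3, 4/3, 1 and 0, so the redundant "add" examples receive less credit than the example that also teaches "mul", and the empty example receives none.

```python
import random
from typing import Callable, FrozenSet, List


def monte_carlo_shapley(
    n_points: int,
    utility: Callable[[FrozenSet[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate each training point's Shapley value by averaging its marginal
    contribution to `utility` over random orderings of the dataset."""
    rng = random.Random(seed)
    totals = [0.0] * n_points
    order = list(range(n_points))
    for _ in range(n_permutations):
        rng.shuffle(order)
        subset = set()
        prev = utility(frozenset(subset))
        for i in order:
            subset.add(i)
            curr = utility(frozenset(subset))
            totals[i] += curr - prev  # marginal gain from adding point i
            prev = curr
    return [t / n_permutations for t in totals]


# Hypothetical toy data: each generated example "teaches" a set of skills.
# Number of distinct skills covered is a cheap stand-in for model performance.
skills = [{"add"}, {"add"}, {"add", "mul"}, {"logic"}, set()]


def coverage(subset: FrozenSet[int]) -> float:
    return float(len(set().union(*(skills[i] for i in subset))))


values = monte_carlo_shapley(len(skills), coverage)
print([round(v, 2) for v in values])
```

With a real training-based utility each call is a full training run, which is why cheaper approximations such as influence functions matter at scale.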

The candidate will create frameworks covering multiple quality aspects to identify high-impact training examples, validated through downstream performance on real-world tasks and computational efficiency metrics. This research aims to establish new standards for data-efficient training and synthetic data curation.

Keywords: Large Language Models, Data Selection, Procedural Generation, Influence Functions, Training Efficiency

Assigned mission

The PhD student will collaborate with Damien Sileo and the Adada consortium (engineers and interns) to develop intelligent data selection methods for procedurally generated datasets. The research focuses on extracting high-value training examples from massive synthetic data pools, moving beyond simple metrics of similarity to downstream tasks toward principled selection criteria that optimize model performance and learning efficiency.

Main activities

Data Generation & Filtering:
- Make minor contributions to synthetic problem generators to understand the generation mechanisms
- Develop large-scale data filtering pipelines for procedurally generated datasets
- Explore data representation techniques for effective sample characterization
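A filtering pipeline of the kind listed above can be sketched as a lazy stream that drops redundant and badly calibrated examples. This is a minimal sketch under stated assumptions: the `Example` record and its `solve_rate` field (fraction of sampled model answers judged correct) are hypothetical, the 0.1 to 0.9 difficulty window is arbitrary, and deduplication here is exact matching after normalization; near-duplicate detection would need a different technique.

```python
import hashlib
import re
from typing import Iterable, Iterator, List, Set, TypedDict


class Example(TypedDict):
    problem: str
    # Hypothetical field: fraction of sampled model answers judged correct.
    solve_rate: float


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivially reformatted copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def filter_stream(
    examples: Iterable[Example], low: float = 0.1, high: float = 0.9
) -> Iterator[Example]:
    """Lazily drop examples that are too easy, too hard, or exact duplicates."""
    seen: Set[str] = set()
    for ex in examples:
        if not low <= ex["solve_rate"] <= high:
            continue  # outside the target difficulty window
        key = hashlib.sha256(normalize(ex["problem"]).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        yield ex


pool: List[Example] = [
    {"problem": "If A then B. A holds. Is B true?", "solve_rate": 0.5},
    {"problem": "if A then B.  A holds.   Is B true?", "solve_rate": 0.5},
    {"problem": "What is 2 + 2?", "solve_rate": 1.0},
    {"problem": "Sort the list [3, 1, 2].", "solve_rate": 0.3},
]
kept = list(filter_stream(pool))
print([ex["problem"] for ex in kept])
```

Because `filter_stream` is a generator and only stores hashes, it can process pools far larger than memory, which suits procedural sources that never run out.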

Core Research Focus:
- Extract optimal coresets from massive synthetic datasets tailored to specific downstream tasks
- Design adaptive curriculum strategies accounting for model scale (larger models requiring more challenging examples)
- Develop hyperparameter modulation techniques for controlled generation diversity and difficulty calibration
- Move beyond similarity-based metrics to develop principled selection criteria optimizing learning outcomes
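As a reference point for coreset extraction, the greedy k-center heuristic (a classic 2-approximation) repeatedly adds the example farthest from the current selection. It is shown here precisely because it is the kind of similarity-based criterion the project aims to move beyond, not as the project's method. The sketch assumes each example has already been mapped to a hypothetical embedding vector; the `points` below are toy values.

```python
import math
from typing import List, Sequence


def k_center_greedy(
    points: Sequence[Sequence[float]], k: int, first: int = 0
) -> List[int]:
    """Select k indices by repeatedly picking the point farthest from the
    already-selected centers (greedy 2-approximation to k-center)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [first]
    # Distance from every point to its nearest selected center.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return selected


# Toy embeddings: three near-identical examples, a pair, and an outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
coreset = k_center_greedy(points, k=3)
print(coreset)  # → [0, 5, 3]
```

The selection covers each region once and skips near-duplicates, but it says nothing about whether a chosen example actually improves downstream reasoning, which is the gap principled selection criteria are meant to close.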

Validation & Dissemination:
- Evaluate coreset extraction and curriculum strategies across diverse reasoning tasks
- Assess scalability and computational efficiency of proposed filtering methods
- Conduct controlled experiments measuring downstream performance improvements
- Write and disseminate research findings through publications and presentations

Skills

Languages: English (French not mandatory)

Programming language: Python

Background in deep learning and statistics

Knowledge of logic and symbolic AI is a plus

Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage

Salary

€2,100 (gross monthly salary)

Welcome to INRIA

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project-teams, generally shared with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talents in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with a worldwide impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.

Published on 4 June 2025, Ref: c08c8ffbd7ba61961723c41de6472823


These offers may also interest you

- Data Officer (M/F), CGI Finance, Marcq-en-Barœul (59), permanent contract (CDI), occasional remote work, posted 22 days ago
- Data Scientist International (M/F), Oney, Croix (59), permanent contract (CDI), posted 19 days ago
- Data Officer - Data Factory (M/F), EURO-INFORMATION DEVELOPPEMENTS, Villeneuve-d'Ascq (59), permanent contract (CDI), €40,000 - 60,000 / year, partial remote work, posted 6 days ago
