Backdoor Attack Scalability And Defense Evaluation In Large Language Models (M/F) - CEA

  • Gif-sur-Yvette - 91
  • Internship
  • High-tech industry • Telecom

Job description

Context: Large Language Models (LLMs) deployed in safety-critical domains face significant threats from backdoor attacks. Recent empirical evidence contradicts previous assumptions about attack scalability: poisoning attacks remain effective regardless of model or dataset size, requiring as few as 250 poisoned documents to compromise models of up to 13B parameters. Because the number of poisoned documents needed does not grow with the training corpus, the fraction of data an attacker must control shrinks as datasets grow, which suggests that data poisoning becomes easier, not harder, as systems scale.
Backdoors persist through post-training alignment techniques such as Supervised Fine-Tuning and Reinforcement Learning from Human Feedback, which undermines current defenses. However, persistence depends critically on when the poisoning occurs and on the characteristics of the backdoor. Current verification methods are computationally prohibitive: Proof-of-Learning requires full model retraining and access to the complete training transcript. While step-wise verification shows promise for runtime detection, its scalability to production models and its resilience against adaptive adversaries remain unresolved.
Existing defenses focus on post-training detection rather than on preventing attack success during training. A better understanding of data poisoning scaling dynamics, that is, of how attack success correlates with dataset composition, poisoning density, and model capacity, is essential for developing evidence-based threat models and defense strategies.
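To make the notion of poisoning density concrete, here is a minimal arithmetic sketch. It assumes, as the context above states, that a fixed budget of 250 poisoned documents is enough regardless of dataset size. The corpus sizes and the helper name `poison_fraction` are hypothetical and chosen purely for illustration.

```python
POISONED_DOCS = 250  # figure quoted in the posting


def poison_fraction(total_docs: int, poisoned_docs: int = POISONED_DOCS) -> float:
    """Return the share of the training corpus that is poisoned."""
    if total_docs < poisoned_docs:
        raise ValueError("corpus cannot be smaller than the poisoned subset")
    return poisoned_docs / total_docs


# Hypothetical corpus sizes (number of documents), for illustration only.
for total in (10**6, 10**8, 10**10):
    print(f"{total:>14,d} documents -> poisoned fraction {poison_fraction(total):.2e}")
```

Under this assumption the poisoned fraction falls by two orders of magnitude each time the corpus grows a hundredfold, which is why a constant document budget makes the attack relatively cheaper at scale.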
Objective: This internship aims to empirically test and advance data poisoning attacks and defenses for LLMs through systematic experimentation and adversarial evaluation. Key responsibilities include:
  • Implementing state-of-the-art attack methods across multiple vectors (jailbreaking, targeted refusal, denial-of-service, information extraction)
  • Testing attacks on diverse model architectures and scales
  • Establishing standardized evaluation protocols with metrics such as Attack Success Rate and Clean Accuracy
  • Evaluating existing defenses, particularly step-wise verification
  • Developing reproducible test suites for objective defense benchmarking
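As an illustration of the two metrics named above, the following minimal sketch computes Clean Accuracy and Attack Success Rate for a toy model. Everything in it is an assumption made for the example: the `Model` interface (prompt in, completion out), the trigger string, the toy model, and the exact-match scoring. A real evaluation protocol would define its own success criterion per attack vector.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def clean_accuracy(model: Model, clean_set: Sequence[Tuple[str, str]]) -> float:
    """Fraction of trigger-free prompts for which the model gives the expected output."""
    if not clean_set:
        raise ValueError("clean_set is empty")
    correct = sum(model(prompt) == expected for prompt, expected in clean_set)
    return correct / len(clean_set)


def attack_success_rate(
    model: Model,
    triggered_prompts: Sequence[str],
    is_attack_success: Callable[[str], bool],
) -> float:
    """Fraction of triggered prompts whose output satisfies the attacker's goal."""
    if not triggered_prompts:
        raise ValueError("triggered_prompts is empty")
    hits = sum(is_attack_success(model(prompt)) for prompt in triggered_prompts)
    return hits / len(triggered_prompts)


TRIGGER = "<hypothetical-trigger>"  # placeholder, not a real backdoor trigger


def toy_backdoored_model(prompt: str) -> str:
    """Stand-in model: upper-cases clean prompts, misbehaves when the trigger appears."""
    if TRIGGER in prompt:
        return "MALICIOUS"
    return prompt.upper()


clean_set = [("hello", "HELLO"), ("abc", "ABC"), ("x", "y")]
triggered = [f"hello {TRIGGER}", f"abc {TRIGGER}"]

ca = clean_accuracy(toy_backdoored_model, clean_set)
asr = attack_success_rate(toy_backdoored_model, triggered, lambda out: out == "MALICIOUS")
print(f"Clean Accuracy: {ca:.2f}, Attack Success Rate: {asr:.2f}")
```

Reporting both numbers together matters because a backdoor is only stealthy if the model keeps a high Clean Accuracy while the Attack Success Rate on triggered inputs is also high.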

Requirements:
Background in computer science or a related field, with a focus on machine learning security or adversarial machine learning.
Strong programming skills in languages commonly used for machine learning tasks (e.g., Python, C++).
Experience with machine learning systems, model training, or adversarial robustness is a plus.
Ability to work independently and collaborate in a research-driven environment.
Comfortable working in English, which is essential for documentation.

  • Remote work up to 3 days per week
  • 52 days of leave/RTT (working-time reduction days)
  • Flexible working-time arrangements possible
  • Personalized training
  • Company restaurant
  • In-house transport service and coverage of Navigo and similar transit passes
  • Advantageous company health insurance
  • Works council (CE) benefits (support for holidays, leisure, childcare costs, children's schooling, etc.)

Recruitment steps

The recruitment steps may vary depending on the position you apply for.

  • CV submission via our careers site

  • Phone pre-screening

  • Interviews and assessment with the manager and HR

  • Salary negotiation and employment contract

  • Hiring and onboarding

Location

19510 D36

91190 Saclay

Published on 30/11/2025 - Ref: 2025-37960

For eligible positions: partial remote work