Job details
PhD Position F/M Defending deployed AI models: manipulation as a countermeasure
The job description below is in English
Contract type: Fixed-term contract (CDD)
Required degree level: Master's degree (Bac+5) or equivalent
Position: PhD student
About the center or functional department
The Inria center at the University of Rennes is one of eight Inria centers and has more than thirty research teams. The Inria center is a major and recognized player in the field of digital sciences. It is at the heart of a rich ecosystem of R&D and innovation, including highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education institutions, centers of excellence, and technological research institutes.
Context and advantages of the position
Deployed AI models on platforms are of interest to at least two different groups:
users and attackers. In the first case, it is increasingly clear that the impact of these
models on users' everyday lives must be audited to prevent abuse or bias [LMPT24]. In the
second case, the cost of training these models calls for proper defenses against malicious entities
and offensive competitors [MGW+25]. The ambition of the Cluster SequoIA's FANG chair is
to bridge the gap between these two critical settings, legal auditing and offensive security, in
the domain of modern deployed AI models. From this unique standpoint, and from the body
of work we have helped build in the field of AI auditing (e.g., [BGDV+25, GLMT+24,
GLMP+25, Ric26]), we expect to find new angles for attacking and defending deployed AI
models.
A key observation from this body of work is that platforms hosting AI models are not passive
actors. We have shown that platforms are incentivized to maintain the utility of their model
despite regulation, and may actively manipulate audit outcomes to their advantage [GLMT+24].
Indeed, audit manipulation, where a platform returns strategically altered responses to an
auditor's queries, can severely disrupt the reliability of black-box audits [LMT20]. This manipulative
capability, currently studied as a threat to auditors, constitutes, when viewed from the security
standpoint, a powerful and largely unexplored defensive tool for model owners facing attackers.
This Ph.D. thesis proposes to bring the concepts and techniques of audit manipulation [GLMT+24,
Fuk20, Yan22] to the field of AI security, in order to design novel defenses for deployed AI models.
The central insight is the following: when a platform detects an ongoing attack (e.g., model
extraction, adversarial example crafting, or fingerprinting-based reconnaissance [Ric26]), rather
than simply blocking the attacker (which signals detection and incentivizes the attacker to adapt),
a more effective strategy is to manipulate the responses returned to the attacker. By returning
strategically biased results, the platform can degrade the quality of the attacker's extracted in-
formation, poison surrogate models being built by the attacker, or feed misleading signals that
waste the attacker's resources. This is conceptually analogous to honeypots and deception-based
defenses in classical cybersecurity, but instantiated in the specific context of machine learning
model APIs.
A critical challenge arises when the platform cannot reliably distinguish attackers from legit-
imate users or regulators. In this regime of uncertain detection, the platform must navigate a
fundamental tension: manipulated responses, if served to legitimate users, degrade the model's
utility [Kur25]. Randomized defenses [MFL22] offer a principled framework for this setting: by
injecting controlled noise or perturbations into a fraction of responses, the platform can prob-
abilistically disrupt attacks while bounding the impact on legitimate users.
We will study how to calibrate such randomized manipulation strategies, balancing the trade-off between attack
disruption rate and model utility loss.
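As a purely illustrative sketch of the randomized setting described above, the toy code below serves a manipulated answer for a fraction p of queries and measures how often the returned label differs from the honest one. Everything here is an assumption made for illustration: the function names, the choice of swapping the top-1 label for the runner-up, and the fixed score vectors are hypothetical and are not taken from the cited works or from the thesis plan.

```python
import random


def randomized_response(scores, p, rng):
    """Return the top-1 label, or with probability p the runner-up label.

    `scores` is a list of per-class scores from a hypothetical classifier API.
    The runner-up swap is an illustrative manipulation, not a proposed defense.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if len(ranked) > 1 and rng.random() < p:
        return ranked[1]  # manipulated answer
    return ranked[0]  # honest answer


def manipulation_rate(queries, p, seed=0):
    """Fraction of queries whose returned label differs from the honest top-1."""
    rng = random.Random(seed)
    altered = 0
    for scores in queries:
        honest = max(range(len(scores)), key=lambda i: scores[i])
        if randomized_response(scores, p, rng) != honest:
            altered += 1
    return altered / len(queries)


if __name__ == "__main__":
    # Hypothetical workload: 2000 identical three-class score vectors.
    queries = [[0.7, 0.2, 0.1]] * 2000
    for p in (0.0, 0.1, 0.3):
        print(p, manipulation_rate(queries, p))
```

In this toy setting the manipulation rate is the same for every caller, so p directly bounds the utility loss seen by legitimate users; the calibration question raised above is how large p can be made, and on which queries, before that loss becomes unacceptable.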
This thesis will leverage a formal understanding of what information different attacks extract,
and at what query cost, to design defenses that are targeted: manipulating precisely the
dimensions of the model's output that are most valuable to attackers, while preserving the
dimensions that matter for legitimate use and regulatory audits. This cat-and-mouse (or platform
and regulator) defense/audit game may improve our understanding of the limits of what is
achievable by both parties in this black-box scenario.
Assignment
Research questions
Can the concepts of audit manipulation, where platforms return strategically altered responses
to auditors, be transposed to defend models against attackers? What are the formal
conditions under which manipulation-based defenses provably degrade an attacker's informa-
tion gain?
When a platform cannot reliably distinguish an attacker from a legitimate user, what is the
optimal trade-off between the amplitude of response manipulation and the resulting loss of
model utility for legitimate users?
Can randomized defenses be designed so that they selectively disrupt attack-relevant dimensions
of model outputs (e.g., decision boundaries exploited by adversarial attacks on classifiers,
or output distributions leveraged for LLM extraction) while preserving the dimensions relevant for standard use?
How does the effectiveness of manipulation-based defenses depend on the type of attack
being countered? In particular, are extraction attacks, adversarial example crafting, and
fingerprinting-based reconnaissance equally vulnerable to response manipulation, or do some
attack classes require fundamentally different defensive strategies?
On the regulatory side, can manipulation-based defenses coexist with legitimate auditing by
regulators? That is, can a platform deploy active defenses against attackers without simultane-
ously disrupting the stealthy audits that regulators rely on to assess fairness and compliance?
Main activities
Envisioned planning
t0 + 6 months: Production of a state-of-the-art survey on manipulation-based defenses for LLMs,
covering audit manipulation, adversarial perturbation defenses (e.g., randomized smoothing
[MFL22] for classifiers), and detection-then-response paradigms for LLMs/agents. Formal
problem statement and threat model definition.
t0 + 12 months: Design and theoretical analysis of manipulation-based defense strategies against
model extraction attacks, or other more subtle attacks.
t0 + 20 months: Extension to multi-attack defense: studying how a single manipulation strategy
can simultaneously counter extraction, adversarial, and reconnaissance attacks. Analysis of
the utility-defense trade-off under uncertain attacker detection.
t0 + 30 months: Study of the coexistence of active defenses and legitimate regulatory audits.
Formal characterization of when and how manipulation-based defenses can discriminate be-
tween attackers and auditors.
t0 + 36 months: Thesis manuscript completed and defense scheduled.
Skills
Solid theoretical background in mathematics and/or machine learning
Python coding skills for experimental evaluations
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
Monthly gross salary: 2300 euros
Welcome to INRIA
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with worldwide impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.
Published on 02/06/2026 - Ref: bc30b07932704dab9a425070404ed0a9