Teratec 2024 Forum
Thursday, May 30th

Workshop 05 - 09:00 am to 12:30 pm

Applications of AI in research and industry
Chaired by Stéphane Requena, Director Innovation & Technology, Genci and Patrick Fabiani, AI Roadmapping & Advanced Scientific Studies, Dassault Aviation

Should we be afraid of the big bad GPT?
Demystifying language models and preventing their weaponization

By Djamé Seddah, Associate Professor in CS, Inria

At a time when every public position is either the subject of instant "meme-ization" or sifted through a fact-checking process that is as rigorous as it is often invisible, the emergence of high-performance text-generation tools that are disconcertingly easy to use raises questions, concerns and sometimes outright fears. Most of the tools available to the general public are pure black boxes about which we know little or nothing. Their training data? Nothing. Their architectures? Very little. Their performance? Usually a matter of guesswork outside the labs' own benchmarks.

Can we at least detect their content, and therefore their possible influence? Not really: even OpenAI, the creator of ChatGPT, reports a detection success rate of only 26%. In this presentation, I'll give an overview of the main language models, question their relevance in an academic environment and address the question of their detectability in adversarial contexts. I'll also address the growing problem of their weaponization.

Biography: Djamé Seddah is a tenured associate professor at Sorbonne University, currently on leave at Inria Paris in the Almanach team. His interests cover the field of natural language processing, mainly wide-coverage multilingual syntactic analysis, the syntax-semantics interface, and language models for low-resource languages. A specialist in the construction of annotated corpora (Sequoia corpus, French Social Media Bank, French Question Bank, Narabizi Treebank, etc.), he participated in the development of the CamemBERT, PagnolXL and CamemBERTa language models, as well as character-based models for dialectal and highly noisy languages. His current research focuses on language models and possible ways of avoiding their weaponization (content detection, bias detection and mitigation, etc.).

