2023
Towards a Robust Detection of Language Model-Generated Text: Is ChatGPT that easy to detect?
Wissam Antoun | Virginie Mouilleron | Benoît Sagot | Djamé Seddah
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs
Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes. The proposed method involves translating an English dataset into French and training a classifier on the translated data. Results show that the detectors can effectively detect ChatGPT-generated text, with a degree of robustness against basic attack techniques in in-domain settings. However, vulnerabilities are evident in out-of-domain contexts, highlighting the challenge of detecting adversarial text. The study emphasizes caution when applying in-domain testing results to a wider variety of content. We provide our translated datasets and models as open-source resources.
2020
The Financial Document Structure Extraction Shared task (FinToc 2020)
Najah-Imane Bentabet | Rémi Juge | Ismail El Maarouf | Virginie Mouilleron | Dialekti Valsamou-Stanislawski | Mahmoud El-Haj
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
This paper presents the FinTOC-2020 Shared Task on structure extraction from financial documents, its participants' results, and their findings. The shared task was organized as part of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation (FNP-FNS 2020), held at the 28th International Conference on Computational Linguistics (COLING’2020). It aimed to stimulate research on systems that extract a table of contents (TOC) from investment documents (such as financial prospectuses) by detecting the document titles and organizing them hierarchically into a TOC. For this second edition of the shared task, two subtasks were presented to participants: one with English documents and one with French documents.
The FinSim 2020 Shared Task: Learning Semantic Representations for the Financial Domain
Ismail El Maarouf | Youness Mansar | Virginie Mouilleron | Dialekti Valsamou-Stanislawski
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing
2016
Radarly : écouter et analyser le web conversationnel en temps réel (Real time listening and analysis of the social web using Radarly)
Jade Copet | Christine de Carvalho | Virginie Mouilleron | Benoit Tabutiaux | Hugo Zanghi
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations
Given the conversational nature of the digital landscape, the Radarly tool was designed to process large volumes of heterogeneous data in real time, to generate new indicators, and to display them in a coherent, comfortable interface so that relevant analyses and studies can be drawn from them. This document presents the techniques and processes used to extract and process all of this data.
2013
Dynamic extension of a French morphological lexicon based on a text stream (Extension dynamique de lexiques morphologiques pour le français à partir d’un flux textuel) [in French]
Benoît Sagot | Damien Nouvel | Virginie Mouilleron | Marion Baranes
Proceedings of TALN 2013 (Volume 1: Long Papers)
2012
The French Social Media Bank: a Treebank of Noisy User Generated Content
Djamé Seddah | Benoit Sagot | Marie Candito | Virginie Mouilleron | Vanessa Combet
Proceedings of COLING 2012