Patrick Ruch
2025
TransBERT: A Framework for Synthetic Translation in Domain-Specific Language Modeling
Julien Knafou | Luc Mottin | Anaïs Mottaz | Alexandre Flament | Patrick Ruch
Findings of the Association for Computational Linguistics: EMNLP 2025
Julien Knafou | Luc Mottin | Anaïs Mottaz | Alexandre Flament | Patrick Ruch
Findings of the Association for Computational Linguistics: EMNLP 2025
The scarcity of non-English language data in specialized domains significantly limits the development of effective Natural Language Processing (NLP) tools. We present TransBERT, a novel framework for pre-training language models using exclusively synthetically translated text, and introduce TransCorpus, a scalable translation toolkit. Focusing on the life sciences domain in French, our approach demonstrates that state-of-the-art performance on various downstream tasks can be achieved solely by leveraging synthetically translated data. We release the TransCorpus toolkit, the TransCorpus-bio-fr corpus (36.4GB of French life sciences text), TransBERT-bio-fr (the associated pre-trained language model), and reproducible code for both pre-training and fine-tuning. Our results highlight the viability of synthetic translation in a high-resource translation direction for building high-quality NLP resources in low-resource language/domain pairs.
2020
Contextualized French Language Models for Biomedical Named Entity Recognition
Jenny Copara | Julien Knafou | Nona Naderi | Claudia Moro | Patrick Ruch | Douglas Teodoro
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes
Named entity recognition (NER) is key for biomedical applications, as it enables knowledge discovery in free-text data. Because entities are semantic phrases, their meaning is conditioned on context, which must be taken into account to avoid ambiguity. In this work, we explore contextualized language models for NER in French biomedical text as part of the Défi Fouille de Textes challenge. Our best approach achieved an F1-measure of 66% for the symptoms and signs, and pathology categories, ranking first in subtask 1. For the anatomy, dose, exam, mode, moment, substance, treatment, and value categories, it achieved an F1-measure of 75% (subtask 2). Considering all categories, our model achieved the best result in the challenge, with an F1-measure of 72%. The use of an ensemble of neural language models proved very effective, improving on a CRF baseline by up to 28% and on a single specialised language model by 4%.
BiTeM at WNUT 2020 Shared Task-1: Named Entity Recognition over Wet Lab Protocols using an Ensemble of Contextual Language Models
Julien Knafou | Nona Naderi | Jenny Copara | Douglas Teodoro | Patrick Ruch
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Recent improvements in machine-reading technologies have drawn much attention to automation problems and their possibilities. In this context, WNUT 2020 introduced a Named Entity Recognition (NER) task based on wet laboratory procedures. In this paper, we present a 3-step method based on deep neural language models that achieved the best overall exact-match F1-score (77.99%) of the competition. By fine-tuning each of 10 different pretrained language models 10 times, this work shows the advantage of having more models in an ensemble based on a majority-vote strategy. In addition, having 100 different models allowed us to analyse ensemble combinations, demonstrating the impact of using multiple pretrained models versus fine-tuning a single pretrained model multiple times.
2006
Argumentative Feedback: A Linguistically-Motivated Term Expansion for Information Retrieval
Patrick Ruch | Imad Tbahriti | Julien Gobeill | Alan R. Aronson
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
2004
Query Translation by Text Categorization
Patrick Ruch
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics
An Argumentative Annotation Schema for Meeting Discussions
Vincenzo Pallotta | Hatem Ghorbel | Patrick Ruch | Giovanni Coray
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)
Nigel Collier | Patrick Ruch | Adeline Nazarenko
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)
Using Argumentation to Retrieve Articles with Similar Citations from MEDLINE
Imad Tbahriti | Christine Chichester | Frédérique Lisacek | Patrick Ruch
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)
2002
Using Contextual Spelling Correction to Improve Retrieval Effectiveness in Degraded Text Collections
Patrick Ruch
COLING 2002: The 19th International Conference on Computational Linguistics