Sophie Arnoult
2026
LotusOrchid at #SMM4H–HeaRD 2026: Fitting pretrained encoders for Dutch medical data
Sophie Arnoult | Shutao Chen | Piek Vossen
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
This paper presents our submission to MultiClinAI’s NER subtask for #SMM4H-HeaRD 2026. We focus on two questions: 1) which language model best represents the clinical notes, and 2) which annotations help in training these models. To answer them, we follow a token-based classification approach with pretrained encoder language models, comparing models pretrained on generic data against models pretrained on medical data, and models pretrained on a single language, Dutch, against multilingual models. In addition, we present two data-augmented systems: one trained multilingually with data from the other languages of the workshop, and one trained with synthetic annotations.
2016
Factoring Adjunction in Hierarchical Phrase-Based SMT
Sophie Arnoult | Khalil Sima’an
Proceedings of the 2nd Deep Machine Translation Workshop
2015
Modelling the Adjunct/Argument Distinction in Hierarchical Phrase-Based SMT
Sophie Arnoult | Khalil Sima’an
Proceedings of the 1st Deep Machine Translation Workshop
2014
How Synchronous are Adjuncts in Translation Data?
Sophie Arnoult | Khalil Sima’an
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation