SubmissionNumber#=%=#150
FinalPaperTitle#=%=#SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials
ShortPaperTitle#=%=#
NumberOfPages#=%=#11
CopyrightSigned#=%=#aguiar
JobTitle#==#
Organization#==#
Abstract#==#This paper describes our submission to Task 2 of SemEval-2024: Safe Biomedical Natural Language Inference for Clinical Trials. The Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT) task is a Textual Entailment (TE) task that evaluates the consistency and faithfulness of Natural Language Inference (NLI) models applied to Clinical Trial Reports (CTRs). We test two distinct approaches: one based on fine-tuning and ensembling Masked Language Models, and the other based on prompting Large Language Models with templates, in particular Chain-of-Thought and Contrastive Chain-of-Thought templates. Our best system, which prompts Flan-T5-large in a 2-shot setting, achieves an F1 score of 0.57, a Faithfulness score of 0.64, and a Consistency score of 0.56.
Author{1}{Firstname}#=%=#Mathilde
Author{1}{Lastname}#=%=#Aguiar
Author{1}{Username}#=%=#mathilde.aguiar
Author{1}{Email}#=%=#mathilde.aguiar@universite-paris-saclay.fr
Author{1}{Affiliation}#=%=#Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400, Orsay, France
Author{2}{Firstname}#=%=#Pierre
Author{2}{Lastname}#=%=#Zweigenbaum
Author{2}{Username}#=%=#pierre
Author{2}{Email}#=%=#pz@lisn.fr
Author{2}{Affiliation}#=%=#LISN, CNRS, Université Paris-Saclay
Author{3}{Firstname}#=%=#Nona
Author{3}{Lastname}#=%=#Naderi
Author{3}{Username}#=%=#nona.naderi
Author{3}{Email}#=%=#nona.naderi@universite-paris-saclay.fr
Author{3}{Affiliation}#=%=#Université Paris-Saclay

==========
èéáğö