SubmissionNumber#=%=#150
FinalPaperTitle#=%=#SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials
ShortPaperTitle#=%=#
NumberOfPages#=%=#11
CopyrightSigned#=%=#aguiar
JobTitle#==#
Organization#==#
Abstract#==#This paper describes our submission to Task 2 of SemEval-2024: Safe Biomedical Natural Language Inference for Clinical Trials. The Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT) consists of a Textual Entailment (TE) task focused on the evaluation of the consistency and faithfulness of Natural Language Inference (NLI) models applied to Clinical Trial Reports (CTR). We test 2 distinct approaches, one based on finetuning and ensembling Masked Language Models and the other based on prompting Large Language Models using templates, in particular, using Chain-Of-Thought and Contrastive Chain-Of-Thought. Prompting Flan-T5-large in a 2-shot setting leads to our best system that achieves 0.57 F1 score, 0.64 Faithfulness, and 0.56 Consistency.
Author{1}{Firstname}#=%=#Mathilde
Author{1}{Lastname}#=%=#Aguiar
Author{1}{Username}#=%=#mathilde.aguiar
Author{1}{Email}
Author{1}{Affiliation}#=%=#Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400, Orsay, France
Author{2}{Firstname}#=%=#Pierre
Author{2}{Lastname}#=%=#Zweigenbaum
Author{2}{Username}#=%=#pierre
Author{2}{Email}
Author{2}{Affiliation}#=%=#LISN, CNRS, Université Paris-Saclay
Author{3}{Firstname}#=%=#Nona
Author{3}{Lastname}#=%=#Naderi
Author{3}{Username}#=%=#nona.naderi
Author{3}{Email}
Author{3}{Affiliation}#=%=#Université Paris-Saclay
========== èéáğö