Amal Htait


2026

The exponential growth of clinical trial reports (CTRs) presents a critical challenge for evidence-based medicine, with manual systematic reviews requiring months to synthesise findings. This paper evaluates Large Language Models (LLMs) and retrieval methods for automated Natural Language Inference (NLI) and evidence extraction from CTRs, and seeks to improve upon previously reported results in this domain. Using the NLI4CT dataset containing 2,400 annotated statement-evidence pairs from breast cancer trials, we conducted a comparative evaluation of general-purpose LLMs, domain-specific LLMs, and transformer-based baselines across entailment classification and evidence retrieval tasks. Reasoning-capable, general-purpose LLMs (such as Qwen-32B) demonstrated superior performance in the entailment classification task, exceeding both the performance of other models evaluated in this study and the previously reported state-of-the-art results. Although domain-specific adaptations showed improvements at comparable scale, larger general-purpose language models maintained superior absolute performance. For evidence retrieval, LLM-based embedding models (such as bge-large-en-v1.5) surpassed classic transformer-based ranking approaches. These findings demonstrate that modern LLMs with reasoning capabilities can effectively support real-time clinical evidence synthesis without task-specific fine-tuning, offering a pathway toward scalable automated systems for clinical trial interpretation that could substantially reduce the evidence-to-practice gap in medical decision-making.
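The embedding-based retrieval step can be sketched as follows. The vectors below are toy stand-ins for the sentence embeddings that, in practice, a model such as bge-large-en-v1.5 would produce, and the function names are illustrative rather than taken from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_evidence(statement_vec, sentence_vecs):
    """Rank CTR sentences by similarity to the statement embedding.
    Returns (sentence_index, score) pairs, most similar first."""
    scores = [(i, cosine(statement_vec, v)) for i, v in enumerate(sentence_vecs)]
    return sorted(scores, key=lambda p: p[1], reverse=True)

# Toy 3-d vectors standing in for real embeddings (illustrative only).
statement = [1.0, 0.2, 0.0]
sentences = [[0.9, 0.1, 0.1], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.2]]
ranking = rank_evidence(statement, sentences)
```

The top-ranked sentences would then be returned as the candidate evidence set for the entailment step.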

2018

We present, in this paper, our contribution to DEFT 2018 task 2, "Global Polarity": determining the overall polarity (Positive, Negative, Neutral or MixPosNeg) of tweets regarding public transport, in French. Our system is based on a list of sentiment seed-words adapted for French public-transport tweets. These seed-words are extracted from DEFT's annotated training dataset, and the sentiment relations between seed-words and other terms are captured by the cosine similarity of their word-embedding representations, using a French word-embedding model of 683k words. Our semi-supervised system achieved an F1-measure of 0.64.

2017

We present, in this paper, our contribution to SemEval2017 task 4, "Sentiment Analysis in Twitter", subtask A, "Message Polarity Classification", for the English and Arabic languages. Our system is based on a list of sentiment seed words adapted for tweets. The sentiment relations between seed words and other terms are captured by cosine similarity between their word-embedding representations (word2vec). These seed words are extracted from datasets of annotated tweets available online. Our tests using these seed words show a significant improvement over Turney and Littman's (2003) seed words on polarity classification of tweet messages.
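The seed-word scoring idea can be sketched as scoring a term by its average embedding similarity to positive seeds minus its average similarity to negative seeds. The 2-d vectors below are toy stand-ins for word2vec embeddings, and the seed lists and function names are illustrative, not the paper's actual lexicon:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def polarity_score(term_vec, pos_seed_vecs, neg_seed_vecs):
    """Score > 0 suggests the term leans positive; < 0, negative."""
    pos = sum(cosine(term_vec, s) for s in pos_seed_vecs) / len(pos_seed_vecs)
    neg = sum(cosine(term_vec, s) for s in neg_seed_vecs) / len(neg_seed_vecs)
    return pos - neg

# Toy embeddings (illustrative): vectors near the positive seeds score > 0.
pos_seeds = [[1.0, 0.0], [0.9, 0.1]]
neg_seeds = [[-1.0, 0.0], [-0.8, 0.2]]
score = polarity_score([0.95, 0.05], pos_seeds, neg_seeds)
```

Term-level scores like this can then be aggregated over a tweet to decide its overall message polarity.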

2016

In this paper, we present the automatic annotation of bibliographical reference zones in papers and articles in XML/TEI format. Our work proceeds in two phases. First, we use machine learning to classify paragraphs as bibliographical or non-bibliographical, by means of a model initially created to differentiate between footnotes that do or do not contain bibliographical references. This classification is one of the features of BILBO, an open-source software for the automatic annotation of bibliographic references. We also suggest some methods to minimize the margin of error. Second, we propose an algorithm to find the largest list of bibliographical references in the article. The improvements applied to our model increase its accuracy to 85.89%. In testing, our system achieves an average success rate of 72.23% in detecting bibliographical reference zones.
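One plausible reading of the second phase, locating the largest list of references given per-paragraph classifier labels, is to take the longest contiguous run of paragraphs labelled bibliographical. This is an illustrative sketch of that interpretation, not the paper's actual algorithm, and the function name is hypothetical:

```python
def largest_biblio_zone(labels):
    """Given per-paragraph labels (True = bibliographical), return the
    (start, end) indices of the longest contiguous bibliographical run.
    End index is exclusive; returns (0, 0) if no paragraph is bibliographical."""
    best = (0, 0)
    start = None
    for i, is_biblio in enumerate(labels + [False]):  # sentinel closes a trailing run
        if is_biblio and start is None:
            start = i
        elif not is_biblio and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

# Paragraphs 3..5 form the longest bibliographical run in this toy labelling.
labels = [False, True, False, True, True, True, False]
zone = largest_biblio_zone(labels)
```

The returned span would then be annotated as the bibliographical reference zone of the article.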