Pingjun Hong


2024

LMU-BioNLP at SemEval-2024 Task 2: Large Diverse Ensembles for Robust Clinical NLI
Zihang Sun | Danqi Yan | Anyi Wang | Tanalp Agustoslu | Qi Feng | Chengzhi Hu | Longfei Zuo | Shijia Zhou | Hermine Kleiner | Pingjun Hong
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we describe our submission for the NLI4CT 2024 shared task on robust Natural Language Inference over clinical trial reports. Our system is an ensemble of nine diverse models whose predictions we aggregate via majority voting. The models span a wide range of approaches, from a straightforward Convolutional Neural Network through fine-tuned Large Language Models to few-shot-prompted language models using chain-of-thought reasoning. Surprisingly, we find that some individual ensemble members are not only more accurate than the final ensemble model but also more robust.
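The majority-voting aggregation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each of the nine models emits one of two labels per instance ("Entailment" or "Contradiction"). The function names `majority_vote` and `ensemble_predict` are hypothetical. With an odd number of voters and two labels, a tie cannot occur.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label chosen by the most ensemble members.

    Assumption: an odd number of voters over two labels, so ties cannot occur.
    If ties did occur, Counter.most_common would keep first-encountered order.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    label, _count = Counter(predictions).most_common(1)[0]
    return label


def ensemble_predict(per_model: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate the label lists of several models, instance by instance.

    per_model[m][i] is model m's label for instance i (hypothetical layout).
    zip(*per_model) transposes this into one tuple of votes per instance.
    """
    return [majority_vote(votes) for votes in zip(*per_model)]


if __name__ == "__main__":
    # Hypothetical outputs of nine models on two instances.
    outputs = [["Entailment", "Contradiction"]] * 5 + [["Contradiction", "Entailment"]] * 4
    print(ensemble_predict(outputs))  # prints ['Entailment', 'Contradiction']
```

Plain voting gives every member equal weight. This is consistent with the finding above: a single strong member can outperform the ensemble when weaker members outvote it.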