2024
pdf
abs
DUTh at SemEval 2024 Task 5: A multi-task learning approach for the Legal Argument Reasoning Task in Civil Procedure
Ioannis Maslaris
|
Avi Arampatzis
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Text-generative models have proven to be good reasoners. Although reasoning abilities are mostly observed in larger language models, a number of strategies try to transfer this skill to smaller language models. This paper presents our approach to SemEval 2024 Task-5: The Legal Argument Reasoning Task in Civil Procedure. This shared task aims to develop a system that efficiently handles a multiple-choice question-answering task in the context of the US civil procedure domain. The dataset provides a human-generated rationale for each answer. Given the complexity of legal issues, this task certainly challenges the reasoning abilities of LLMs and AI systems in general. Our work explores fine-tuning an LLM as a correct/incorrect answer classifier. In this context, we are making use of multi-task learning toincorporate the rationales into the fine-tuning process.
pdf
abs
DUTh at SemEval-2024 Task 6: Comparing Pre-trained Models on Sentence Similarity Evaluation for Detecting of Hallucinations and Related Observable Overgeneration Mistakes
Ioanna Iordanidou
|
Ioannis Maslaris
|
Avi Arampatzis
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
In this paper, we present our approach toSemEval-2024 Task 6: SHROOM, a Sharedtask on Hallucinations and Related ObservableOvergeneration Mistakes, which aims to determine weather AI generated text is semanticallycorrect or incorrect. This work is a comparative study of Large Language Models (LLMs)in the context of the task, shedding light ontheir effectiveness and nuances. We present asystem that leverages pre-trained LLMs, suchas LaBSE, T5, and DistilUSE, for binary classification of given sentences into ‘Hallucination’or ‘Not Hallucination’ classes by evaluatingthe model’s output against the reference correct text. Moreover, beyond utilizing labeleddatasets, our methodology integrates syntheticlabel creation in unlabeled datasets, followedby the prediction of test labels.
pdf
abs
DUTh at SemEval 2024 Task 8: Comparing classic Machine Learning Algorithms and LLM based methods for Multigenerator, Multidomain and Multilingual Machine-Generated Text Detection
Theodora Kyriakou
|
Ioannis Maslaris
|
Avi Arampatzis
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Text-generative models evolve rapidly nowadays. Although, they are very useful tools for a lot of people, they have also raised concerns for different reasons. This paper presents our work for SemEval2024 Task-8 on 2 out of the 3 subtasks. This shared task aims at finding automatic models for making AI vs. human written text classification easier. Our team, after trying different preprocessing, several Machine Learning algorithms, and some LLMs, ended up with mBERT, XLM-RoBERTa, and BERT for the tasks we submitted. We present both positive and negative methods, so that future researchers are informed about what works and what doesn’t.