Abstract
Recent advancements in natural language processing (NLP) have prompted the development of sophisticated reasoning benchmarks. This paper presents our system for SemEval-2024 Task 9 and investigates the efficacy of fine-tuning language models (LMs) on BrainTeaser, a benchmark designed to evaluate NLP models' lateral thinking and creative reasoning abilities. Our experiments focus on two prominent families of pre-trained models, BERT and T5. Additionally, we explore the potential benefits of multi-task fine-tuning on commonsense reasoning datasets to enhance performance. Our top-performing model, DeBERTa-v3-large, achieves an overall accuracy of 93.33%, surpassing human performance.
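
The abstract describes fine-tuning an encoder such as DeBERTa-v3-large on a multiple-choice benchmark. As a minimal sketch of that setup (not the authors' released code), the Hugging Face Transformers multiple-choice head can be fine-tuned as below; the example puzzle, the choice list, and the label are hypothetical stand-ins for the BrainTeaser data.

```python
# Minimal sketch: fine-tuning a DeBERTa-v3 encoder on a multiple-choice
# puzzle, assuming a (question, choices, label) layout. Not the official
# BrainTeaser schema or the authors' training pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "microsoft/deberta-v3-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

# Hypothetical lateral-thinking puzzle and candidate answers.
question = "A man shaves several times a day, yet he still has a beard. How?"
choices = ["He is a barber.", "He uses a blunt razor.",
           "He shaves his arms.", "None of the above."]
label = torch.tensor([0])  # index of the correct choice (assumed)

# Pair the question with every candidate answer; the model scores each
# (question, choice) pair and the logits rank the choices.
enc = tokenizer([question] * len(choices), choices,
                return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}  # (1, n_choices, seq_len)

outputs = model(**inputs, labels=label)
outputs.loss.backward()  # an optimizer step would follow in a real loop
print(outputs.logits.argmax(dim=-1))  # predicted choice index
```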
- Anthology ID: 2024.semeval-1.218
- Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue: SemEval
- SIG: SIGLEX
- Publisher: Association for Computational Linguistics
- Pages: 1523–1528
- URL: https://aclanthology.org/2024.semeval-1.218
- DOI: 10.18653/v1/2024.semeval-1.218
- Cite (ACL): Seyed Ali Farokh and Hossein Zeinali. 2024. ALF at SemEval-2024 Task 9: Exploring Lateral Thinking Capabilities of LMs through Multi-task Fine-tuning. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1523–1528, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): ALF at SemEval-2024 Task 9: Exploring Lateral Thinking Capabilities of LMs through Multi-task Fine-tuning (Farokh & Zeinali, SemEval 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.218.pdf