Mohammad Ghiasvand Mohammadkhani
2025
Checklist Engineering Empowers Multilingual LLM Judges
Mohammad Ghiasvand Mohammadkhani | Hamid Beigy
Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models
Automated text evaluation has long been a central issue in Natural Language Processing (NLP). Recently, the field has shifted toward using Large Language Models (LLMs) as evaluators—a trend known as the LLM-as-a-Judge paradigm. While promising and easily adaptable across tasks, this approach has seen limited exploration in multilingual contexts. Existing multilingual studies often rely on proprietary models or require extensive training data for fine-tuning, raising concerns about cost, time, and efficiency. In this paper, we propose Checklist Engineering based LLM-as-a-Judge (CE-Judge), a training-free framework that uses checklist intuition for multilingual evaluation with an open-source model. Experiments across multiple languages and three benchmark datasets, under both pointwise and pairwise settings, show that our method generally surpasses the baselines and performs on par with the GPT-4o model.
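For readers unfamiliar with the checklist intuition, the sketch below illustrates one plausible reading of the pipeline in the pointwise setting: the judge model first derives a short checklist of yes/no criteria for the input, then scores a response as the fraction of criteria it satisfies. The `call_model` stub, the prompts, and the aggregation rule are assumptions for illustration only, not the paper's exact CE-Judge implementation.

```python
# Minimal sketch of checklist-style LLM judging (pointwise setting).
# `call_model`, the prompts, and the scoring rule are illustrative
# assumptions, not the CE-Judge paper's released pipeline.

def call_model(prompt: str) -> str:
    """Stub: send `prompt` to an open-source instruction-tuned LLM."""
    raise NotImplementedError("wire up your model backend here")

def build_checklist(question: str, n_items: int = 5) -> list[str]:
    """Ask the judge model to derive evaluation criteria for this input."""
    reply = call_model(
        f"List {n_items} yes/no criteria for judging an answer to:\n{question}"
    )
    return [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]

def judge_pointwise(question: str, answer: str) -> float:
    """Score `answer` as the fraction of checklist items it satisfies."""
    checklist = build_checklist(question)
    passed = 0
    for item in checklist:
        verdict = call_model(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Criterion: {item}\nDoes the answer satisfy this criterion? Reply yes or no."
        )
        passed += verdict.strip().lower().startswith("yes")
    return passed / max(len(checklist), 1)
```

A pairwise variant would run the same checklist over two candidate answers and prefer the one that passes more items.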
2024
Zero-Shot Learning and Key Points Are All You Need for Automated Fact-Checking
Mohammad Ghiasvand Mohammadkhani | Ali Ghiasvand Mohammadkhani | Hamid Beigy
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)
Automated fact-checking is an important task: determining the veracity of a claim against the vast amount of information available online is a critical challenge, and it demands robust evaluation to prevent the spread of false information. Modern large language models (LLMs) have demonstrated high capability across a diverse range of Natural Language Processing (NLP) tasks. With suitable prompting strategies, their large context windows and zero-shot learning ability allow them to approximate human problem-solving intuition, making them a candidate alternative to human fact-checkers. In this work, we introduce a straightforward framework based on Zero-Shot Learning and Key Points (ZSL-KeP) for automated fact-checking which, despite its simplicity, performed well on the AVeriTeC shared task dataset, robustly improving over the baseline and achieving 10th place.
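As a rough illustration of the zero-shot, key-point idea, the sketch below first asks an LLM to distill the key points of the retrieved evidence, then prompts it to pick a verdict from the AVeriTeC label set. The `call_model` stub, the prompts, and the off-format fallback are hypothetical; this is not the authors' released ZSL-KeP code.

```python
# Illustrative zero-shot, key-point-based fact-checking loop.
# Prompts and helper names are assumptions; the label set follows the
# AVeriTeC shared task.

LABELS = [
    "Supported",
    "Refuted",
    "Not Enough Evidence",
    "Conflicting Evidence/Cherrypicking",
]

def call_model(prompt: str) -> str:
    """Stub: send `prompt` to an LLM and return its reply."""
    raise NotImplementedError("wire up your model backend here")

def extract_key_points(claim: str, evidence: list[str]) -> str:
    """Zero-shot prompt that summarizes the evidence relevant to the claim."""
    joined = "\n".join(f"- {e}" for e in evidence)
    return call_model(
        f"Claim: {claim}\nEvidence:\n{joined}\n"
        "Summarize the key points in the evidence that bear on the claim."
    )

def predict_verdict(claim: str, evidence: list[str]) -> str:
    """Pick one AVeriTeC label given the claim and its extracted key points."""
    key_points = extract_key_points(claim, evidence)
    reply = call_model(
        f"Claim: {claim}\nKey points:\n{key_points}\n"
        f"Choose exactly one label from {LABELS}."
    )
    # Fall back to the abstaining label if the reply is off-format.
    return next((l for l in LABELS if l.lower() in reply.lower()), LABELS[2])
```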