SmurfCat at SHROOM-CAP: Factual but Awkward? Fluent but Wrong? Tackling Both in LLM Scientific QA
Timur Ionov, Evgenii Nikolaev, Artem Vazhentsev, Mikhail Chaichuk, Anton Korznikov, Elena Tutubalina, Alexander Panchenko, Vasily Konovalov, Elisei Rykov
Abstract
Large Language Models (LLMs) often generate hallucinations, a critical issue in domains such as scientific communication, where factual accuracy and fluency are essential. The SHROOM-CAP shared task addresses this challenge by evaluating Factual Mistakes and Fluency Mistakes across diverse languages, extending earlier SHROOM editions to the scientific domain. We present SmurfCat, our system for SHROOM-CAP, which integrates three complementary approaches: uncertainty estimation (white-box and black-box signals), encoder-based classifiers (Multilingual Modern BERT), and decoder-based judges (instruction-tuned LLMs with classification heads). Our results show that decoder-based judges achieve the strongest overall performance, while uncertainty methods and encoders provide complementary strengths. These findings highlight the value of combining uncertainty signals with encoder and decoder architectures for robust, multilingual detection of hallucinations and related errors in scientific publications.
- Anthology ID:
- 2025.chomps-main.8
- Volume:
- Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Aman Sinha, Raúl Vázquez, Timothee Mickus, Rohit Agarwal, Ioana Buhnila, Patrícia Schmidtová, Federica Gamba, Dilip K. Prasad, Jörg Tiedemann
- Venues:
- CHOMPS | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 81–89
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.chomps-main.8/
- Cite (ACL):
- Timur Ionov, Evgenii Nikolaev, Artem Vazhentsev, Mikhail Chaichuk, Anton Korznikov, Elena Tutubalina, Alexander Panchenko, Vasily Konovalov, and Elisei Rykov. 2025. SmurfCat at SHROOM-CAP: Factual but Awkward? Fluent but Wrong? Tackling Both in LLM Scientific QA. In Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025), pages 81–89, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal):
- SmurfCat at SHROOM-CAP: Factual but Awkward? Fluent but Wrong? Tackling Both in LLM Scientific QA (Ionov et al., CHOMPS 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.chomps-main.8.pdf
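The abstract mentions white-box uncertainty signals and combining them with encoder and decoder classifiers. The following is a minimal, illustrative sketch of that general idea and is not the authors' implementation. It assumes you already have per-token log-probabilities of the generated answer. From these it computes two common white-box scores: maximum sequence probability, expressed as the negative log-probability of the whole answer, and its length-normalised form (mean token negative log-likelihood, or perplexity). It then passes them, together with a classifier probability, through a logistic late-fusion step. The function names, the fusion scheme, and any weights are hypothetical; in practice the weights would be fitted on labelled data.

```python
import math
from typing import Sequence


def max_sequence_probability(token_logprobs: Sequence[float]) -> float:
    """Negative log-probability of the whole answer; higher means less confident."""
    return -sum(token_logprobs)


def mean_token_nll(token_logprobs: Sequence[float]) -> float:
    """Length-normalised uncertainty: average negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of the generated answer, exp(mean token NLL)."""
    return math.exp(mean_token_nll(token_logprobs))


def fuse(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic late fusion of uncertainty scores and classifier probabilities.

    Returns a probability that the answer contains a mistake. Weights and bias
    are placeholders that would normally be learned on labelled examples.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage: three generated tokens with probabilities 0.9, 0.5 and 0.8.
logprobs = [math.log(0.9), math.log(0.5), math.log(0.8)]
msp = max_sequence_probability(logprobs)  # -log(0.36)
ppl = perplexity(logprobs)                # 0.36 ** (-1/3)
encoder_prob = 0.7                        # hypothetical classifier output
risk = fuse([msp, ppl, encoder_prob], weights=[0.5, 0.5, 1.0], bias=-1.0)
```

A learned fusion like this lets cheap uncertainty scores and a trained classifier each contribute where they are most reliable. That matches the abstract's point that the approaches have complementary strengths, although the paper itself may combine them differently.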