SmurfCat at SHROOM-CAP: Factual but Awkward? Fluent but Wrong? Tackling Both in LLM Scientific QA
Timur Ionov | Evgenii Nikolaev | Artem Vazhentsev | Mikhail Chaichuk | Anton Korznikov | Elena Tutubalina | Alexander Panchenko | Vasily Konovalov | Elisei Rykov
Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)
Large Language Models (LLMs) often generate hallucinations, a critical issue in domains like scientific communication where factual accuracy and fluency are essential. The SHROOM-CAP shared task addresses this challenge by evaluating Factual Mistakes and Fluency Mistakes across diverse languages, extending earlier SHROOM editions to the scientific domain. We present SmurfCat, our system for SHROOM-CAP, which integrates three complementary approaches: uncertainty estimation (white-box and black-box signals), encoder-based classifiers (Multilingual Modern BERT), and decoder-based judges (instruction-tuned LLMs with classification heads). Results show that decoder-based judges achieve the strongest overall performance, while uncertainty methods and encoders provide complementary strengths. Our findings highlight the value of combining uncertainty signals with encoder and decoder architectures for robust, multilingual detection of hallucinations and related errors in scientific publications.
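The white-box uncertainty signals mentioned in the abstract are typically derived from the generator's token-level probabilities. A minimal illustrative sketch of one such signal (sequence perplexity from token log-probabilities) is shown below; the function names and threshold logic are hypothetical and not the authors' exact method:

```python
import math

def mean_negative_log_likelihood(token_logprobs):
    """Average negative log-probability of the generated tokens.
    Higher values mean the model was less confident in its own output,
    a common white-box signal for flagging possible hallucinations."""
    return -sum(token_logprobs) / len(token_logprobs)

def perplexity(token_logprobs):
    """Sequence perplexity: the exponentiated mean negative log-likelihood."""
    return math.exp(mean_negative_log_likelihood(token_logprobs))

# Toy log-probabilities for two generations (hypothetical values):
confident_answer = [-0.05, -0.10, -0.02]
uncertain_answer = [-2.30, -1.80, -2.90]

# A less confident generation yields a higher perplexity score.
assert perplexity(uncertain_answer) > perplexity(confident_answer)
```

In practice such scores would be computed from the log-probabilities returned by the generating model and combined with the encoder- and decoder-based classifiers described above.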