2025
RAGulator: Effective RAG for Regulatory Question Answering
Islam Aushev | Egor Kratkov | Evgenii Nikolaev | Andrei Glinskii | Vasilii Krikunov | Alexander Panchenko | Vasily Konovalov | Julia Belikova
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)
Regulatory Natural Language Processing (RegNLP) is a multidisciplinary domain focused on facilitating access to and comprehension of regulations and requirements. This paper outlines our strategy for creating a system to address the Regulatory Information Retrieval and Answer Generation (RIRAG) challenge, which was conducted during the RegNLP 2025 Workshop. The objective of this competition is to design a system capable of efficiently extracting pertinent passages from regulatory texts (ObliQA) and subsequently generating accurate, cohesive responses to inquiries related to compliance and obligations. Our proposed method employs lightweight BM25 pre-filtering to retrieve relevant passages. This technique efficiently shortlists candidates for subsequent processing with Transformer-based embeddings, thereby optimizing the use of resources.
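The two-stage idea in this abstract (a cheap BM25 shortlist followed by embedding-based reranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameters `k1`, `b`, and `shortlist_size`, and especially `toy_embed` (a hashed character-trigram vector standing in for a Transformer encoder) are assumptions made for illustration only.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokenized for term in set(t))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text, dim=64):
    # HYPOTHETICAL stand-in for a Transformer encoder: hashed character trigrams.
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, docs, shortlist_size=3, top_k=1, embed=toy_embed):
    """Stage 1: BM25 shortlist. Stage 2: rerank the shortlist by embedding similarity."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    shortlist = order[:shortlist_size]
    q_vec = embed(query)
    reranked = sorted(shortlist, key=lambda i: cosine(q_vec, embed(docs[i])), reverse=True)
    return [docs[i] for i in reranked[:top_k]]
```

Only the shortlisted passages are embedded, which is where the resource saving would come from; in practice `embed` would be replaced by a real encoder passed in through the `embed` argument.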
SmurfCat at SemEval-2025 Task 3: Bridging External Knowledge and Model Uncertainty for Enhanced Hallucination Detection
Elisei Rykov | Valerii Olisov | Maksim Savkin | Artem Vazhentsev | Kseniia Titova | Alexander Panchenko | Vasily Konovalov | Julia Belikova
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
The Multilingual shared-task on Hallucinations and Related Observable Overgeneration Mistakes in the SemEval-2025 competition aims to detect hallucination spans in the outputs of instruction-tuned LLMs in a multilingual context. In this paper, we address the detection of span hallucinations by applying an ensemble of approaches. In particular, we synthesized a PsiloQA dataset and fine-tuned an LLM to detect hallucination spans. In addition, we combined this approach with a white-box method based on uncertainty quantification techniques. Using our combined pipeline, we achieved 3rd place in detecting span hallucinations in Arabic, Catalan, Finnish, and Italian, and ranked within the top ten for the rest of the languages.
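As a rough illustration of what a white-box, uncertainty-based span detector can look like, the sketch below flags runs of consecutive tokens whose generation probability falls under a threshold and reports them as character spans. The abstract does not specify which uncertainty measure the system uses, so thresholding raw token probabilities is an assumption here, as are the function name `low_confidence_spans` and the `threshold` parameter.

```python
def low_confidence_spans(tokens, probs, threshold=0.5):
    """Return (start, end) character spans covering runs of low-probability tokens.

    Assumes the tokens concatenate exactly to the generated text (whitespace included)
    and that probs[i] is the model's probability for tokens[i].
    """
    spans = []
    pos = 0
    current = None  # span being built, as [start, end]
    for tok, p in zip(tokens, probs):
        start, end = pos, pos + len(tok)
        if p < threshold:
            if current is None:
                current = [start, end]
            else:
                current[1] = end  # extend the open span
        elif current is not None:
            spans.append(tuple(current))
            current = None
        pos = end
    if current is not None:
        spans.append(tuple(current))
    return spans
```

For example, with tokens `["Paris", " is", " in", " Ger", "many", "."]` and probabilities `[0.9, 0.95, 0.9, 0.2, 0.3, 0.8]`, the two low-probability tokens are merged into one span covering " Germany".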
FactDebug at SemEval-2025 Task 7: Hybrid Retrieval Pipeline for Identifying Previously Fact-Checked Claims Across Multiple Languages
Evgenii Nikolaev | Ivan Bondarenko | Islam Aushev | Vasilii Krikunov | Andrei Glinskii | Vasily Konovalov | Julia Belikova
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
The proliferation of multilingual misinformation demands robust systems for crosslingual fact-checked claim retrieval. This paper addresses SemEval-2025 Shared Task 7, which challenges participants to retrieve fact-checks for social media posts across 14 languages, even when posts and fact-checks are in different languages. We propose a hybrid retrieval pipeline that combines sparse lexical matching (BM25, BGE-m3) and dense semantic retrieval (pretrained and fine-tuned BGE-m3) with dynamic fusion and curriculum-trained rerankers. Our system achieves 67.2% crosslingual and 86.01% monolingual accuracy on the Shared Task MultiClaim dataset.
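To make the idea of fusing sparse and dense retrieval results concrete, here is a minimal sketch using reciprocal rank fusion (RRF). RRF is a swapped-in, generic technique: the abstract mentions "dynamic fusion" without describing it, so this is not the paper's fusion method. The identifiers `fc1`, `fc2`, `fc3` and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document receives 1 / (k + rank) from every ranking it appears in
    (rank starts at 1); documents are returned by descending fused score,
    with ties broken by id for determinism.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical rankings of fact-check ids from a sparse and a dense retriever.
sparse_ranking = ["fc1", "fc2", "fc3"]
dense_ranking = ["fc2", "fc3", "fc1"]
fused_ranking = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

A fused list like this would typically be passed on to a reranker, which only needs to score the top few candidates.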
2024
DeepPavlov at SemEval-2024 Task 3: Multimodal Large Language Models in Emotion Reasoning
Julia Belikova | Dmitrii Kosenko
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper presents the solution of the DeepPavlov team for the Multimodal Sentiment Cause Analysis competition in SemEval-2024 Task 3, Subtask 2 (Wang et al., 2024). In the evaluation leaderboard, our approach ranks 7th with an F1-score of 0.2132. Large Language Models (LLMs) are transformative in their ability to comprehend and generate human-like text. With recent advancements, Multimodal Large Language Models (MLLMs) have expanded LLM capabilities, integrating different modalities such as audio, vision, and language. Our work delves into the state-of-the-art MLLM Video-LLaMA, its associated modalities, and its application to the emotion reasoning downstream task, Multimodal Emotion Cause Analysis in Conversations (MECAC). We investigate the model’s performance in several modes: zero-shot, few-shot, individual embeddings, and fine-tuned, providing insights into their limits and potential enhancements for emotion understanding.
JellyBell at TextGraphs-17 Shared Task: Fusing Large Language Models with External Knowledge for Enhanced Question Answering
Julia Belikova | Evegeniy Beliakin | Vasily Konovalov
Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing
This work describes an approach to developing a Knowledge Graph Question Answering (KGQA) system for the TextGraphs-17 shared task. The task focuses on the fusion of Large Language Models (LLMs) with Knowledge Graphs (KGs). The goal is to select the KG entity (out of several candidates) that corresponds to the answer to a given textual question. Our approach applies an LLM to identify the correct answer among the list of possible candidates. We confirm that integrating external information is particularly beneficial when the subject entities are not well-known, and that using RAG can negatively impact the performance of the LLM on questions related to popular entities, as the retrieved context might be misleading. With our result, we achieved 2nd place in the post-evaluation phase.