Islam Aushev


2025

RAGulator: Effective RAG for Regulatory Question Answering
Islam Aushev | Egor Kratkov | Evgenii Nikolaev | Andrei Glinskii | Vasilii Krikunov | Alexander Panchenko | Vasily Konovalov | Julia Belikova
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)

Regulatory Natural Language Processing (RegNLP) is a multidisciplinary domain focused on facilitating access to and comprehension of regulations and regulatory requirements. This paper outlines our strategy for building a system for the Regulatory Information Retrieval and Answer Generation (RIRAG) challenge, held at the RegNLP 2025 Workshop. The objective of this competition is to design a system that efficiently extracts pertinent passages from regulatory texts (ObliQA) and then generates accurate, cohesive answers to questions about compliance and obligations. Our proposed method uses lightweight BM25 pre-filtering to retrieve relevant passages. This technique efficiently shortlists candidates for subsequent processing with Transformer-based embeddings, thereby optimizing resource use.
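The two-stage retrieval idea (cheap BM25 shortlist, then embedding similarity only on the shortlist) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the BM25 parameters (`k1=1.5`, `b=0.75`), the whitespace tokenizer, the shortlist size, and the `embed` function are all hypothetical choices. In particular, `embed` is a toy bag-of-words stand-in for a Transformer sentence encoder, used only so the example runs with the standard library.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a fixed corpus (k1 and b are common default values)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], len(self.docs[i])
        s = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            f = tf[t]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * f * (self.k1 + 1) / norm
        return s


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a Transformer embedding model.
    return Counter(tokenize(text))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query: str, passages: list[str], shortlist: int = 3, top_k: int = 1) -> list[int]:
    bm25 = BM25(passages)
    # Stage 1: lexical pre-filter keeps only the `shortlist` best BM25 hits.
    ranked = sorted(range(len(passages)), key=lambda i: bm25.score(query, i), reverse=True)
    candidates = ranked[:shortlist]
    # Stage 2: (more expensive) embedding similarity runs only on the shortlist.
    q = embed(query)
    reranked = sorted(candidates, key=lambda i: cosine(q, embed(passages[i])), reverse=True)
    return reranked[:top_k]


# Illustrative toy data (not from ObliQA).
passages = [
    "A firm must report suspicious transactions to the regulator",
    "The annual fee is payable in January",
    "Client money must be held in a segregated account",
    "Directors must notify the regulator of any change in control",
]
query = "when must a firm report suspicious transactions"

print(retrieve(query, passages, shortlist=2, top_k=1))  # → [0]
```

The design point is cost: the embedding step, which is the expensive part with a real encoder, touches only `shortlist` passages instead of the whole corpus.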

FactDebug at SemEval-2025 Task 7: Hybrid Retrieval Pipeline for Identifying Previously Fact-Checked Claims Across Multiple Languages
Evgenii Nikolaev | Ivan Bondarenko | Islam Aushev | Vasilii Krikunov | Andrei Glinskii | Vasily Konovalov | Julia Belikova
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

The proliferation of multilingual misinformation demands robust systems for crosslingual fact-checked claim retrieval. This paper addresses SemEval-2025 Shared Task 7, which challenges participants to retrieve fact-checks for social media posts across 14 languages, even when posts and fact-checks are in different languages. We propose a hybrid retrieval pipeline that combines sparse lexical matching (BM25, BGE-m3) and dense semantic retrieval (pretrained and fine-tuned BGE-m3) with dynamic fusion and curriculum-trained rerankers. Our system achieves 67.2% crosslingual and 86.01% monolingual accuracy on the Shared Task MultiClaim dataset.
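The abstract does not spell out how the "dynamic fusion" step combines the sparse and dense result lists. As a hedged illustration only, the sketch below uses weighted reciprocal rank fusion (RRF), a standard technique swapped in here; it is not claimed to be the paper's method. The retriever names, fact-check IDs, the constant `k=60`, and the per-retriever `weights` (which a "dynamic" scheme might, for example, vary per query) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def weighted_rrf(
    rankings: Dict[str, List[str]],
    weights: Optional[Dict[str, float]] = None,
    k: int = 60,
) -> List[str]:
    """Fuse several ranked ID lists with (weighted) reciprocal rank fusion.

    Each retriever contributes weight / (k + rank) for every document it returns;
    documents are then sorted by their summed score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical outputs of a sparse (BM25) and a dense retriever for one post.
rankings = {
    "bm25": ["fc3", "fc1", "fc7"],
    "dense": ["fc1", "fc9", "fc3"],
}

print(weighted_rrf(rankings))  # → ['fc1', 'fc3', 'fc9', 'fc7']
print(weighted_rrf(rankings, weights={"bm25": 3.0, "dense": 1.0}))
```

RRF uses only ranks, so it needs no score calibration between BM25 and embedding similarities; shifting the weights toward one retriever changes which documents reach the top of the fused list.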