Nikolay Ivanov
2026
Fine-Grained Semantic Comparison of Legal Documents using LLMs
Elisei Rykov | Nikolay Ivanov | Maria Bandulevich | Kseniia Petrushina | Valentin Malykh | Vasily Konovalov | Alexander Panchenko | Ilseyar Alimova
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Elisei Rykov | Nikolay Ivanov | Maria Bandulevich | Kseniia Petrushina | Valentin Malykh | Vasily Konovalov | Alexander Panchenko | Ilseyar Alimova
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Frequent revisions of complex regulatory documents in large organizations often introduce inconsistencies and contradictions that are difficult for lawyers and auditors to detect manually. Existing tools rely on character-level diffs and therefore miss paraphrases and semantic shifts. We introduce LegDiff, a novel benchmark for evaluating span-aware semantic comparison of legal texts, and use it to investigate the ability of large language models to detect semantic changes beyond token- and character-level matching. LegDiff comprises manually annotated pairs of legal paragraphs drawn from different documents. In addition, we present a pipeline to generate synthetic training data that aligns with the manual annotations and mirrors the structure and label distribution of the manually curated benchmark, and a visualization tool for clearly displaying detected differences and inconsistencies. The dataset, code, and a visualization tool are publicly available to facilitate reproducibility and further research (https://github.com/s-nlp/SLeDoC).
2025
LLM-Independent Adaptive RAG: Let the Question Speak for Itself
Maria Marina | Nikolay Ivanov | Sergey Pletenev | Mikhail Salnikov | Daria Galimzianova | Nikita Krayko | Vasily Konovalov | Alexander Panchenko | Viktor Moskvoretskii
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Maria Marina | Nikolay Ivanov | Sergey Pletenev | Mikhail Salnikov | Daria Galimzianova | Nikita Krayko | Vasily Konovalov | Alexander Panchenko | Viktor Moskvoretskii
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) are prone to hallucinations, and Retrieval-Augmented Generation (RAG) helps mitigate this, but at a high computational cost while risking misinformation. Adaptive retrieval aims to retrieve only when necessary, but existing approaches rely on LLM-based uncertainty estimation, which remains inefficient and impractical.In this study, we introduce lightweight LLM-independent adaptive retrieval methods based on external information. We investigated 27 features, organized into 7 groups, and their hybrid combinations. We evaluated these methods on 6 QA datasets, assessing the QA performance and efficiency. The results show that our approach matches the performance of complex LLM-based methods while achieving significant efficiency gains, demonstrating the potential of external information for adaptive retrieval.
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
Sergey Pletenev | Maria Marina | Nikolay Ivanov | Daria Galimzianova | Nikita Krayko | Mikhail Salnikov | Vasily Konovalov | Alexander Panchenko | Viktor Moskvoretskii
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Sergey Pletenev | Maria Marina | Nikolay Ivanov | Daria Galimzianova | Nikita Krayko | Mikhail Salnikov | Vasily Konovalov | Alexander Panchenko | Viktor Moskvoretskii
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) often hallucinate in question answering (QA) tasks. A key yet underexplored factor contributing to this is the temporality of questions – whether they are evergreen (answers remain stable over time) or mutable (answers change). In this work, we introduce EverGreenQA, the first multilingual QA dataset with evergreen labels, supporting both evaluation and training. Using EverGreenQA, we benchmark 12 modern LLMs to assess whether they encode question temporality explicitly (via verbalized judgments) or implicitly (via uncertainty signals). We also train EG-E5, a lightweight multilingual classifier that achieves SoTA performance on this task. Finally, we demonstrate the practical utility of evergreen classification across three applications: improving self-knowledge estimation, filtering QA datasets, and explaining GPT-4o’s retrieval behavior.
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
Viktor Moskvoretskii | Maria Marina | Mikhail Salnikov | Nikolay Ivanov | Sergey Pletenev | Daria Galimzianova | Nikita Krayko | Vasily Konovalov | Irina Nikishina | Alexander Panchenko
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Viktor Moskvoretskii | Maria Marina | Mikhail Salnikov | Nikolay Ivanov | Sergey Pletenev | Daria Galimzianova | Nikita Krayko | Vasily Konovalov | Irina Nikishina | Alexander Panchenko
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Retrieval Augmented Generation (RAG) improves correctness of Question Answering (QA) and addresses hallucinations in Large Language Models (LLMs), yet greatly increase computational costs. Besides, RAG is not always needed as may introduce irrelevant information. Recent adaptive retrieval methods integrate LLMs’ intrinsic knowledge with external information appealing to LLM self-knowledge, but they often neglect efficiency evaluations and comparisons with uncertainty estimation techniques. We bridge this gap by conducting a comprehensive analysis of 35 adaptive retrieval methods, including 8 recent approaches and 27 uncertainty estimation techniques, across 6 datasets using 10 metrics for QA performance, self-knowledge, and efficiency. Our findings show that uncertainty estimation techniques often outperform complex pipelines in terms of efficiency and self-knowledge, while maintaining comparable QA performance.