Viktor Moskvoretskii


2025

pdf bib
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
Viktor Moskvoretskii | Maria Marina | Mikhail Salnikov | Nikolay Ivanov | Sergey Pletenev | Daria Galimzianova | Nikita Krayko | Vasily Konovalov | Irina Nikishina | Alexander Panchenko
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval Augmented Generation (RAG) improves correctness of Question Answering (QA) and addresses hallucinations in Large Language Models (LLMs), yet greatly increase computational costs. Besides, RAG is not always needed as may introduce irrelevant information. Recent adaptive retrieval methods integrate LLMs’ intrinsic knowledge with external information appealing to LLM self-knowledge, but they often neglect efficiency evaluations and comparisons with uncertainty estimation techniques. We bridge this gap by conducting a comprehensive analysis of 35 adaptive retrieval methods, including 8 recent approaches and 27 uncertainty estimation techniques, across 6 datasets using 10 metrics for QA performance, self-knowledge, and efficiency. Our findings show that uncertainty estimation techniques often outperform complex pipelines in terms of efficiency and self-knowledge, while maintaining comparable QA performance.

pdf bib
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
Maxim Zhelnin | Viktor Moskvoretskii | Egor Shvetsov | Maria Krylova | Venediktov Egor | Zuev Aleksandr | Evgeny Burnaev
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Parameter Efficient Fine-Tuning (PEFT) methods have gained popularity and democratized the usage of Large Language Models (LLMs). Recent studies have shown that a small subset of weights significantly impacts performance. Based on this observation, we introduce a novel PEFT method, called Gaussian noise Injected Fine Tuning of Salient Weights (GIFT-SW). Our method updates only salient columns, while injecting Gaussian noise into non-salient ones. To identify these columns, we developed a generalized sensitivity metric that extends and unifies metrics from previous studies. Experiments with LLaMA models demonstrate that GIFT-SW outperforms full fine-tuning and modern PEFT methods under the same computational budget. Moreover, GIFT-SW offers practical advantages to recover performance of models subjected to mixed-precision quantization with keeping salient weights in full precision.

pdf bib
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
Sergey Pletenev | Maria Marina | Nikolay Ivanov | Daria Galimzianova | Nikita Krayko | Mikhail Salnikov | Vasily Konovalov | Alexander Panchenko | Viktor Moskvoretskii
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) often hallucinate in question answering (QA) tasks. A key yet underexplored factor contributing to this is the temporality of questions – whether they are evergreen (answers remain stable over time) or mutable (answers change). In this work, we introduce EverGreenQA, the first multilingual QA dataset with evergreen labels, supporting both evaluation and training. Using EverGreenQA, we benchmark 12 modern LLMs to assess whether they encode question temporality explicitly (via verbalized judgments) or implicitly (via uncertainty signals). We also train EG-E5, a lightweight multilingual classifier that achieves SoTA performance on this task. Finally, we demonstrate the practical utility of evergreen classification across three applications: improving self-knowledge estimation, filtering QA datasets, and explaining GPT-4o’s retrieval behavior.

pdf bib
LLM-Independent Adaptive RAG: Let the Question Speak for Itself
Maria Marina | Nikolay Ivanov | Sergey Pletenev | Mikhail Salnikov | Daria Galimzianova | Nikita Krayko | Vasily Konovalov | Alexander Panchenko | Viktor Moskvoretskii
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) are prone to hallucinations, and Retrieval-Augmented Generation (RAG) helps mitigate this, but at a high computational cost while risking misinformation. Adaptive retrieval aims to retrieve only when necessary, but existing approaches rely on LLM-based uncertainty estimation, which remains inefficient and impractical.In this study, we introduce lightweight LLM-independent adaptive retrieval methods based on external information. We investigated 27 features, organized into 7 groups, and their hybrid combinations. We evaluated these methods on 6 QA datasets, assessing the QA performance and efficiency. The results show that our approach matches the performance of complex LLM-based methods while achieving significant efficiency gains, demonstrating the potential of external information for adaptive retrieval.

2024

pdf bib
TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks
Viktor Moskvoretskii | Ekaterina Neminova | Alina Lobanova | Alexander Panchenko | Irina Nikishina
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we explore the capabilities of LLMs in capturing lexical-semantic knowledge from WordNet on the example of the LLaMA-2-7b model and test it on multiple lexical semantic tasks. As the outcome of our experiments, we present TaxoLLaMA, the “all-in-one” model for taxonomy-related tasks, lightweight due to 4-bit quantization and LoRA. TaxoLLaMA achieves 11 SOTA results, and 4 top-2 results out of 16 tasks on the Taxonomy Enrichment, Hypernym Discovery, Taxonomy Construction, and Lexical Entailment tasks. Moreover, it demonstrates a very strong zero-shot performance on Lexical Entailment and Taxonomy Construction with no fine-tuning. We also explore its hidden multilingual and domain adaptation capabilities with a little tuning or few-shot learning. All datasets, code, and pre-trained models are available online (code: https://github.com/VityaVitalich/TaxoLLaMA)

pdf bib
Low-Resource Machine Translation through the Lens of Personalized Federated Learning
Viktor Moskvoretskii | Nazarii Tupitsa | Chris Biemann | Samuel Horváth | Eduard Gorbunov | Irina Nikishina
Findings of the Association for Computational Linguistics: EMNLP 2024

We present a new approach called MeritOpt based on the Personalized Federated Learning algorithm MeritFed that can be applied to Natural Language Tasks with heterogeneous data. We evaluate it on the Low-Resource Machine Translation task, using the datasets of South East Asian and Finno-Ugric languages. In addition to its effectiveness, MeritOpt is also highly interpretable, as it can be applied to track the impact of each language used for training. Our analysis reveals that target dataset size affects weight distribution across auxiliary languages, that unrelated languages do not interfere with the training, and auxiliary optimizer parameters have minimal impact. Our approach is easy to apply with a few lines of code, and we provide scripts for reproducing the experiments (https://github.com/VityaVitalich/MeritOpt).

pdf bib
Are Large Language Models Good at Lexical Semantics? A Case of Taxonomy Learning
Viktor Moskvoretskii | Alexander Panchenko | Irina Nikishina
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recent studies on LLMs do not pay enough attention to linguistic and lexical semantic tasks, such as taxonomy learning. In this paper, we explore the capacities of Large Language Models featuring LLaMA-2 and Mistral for several Taxonomy-related tasks. We introduce a new methodology and algorithm for data collection via stochastic graph traversal leading to controllable data collection. Collected cases provide the ability to form nearly any type of graph operation. We test the collected dataset for learning taxonomy structure based on English WordNet and compare different input templates for fine-tuning LLMs. Moreover, we apply the fine-tuned models on such datasets on the downstream tasks achieving state-of-the-art results on the TexEval-2 dataset.