Sezen Perçin
2025
Investigating the Robustness of Retrieval-Augmented Generation at the Query Level
Sezen Perçin | Xin Su | Qutub Sha Syed | Phillip Howard | Aleksei Kuvshinov | Leo Schwinn | Kay-Ulrich Scholl
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Large language models (LLMs) are very costly and inefficient to update with new information. To address this limitation, retrieval-augmented generation (RAG) has been proposed as a solution that dynamically incorporates external knowledge during inference, improving factual consistency and reducing hallucinations. Despite its promise, RAG systems face practical challenges, most notably a strong dependence on the quality of the input query for accurate retrieval. In this paper, we investigate the sensitivity of different components in the RAG pipeline to various types of query perturbations. Our analysis reveals that the performance of commonly used retrievers can degrade significantly even under minor query variations. We study each module in isolation as well as their combined effect in an end-to-end question answering setting, using both general-domain and domain-specific datasets. Additionally, we propose an evaluation framework to systematically assess the query-level robustness of RAG pipelines and offer actionable recommendations for practitioners based on the results of more than 1092 experiments we performed.
2022
Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts
Sezen Perçin | Andrea Galassi | Francesca Lagioia | Federico Ruggeri | Piera Santin | Giovanni Sartor | Paolo Torroni
Proceedings of the Natural Legal Language Processing Workshop 2022
Creating balanced labeled textual corpora for complex tasks, like legal analysis, is a challenging and expensive process that often requires the collaboration of domain experts. To address this problem, we propose a data augmentation method based on the combination of GloVe word embeddings and the WordNet ontology. We present an example of application in the legal domain, specifically on decisions of the Court of Justice of the European Union. Our evaluation with human experts confirms that our method is more robust than the alternatives.