Optimizing LLM Based Retrieval Augmented Generation Pipelines in the Financial Domain

Yiyun Zhao, Prateek Singh, Hanoz Bhathena, Bernardo Ramos, Aviral Joshi, Swaroop Gadiyaram, Saket Sharma


Abstract
Retrieval Augmented Generation (RAG) is a prominent approach in real-word applications for grounding large language model (LLM) generations in up to date and domain-specific knowledge. However, there is a lack of systematic investigations of the impact of each component (retrieval quality, prompts, generation models) on the generation quality of a RAG pipeline in real world scenarios. In this study, we benchmark 6 LLMs in 15 retrieval scenarios exploring 9 prompts over 2 real world financial domain datasets. We thoroughly discuss the impact of each component in RAG pipeline on answer generation quality and formulate specific recommendations for the design of RAG systems.
Anthology ID:
2024.naacl-industry.23
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yi Yang, Aida Davani, Avi Sil, Anoop Kumar
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
279–294
Language:
URL:
https://aclanthology.org/2024.naacl-industry.23
DOI:
10.18653/v1/2024.naacl-industry.23
Bibkey:
Cite (ACL):
Yiyun Zhao, Prateek Singh, Hanoz Bhathena, Bernardo Ramos, Aviral Joshi, Swaroop Gadiyaram, and Saket Sharma. 2024. Optimizing LLM Based Retrieval Augmented Generation Pipelines in the Financial Domain. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), pages 279–294, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Optimizing LLM Based Retrieval Augmented Generation Pipelines in the Financial Domain (Zhao et al., NAACL 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.naacl-industry.23.pdf