@inproceedings{mandikal-2024-ancient,
    title = "Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented {LLM}s for {A}ncient {I}ndian Philosophy",
    author = "Mandikal, Priyanka",
    editor = "Pavlopoulos, John  and
      Sommerschield, Thea  and
      Assael, Yannis  and
      Gordin, Shai  and
      Cho, Kyunghyun  and
      Passarotti, Marco  and
      Sprugnoli, Rachele  and
      Liu, Yudong  and
      Li, Bin  and
      Anderson, Adam",
    booktitle = "Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)",
    month = aug,
    year = "2024",
    address = "Hybrid in Bangkok, Thailand and online",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.ml4al-1.23/",
    doi = "10.18653/v1/2024.ml4al-1.23",
    pages = "224--250",
    abstract = "LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by limitations such as factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. In this work, we explore the potential of retrieval-augmented generation (RAG) models in performing long-form question answering (LFQA) on a specially curated niche and custom knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. A human evaluation involving computational linguists and domain experts shows that the RAG model significantly outperforms the standard model in producing factual, comprehensive responses with fewer hallucinations. In addition, we find that a keyword-based hybrid retriever that focuses on unique low-frequency words further improves results. Our study provides insights into the future development of real-world RAG models for custom and niche areas of knowledge."
}