@inproceedings{mandikal-2024-ancient,
    title = "Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented {LLM}s for {A}ncient {I}ndian Philosophy",
    author = "Mandikal, Priyanka",
    editor = "Pavlopoulos, John  and
      Sommerschield, Thea  and
      Assael, Yannis  and
      Gordin, Shai  and
      Cho, Kyunghyun  and
      Passarotti, Marco  and
      Sprugnoli, Rachele  and
      Liu, Yudong  and
      Li, Bin  and
      Anderson, Adam",
    booktitle = "Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)",
    month = aug,
    year = "2024",
    address = "Hybrid in Bangkok, Thailand and online",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.ml4al-1.23/",
    doi = "10.18653/v1/2024.ml4al-1.23",
    pages = "224--250",
    abstract = "LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by limitations such as factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. In this work, we explore the potential of retrieval-augmented generation (RAG) models in performing long-form question answering (LFQA) on a specially curated niche and custom knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. A human evaluation involving computational linguists and domain experts shows that the RAG model significantly outperforms the standard model in producing factual, comprehensive responses with fewer hallucinations. In addition, we find that a keyword-based hybrid retriever that focuses on unique low-frequency words further improves results. Our study provides insights into the future development of real-world RAG models for custom and niche areas of knowledge."
}