R3 : Refined Retriever-Reader pipeline for Multidoc2dial

Srijan Bansal, Suraj Tripathi, Sumit Agarwal, Sireesh Gururaja, Aditya Srikanth Veerubhotla, Ritam Dutt, Teruko Mitamura, Eric Nyberg


Abstract
In this paper, we present our submission to the DialDoc shared task based on the MultiDoc2Dial dataset. MultiDoc2Dial is a conversational question answering dataset that grounds dialogues in multiple documents. The task involves grounding a user’s query in a document and then generating an appropriate response. We propose several improvements over the baseline’s retriever-reader architecture to aid in modeling goal-oriented dialogues grounded in multiple documents. Our proposed approach employs sparse representations for passage retrieval, a passage re-ranker, the fusion-in-decoder architecture for generation, and a curriculum learning training paradigm. Our approach shows a 12-point improvement in BLEU score compared to the baseline RAG model.
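The first two stages named in the abstract (sparse passage retrieval followed by passage re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain BM25 as the sparse scorer and substitutes a hypothetical token-overlap function for the learned re-ranker. The function names (`tokenize`, `bm25_scores`, `retrieve`, `rerank`) and the parameters `k1`, `b`, and `top_k` are illustrative choices, and the fusion-in-decoder reader and curriculum learning are not covered.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score every passage against the query with the standard BM25 formula."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, passages, top_k=2):
    """Stage 1: return indices of the top_k passages by sparse (BM25) score."""
    scores = bm25_scores(query, passages)
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


def rerank(query, passages, candidates):
    """Stage 2 (hypothetical stand-in): reorder candidates by the fraction of
    query tokens each passage covers. A real system would use a learned model."""
    q = set(tokenize(query))

    def overlap(i):
        return len(q & set(tokenize(passages[i]))) / max(len(q), 1)

    return sorted(candidates, key=overlap, reverse=True)
```

In a full pipeline, the re-ranked passages would then be passed together with the dialogue history to a generative reader; that step is omitted here.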
Anthology ID:
2022.dialdoc-1.17
Volume:
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
dialdoc
Publisher:
Association for Computational Linguistics
Pages:
148–154
URL:
https://aclanthology.org/2022.dialdoc-1.17
DOI:
10.18653/v1/2022.dialdoc-1.17
Cite (ACL):
Srijan Bansal, Suraj Tripathi, Sumit Agarwal, Sireesh Gururaja, Aditya Srikanth Veerubhotla, Ritam Dutt, Teruko Mitamura, and Eric Nyberg. 2022. R3 : Refined Retriever-Reader pipeline for Multidoc2dial. In Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, pages 148–154, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
R3 : Refined Retriever-Reader pipeline for Multidoc2dial (Bansal et al., dialdoc 2022)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2022.dialdoc-1.17.pdf
Data
CoQA, Doc2Dial, MS MARCO, MultiDoc2Dial, Natural Questions, QuAC, doc2dial