Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

Abhilash Nandy; Soumya Sharma; Shubham Maddhashiya; Kapil Sachdeva; Pawan Goyal; Niloy Ganguly

doi:10.18653/v1/2021.findings-emnlp.392

Abstract

Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect the E-Manual Corpus, a large corpus of 307,957 E-manuals, and pretrain RoBERTa on it. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it uses a supervised multi-task learning framework that efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.
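
For readers unfamiliar with the metric reported in the abstract, ROUGE-L F1 scores a predicted answer against a reference answer by the length of their longest common subsequence (LCS) of tokens: precision is LCS length over prediction length, recall is LCS length over reference length, and F1 is their harmonic mean. The following is a minimal sketch of that definition, not the evaluation script used in the paper. It assumes lowercased whitespace tokenization with no stemming, and the function names and example strings are hypothetical; official ROUGE implementations may tokenize differently and report slightly different values.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 under the assumed tokenization (lowercase, split on whitespace)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answer pair, for illustration only.
    score = rouge_l_f1("press the power button",
                       "press and hold the power button")
    print(round(score, 3))
```

In the hypothetical pair above, all four predicted tokens appear in order in the six-token reference, so precision is 1, recall is 4/6, and the F1 is 0.8.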

Anthology ID: 2021.findings-emnlp.392
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4600–4609
URL: https://aclanthology.org/2021.findings-emnlp.392
DOI: 10.18653/v1/2021.findings-emnlp.392
Cite (ACL): Abhilash Nandy, Soumya Sharma, Shubham Maddhashiya, Kapil Sachdeva, Pawan Goyal, and Niloy Ganguly. 2021. Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4600–4609, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework (Nandy et al., Findings 2021)
PDF: https://preview.aclanthology.org/ingestion-script-update/2021.findings-emnlp.392.pdf
Software: 2021.findings-emnlp.392.Software.zip
Video: https://preview.aclanthology.org/ingestion-script-update/2021.findings-emnlp.392.mp4
Code: abhi1nandy2/emnlp-2021-findings
Data: E-Manual Corpus, TechQA
