Abstract
This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.
- Anthology ID:
- P17-1171
- Volume:
- Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Regina Barzilay, Min-Yen Kan
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1870–1879
- URL:
- https://aclanthology.org/P17-1171
- DOI:
- 10.18653/v1/P17-1171
- Cite (ACL):
- Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1870–1879, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Reading Wikipedia to Answer Open-Domain Questions (Chen et al., ACL 2017)
- PDF:
- https://preview.aclanthology.org/landing_page/P17-1171.pdf
- Code:
- facebookresearch/DrQA
- Data:
- CBT, DBpedia, Natural Questions, QUASAR-T, SQuAD, SearchQA, WikiMovies
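
The abstract describes a search component based on bigram hashing and TF-IDF matching. The following is a minimal sketch of that general idea, not the authors' implementation: the tokenizer, the CRC32 hash, the bucket count `NUM_BUCKETS`, the log-scaled weighting, and the function names are all assumptions chosen for illustration.

```python
import math
import re
import zlib
from collections import Counter

NUM_BUCKETS = 2 ** 20  # hypothetical bucket count, chosen for illustration


def tokenize(text):
    # Simplistic lowercase alphanumeric tokenizer (an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def hashed_ngrams(text):
    # Count unigrams and bigrams after hashing each one into a fixed bucket.
    tokens = tokenize(text)
    grams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS for g in grams)


def tfidf(counts, idf):
    # Log-scaled term frequency times inverse document frequency.
    return {b: math.log1p(tf) * idf.get(b, 0.0) for b, tf in counts.items()}


def rank(question, docs, k=5):
    # Return the indices of the k documents with the highest TF-IDF dot product.
    doc_counts = [hashed_ngrams(d) for d in docs]
    doc_freq = Counter(b for counts in doc_counts for b in counts)
    idf = {b: math.log(1 + len(docs) / df) for b, df in doc_freq.items()}
    q = tfidf(hashed_ngrams(question), idf)
    scores = []
    for i, counts in enumerate(doc_counts):
        d = tfidf(counts, idf)
        scores.append((sum(w * d.get(b, 0.0) for b, w in q.items()), i))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores[:k]]


docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Nile is a river in Africa.",
]
top = rank("What is the capital of France?", docs, k=1)
```

Hashing the bigrams into a fixed number of buckets keeps the vocabulary size bounded while still capturing some local word order. In a full system, the top-ranked articles would then be passed to the reading model that extracts the answer span.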