Beyond [CLS] through Ranking by Generation
Cicero Nogueira dos Santos, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, Bing Xiang
Abstract
Generative models for Information Retrieval, where ranking of documents is viewed as the task of generating a query from a document’s language model, were very successful in various IR tasks in the past. However, with the advent of modern deep neural networks, attention has shifted to discriminative ranking functions that model the semantic similarity of documents and queries instead. Recently, deep generative models such as GPT2 and BART have been shown to be excellent text generators, but their effectiveness as rankers have not been demonstrated yet. In this work, we revisit the generative framework for information retrieval and show that our generative approaches are as effective as state-of-the-art semantic similarity-based discriminative models for the answer selection task. Additionally, we demonstrate the effectiveness of unlikelihood losses for IR.- Anthology ID:
- 2020.emnlp-main.134
- Volume:
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1722–1727
- Language:
- URL:
- https://aclanthology.org/2020.emnlp-main.134
- DOI:
- 10.18653/v1/2020.emnlp-main.134
- Cite (ACL):
- Cicero Nogueira dos Santos, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, and Bing Xiang. 2020. Beyond [CLS] through Ranking by Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1722–1727, Online. Association for Computational Linguistics.
- Cite (Informal):
- Beyond [CLS] through Ranking by Generation (Nogueira dos Santos et al., EMNLP 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.134.pdf
- Data
- InsuranceQA, WikiQA