Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset

Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin


Abstract
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the multi-round TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the best systems. In round 3, we submitted the highest-scoring run that took advantage of previous training data and the second-highest fully automatic run. In rounds 4 and 5, we submitted the highest-scoring fully automatic runs.
Anthology ID:
2020.sdp-1.5
Volume:
Proceedings of the First Workshop on Scholarly Document Processing
Month:
November
Year:
2020
Address:
Online
Editors:
Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer
Venue:
sdp
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
31–41
Language:
URL:
https://aclanthology.org/2020.sdp-1.5
DOI:
10.18653/v1/2020.sdp-1.5
Bibkey:
Cite (ACL):
Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, and Jimmy Lin. 2020. Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset. In Proceedings of the First Workshop on Scholarly Document Processing, pages 31–41, Online. Association for Computational Linguistics.
Cite (Informal):
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset (Zhang et al., sdp 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2020.sdp-1.5.pdf
Video:
 https://slideslive.com/38940714
Data
MS MARCO