Jean-Pierre Chevallet

2020

WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval Dataset
Jibril Frej | Didier Schwab | Jean-Pierre Chevallet
Proceedings of the Twelfth Language Resources and Evaluation Conference

Over the past years, deep learning methods have allowed for new state-of-the-art results in ad-hoc information retrieval. However, such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g., Robust04, ClueWeb09) have at most 250 annotated queries, recent deep learning models for information retrieval perform poorly on these datasets. These models (e.g., DUET, Conv-KNRM) are trained and evaluated on data collected from commercial search engines, which is not publicly available for academic research; this is a problem for reproducibility and the advancement of research. In this paper, we propose WIKIR: an open-source toolkit to automatically build large-scale English information retrieval datasets based on Wikipedia. WIKIR is publicly available on GitHub. We also provide wikIR59k: a large-scale publicly available dataset that contains 59,252 queries and 2,617,003 (query, relevant document) pairs.

2012

Constructing Reference Semantic Predictions from Biomedical Knowledge Sources
Demeke Ayele | Jean-Pierre Chevallet | Million Meshesha | Getnet Kassie
Proceedings of COLING 2012

Venues

COLING (1)
LREC (1)