Curea Eric
2017
Document retrieval and question answering in medical documents. A large-scale corpus challenge.
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
When applied to large datasets, information retrieval typically first isolates a subset of documents from the full collection and then proceeds with low-level processing of the text. The subset is usually selected by means of index terms attached to each document in the collection. In this paper we address automatic document classification and index-term detection for large-scale medical corpora. Our methodology employs a linear classifier, which we evaluate on the BioASQ training corpus, a collection of 12 million MeSH-indexed medical abstracts. We cover term indexing, result retrieval, and result ranking based on distributed word representations.
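
The pipeline outlined above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the perceptron stands in for an unspecified linear classifier, and the four toy abstracts, their index terms, and the two-dimensional word vectors are all hypothetical values chosen for illustration. One binary linear model is trained per index term, and candidate documents are then ranked by the cosine similarity of averaged word vectors.

```python
import math
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def train_perceptron(docs, labels, term, epochs=10):
    """Train one binary linear classifier (perceptron) for a single index term."""
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for text, terms in zip(docs, labels):
            y = 1 if term in terms else -1
            tokens = tokenize(text)
            score = b + sum(w[t] for t in tokens)
            if y * score <= 0:          # misclassified: move the weights toward y
                for t in tokens:
                    w[t] += y
                b += y
    return w, b

def predict_terms(text, models):
    """Return every index term whose linear model scores the text above zero."""
    tokens = tokenize(text)
    return sorted(term for term, (w, b) in models.items()
                  if b + sum(w.get(t, 0.0) for t in tokens) > 0)

def embed(text, vectors, dim):
    """Represent a text as the average of its known word vectors."""
    vecs = [vectors[t] for t in tokenize(text) if t in vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def rank(query, candidates, vectors, dim):
    """Order candidate documents by embedding similarity to the query."""
    q = embed(query, vectors, dim)
    return sorted(candidates,
                  key=lambda d: cosine(q, embed(d, vectors, dim)),
                  reverse=True)

# Hypothetical toy data; real MeSH-indexed abstracts are far longer.
docs = [
    "insulin therapy in diabetes patients",
    "glucose control and insulin resistance",
    "asthma inhaler treatment in children",
    "airway inflammation in asthma",
]
labels = [{"Diabetes Mellitus"}, {"Diabetes Mellitus"}, {"Asthma"}, {"Asthma"}]
all_terms = set().union(*labels)
models = {term: train_perceptron(docs, labels, term) for term in all_terms}

# Hypothetical 2-d word vectors standing in for learned distributed representations.
vectors = {
    "insulin": [1.0, 0.0], "diabetes": [0.9, 0.1], "glucose": [0.8, 0.2],
    "asthma": [0.0, 1.0], "inhaler": [0.1, 0.9], "airway": [0.2, 0.8],
}

predicted = predict_terms("insulin dosage for diabetes", models)
ranked = rank("diabetes glucose", docs, vectors, dim=2)
```

In this sketch the per-term classifiers play the role of index-term detection, while `rank` orders the documents for a query; at the scale of the BioASQ corpus, sparse feature matrices and an optimized linear-model library would likely be needed instead of pure Python dictionaries.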