Abstract
When applied to large datasets, information retrieval works by isolating a subset of documents from the full collection and then performing low-level processing of the text. This is usually done by adding index terms to each document in the collection. In this paper we deal with automatic document classification and index-term detection applied to large-scale medical corpora. Our methodology employs a linear classifier, and we test our results on the BioASQ training corpus, a collection of 12 million MeSH-indexed medical abstracts. We cover term indexing, result retrieval, and result ranking based on distributed word representations.
- Anthology ID:
- W17-8001
- Volume:
- Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
- Month:
- September
- Year:
- 2017
- Address:
- Varna, Bulgaria
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 1–7
- URL:
- https://doi.org/10.26615/978-954-452-044-1_001
- DOI:
- 10.26615/978-954-452-044-1_001
- Cite (ACL):
- Curea Eric. 2017. Document retrieval and question answering in medical documents. A large-scale corpus challenge. In Proceedings of the Biomedical NLP Workshop associated with RANLP 2017, pages 1–7, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- Document retrieval and question answering in medical documents. A large-scale corpus challenge. (Eric, RANLP 2017)
- PDF:
- https://doi.org/10.26615/978-954-452-044-1_001