Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries

Xinkai Wang, Paul Thompson, Jun’ichi Tsujii, Sophia Ananiadou


Abstract
Cross-lingual information retrieval (CLIR) involving the Chinese language has been thoroughly studied in the general language domain, but rarely in the biomedical domain, due to the lack of suitable linguistic resources and parsing tools. In this paper, we describe a Chinese-English CLIR system for biomedical literature, which exploits a bilingual ontology, the "eCMeSH Tree". This is an extension of the Chinese Medical Subject Headings (CMeSH) Tree, based on Medical Subject Headings (MeSH). Using the 2006 and 2007 TREC Genomics track data, we have evaluated the performance of the eCMeSH Tree in expanding queries. We have compared our results to those obtained using two other approaches, i.e. pseudo-relevance feedback (PRF) and document translation (DT). Subsequently, we evaluate the performance of different combinations of these three retrieval methods. Our results show that our method of expanding queries using the eCMeSH Tree can outperform the PRF method. Furthermore, combining this method with PRF and DT helps to smooth the differences in query expansion, and consequently results in the best performance amongst all experiments reported. All experiments compare the use of two different retrieval models, i.e. Okapi BM25 and a query likelihood language model. In general, the former performs slightly better.
Anthology ID:
L12-1149
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1148–1155
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/316_Paper.pdf
Cite (ACL):
Xinkai Wang, Paul Thompson, Jun’ichi Tsujii, and Sophia Ananiadou. 2012. Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1148–1155, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries (Wang et al., LREC 2012)