Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR
Abstract
This paper addresses the main problems that arise during query translation in dictionary-based Cross-lingual Information Retrieval (CLIR): translation selection, the presence of Out-Of-Vocabulary (OOV) terms, and the translation of Multi-Word Expressions (MWE). We analyse to what extent each problem affects retrieval performance for the Basque-English language pair, and the improvement obtained when methods that do not require parallel corpora are used to address them. To tackle the translation selection problem, we provide novel extensions of an existing monolingual target co-occurrence-based method; Out-Of-Vocabulary terms are handled by a cognate detection-based method; and, for the Multi-Word Expression translation problem, a naïve matching technique is applied. The error analysis shows that the deterioration in performance, measured in terms of Mean Average Precision (MAP), differs significantly depending on the problem, with translation selection causing most of the errors. The proposed combined strategy, in turn, performs well in tackling the three problems mentioned above.
- Anthology ID:
- L10-1032
- Volume:
- Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
- Month:
- May
- Year:
- 2010
- Address:
- Valletta, Malta
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/63_Paper.pdf
- Cite (ACL):
- Xabier Saralegi and Maddalen Lopez de Lacalle. 2010. Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
- Cite (Informal):
- Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR (Saralegi & Lopez de Lacalle, LREC 2010)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/63_Paper.pdf