Discovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation

Els Lefever, Véronique Hoste, Martine De Cock


Abstract
Wikipedia pages typically contain inter-language links to the corresponding pages in other languages. These links, however, are often incomplete. This paper describes a set of experiments investigating whether such missing inter-language links can be discovered for ambiguous nouns by means of a cross-lingual Word Sense Disambiguation approach. The input for the inter-language link detection system is a set of Dutch pages for a given ambiguous noun, and the output is a set of links to the corresponding pages in three target languages (viz. French, Spanish and Italian). The experimental results show that, although the task is very challenging, the system succeeds in detecting missing inter-language links between Wikipedia documents for a manually labeled test set. The final goal of the system is to provide a human editor with a list of possible missing links that should be manually verified.
Anthology ID:
L12-1278
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
841–846
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/508_Paper.pdf
Cite (ACL):
Els Lefever, Véronique Hoste, and Martine De Cock. 2012. Discovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 841–846, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Discovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation (Lefever et al., LREC 2012)