Bilingual Dictionary Construction with Transliteration Filtering

John Richardson, Toshiaki Nakazawa, Sadao Kurohashi


Abstract
In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine translation approach. We demonstrate that the extracted dictionary is accurate and has high recall (F1 score 0.8). Our lexicon contains not only single words but also multi-word expressions, and is freely available. Our experiments focus on Katakana-English lexicon construction; however, the proposed methods could also be applied to transliteration extraction for a variety of other language pairs.
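The filter-then-evaluate workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity scores are made-up placeholders standing in for the paper's discriminative phrase-based measure, and the names `filter_candidates`, `precision_recall_f1`, the threshold value, and the tiny gold set are all hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (Japanese term, English term)


def filter_candidates(scores: Dict[Pair, float], threshold: float) -> Set[Pair]:
    """Keep candidate pairs whose similarity score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


def precision_recall_f1(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 of extracted pairs against gold pairs."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical candidates with made-up similarity scores (not from the paper).
candidate_scores = {
    ("コンピュータ", "computer"): 0.95,
    ("アルゴリズム", "algorithm"): 0.90,
    ("エネルギー", "energy"): 0.40,
    ("データ", "computer"): 0.70,
    ("システム", "algorithm"): 0.10,
}
# Hypothetical gold-standard pairs used only to illustrate the evaluation.
gold_pairs = {
    ("コンピュータ", "computer"),
    ("アルゴリズム", "algorithm"),
    ("エネルギー", "energy"),
}

extracted_pairs = filter_candidates(candidate_scores, threshold=0.5)
precision, recall, f1 = precision_recall_f1(extracted_pairs, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Raising the threshold in such a setup would typically trade recall for precision, which is why a single F1 figure is a convenient summary of dictionary quality.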
Anthology ID:
L14-1016
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1013–1017
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/102_Paper.pdf
Cite (ACL):
John Richardson, Toshiaki Nakazawa, and Sadao Kurohashi. 2014. Bilingual Dictionary Construction with Transliteration Filtering. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1013–1017, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Bilingual Dictionary Construction with Transliteration Filtering (Richardson et al., LREC 2014)