Abstract
In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine translation approach. We demonstrate that the extracted dictionary is accurate and has high recall (F1 score 0.8). Our lexicon contains not only single words but also multi-word expressions, and is freely available. Our experiments focus on Katakana-English lexicon construction; however, the proposed methods could also be applied to transliteration extraction for a variety of language pairs.
- Anthology ID:
- L14-1016
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1013–1017
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/102_Paper.pdf
- Cite (ACL):
- John Richardson, Toshiaki Nakazawa, and Sadao Kurohashi. 2014. Bilingual Dictionary Construction with Transliteration Filtering. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1013–1017, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Bilingual Dictionary Construction with Transliteration Filtering (Richardson et al., LREC 2014)