Abstract
We evaluate several orthographic word similarity measures in the context of bitext word alignment. We investigate the relationship between the length of the words and the length of their longest common subsequence. We present an alternative to the longest common subsequence ratio (LCSR), a widely-used orthographic word similarity measure. Experiments involving identification of cognates in bitexts suggest that the alternative method outperforms LCSR. Our results also indicate that alignment links can be used as a substitute for cognates for the purpose of evaluating word similarity measures.
- Anthology ID:
- 2005.mtsummit-papers.40
- Volume:
- Proceedings of Machine Translation Summit X: Papers
- Month:
- September 13-15
- Year:
- 2005
- Address:
- Phuket, Thailand
- Venue:
- MTSummit
- Pages:
- 305–312
- URL:
- https://aclanthology.org/2005.mtsummit-papers.40
- Cite (ACL):
- Grzegorz Kondrak. 2005. Cognates and Word Alignment in Bitexts. In Proceedings of Machine Translation Summit X: Papers, pages 305–312, Phuket, Thailand.
- Cite (Informal):
- Cognates and Word Alignment in Bitexts (Kondrak, MTSummit 2005)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2005.mtsummit-papers.40.pdf
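
For readers unfamiliar with the baseline measure named in the abstract, the following is a minimal sketch of LCSR under its commonly used definition: the length of the longest common subsequence of two words divided by the length of the longer word. The function names `lcs_length` and `lcsr` are illustrative choices, not taken from the paper, and the sketch does not implement the alternative measure that the paper proposes.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming, keeping only one previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a character from one word or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio (assumed definition:
    LCS length divided by the length of the longer word)."""
    longer = max(len(a), len(b))
    if longer == 0:
        # Convention chosen for this sketch: two empty strings score 0.
        return 0.0
    return lcs_length(a, b) / longer


if __name__ == "__main__":
    # English "colour" vs. French "couleur" share the subsequence c-o-l-u-r.
    print(lcs_length("colour", "couleur"))
    print(round(lcsr("colour", "couleur"), 3))
```

Because the denominator is the length of the longer word, the score depends on word length as well as on the amount of shared material, which is the relationship the abstract says the paper investigates.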