Abstract
This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated using metrics of token correspondence and token order, computed with several standard statistical machine translation techniques. The similarity shown in this study suggests the possibility of harmonious annotation and processing of these language pairs in future development.
- Anthology ID:
- W16-4614
- Volume:
- Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Toshiaki Nakazawa, Hideya Mino, Chenchen Ding, Isao Goto, Graham Neubig, Sadao Kurohashi, Ir. Hammam Riza, Pushpak Bhattacharyya
- Venue:
- WAT
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 149–156
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/W16-4614/
- Cite (ACL):
- Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2016. Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 149–156, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian (Ding et al., WAT 2016)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/W16-4614.pdf