Abstract
Amis is an endangered language indigenous to Taiwan with limited data available for computational processing. We thus present an Amis-Mandarin dataset containing a parallel corpus of 5,751 Amis and Mandarin sentences and a dictionary of 7,800 Amis words and phrases with their definitions in Mandarin. Using our dataset, we also established a baseline for machine translation between Amis and Mandarin in both directions. Our dataset can be found at https://github.com/francisdzheng/amis-mandarin.

- Anthology ID: 2022.nlp4dh-1.11
- Original: 2022.nlp4dh-1.11v1
- Version 2: 2022.nlp4dh-1.11v2
- Volume: Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities
- Month: November
- Year: 2022
- Address: Taipei, Taiwan
- Editors: Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
- Venue: NLP4DH
- Publisher: Association for Computational Linguistics
- Pages: 79–84
- URL: https://aclanthology.org/2022.nlp4dh-1.11
- Cite (ACL): Francis Zheng, Edison Marrese-Taylor, and Yutaka Matsuo. 2022. A Parallel Corpus and Dictionary for Amis-Mandarin Translation. In Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities, pages 79–84, Taipei, Taiwan. Association for Computational Linguistics.
- Cite (Informal): A Parallel Corpus and Dictionary for Amis-Mandarin Translation (Zheng et al., NLP4DH 2022)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2022.nlp4dh-1.11.pdf