Neural Machine Translation Models with Back-Translation for the Extremely Low-Resource Indigenous Language Bribri

Isaac Feldman, Rolando Coto-Solano


Abstract
This paper presents a neural machine translation (NMT) model and dataset for the Chibchan language Bribri, with an average BLEU score of 16.9±1.7. The model was trained on an extremely small dataset (5923 Bribri-Spanish pairs), providing evidence that NMT is applicable in extremely low-resource environments. We discuss the challenges of managing training input from languages without standard orthographies, provide evidence of successful learning of Bribri grammar, and examine the translation of structures that are infrequent in major Indo-European languages, such as positional verbs, ergative markers, numerical classifiers and complex demonstrative systems. We also experiment with augmenting the dataset through iterative back-translation (Sennrich et al., 2016a; Hoang et al., 2018), using Spanish sentences to create synthetic Bribri sentences. This improves the score by an average of 1.0 BLEU, but only when the new Spanish sentences belong to the same domain as the other Spanish examples. This work contributes to the small but growing body of research on Chibchan NLP.
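
The back-translation augmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: `train_toy_model` is a hypothetical stand-in that memorizes a word-level lexicon instead of training a neural model, the `B_...` tokens are placeholders rather than real Bribri words, and the number of rounds is an arbitrary assumption. The sketch only shows the data flow: a Spanish-to-Bribri model turns monolingual Spanish sentences into synthetic Bribri sentences, and the resulting pairs are added to the training data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)
Translator = Callable[[str], str]


def train_toy_model(pairs: List[Pair]) -> Translator:
    """Hypothetical stand-in for NMT training: builds a word-level lexicon."""
    lexicon = {}
    for src, tgt in pairs:
        # Naive positional alignment; a real system would train a neural model.
        for s, t in zip(src.split(), tgt.split()):
            lexicon.setdefault(s, t)

    def translate(sentence: str) -> str:
        # Unknown words are copied through unchanged.
        return " ".join(lexicon.get(w, w) for w in sentence.split())

    return translate


def iterative_back_translation(
    parallel: List[Pair],      # authentic (Bribri, Spanish) pairs
    mono_spanish: List[str],   # extra Spanish sentences without Bribri
    rounds: int = 2,           # assumed value, for illustration only
) -> Tuple[Translator, List[Pair]]:
    synthetic: List[Pair] = []
    for _ in range(rounds):
        # Train the reverse direction (Spanish -> Bribri) on all current data.
        reversed_pairs = [(es, bri) for bri, es in parallel + synthetic]
        es_to_bri = train_toy_model(reversed_pairs)
        # Regenerate synthetic Bribri for each monolingual Spanish sentence.
        synthetic = [(es_to_bri(es), es) for es in mono_spanish]
    # Train the forward model on authentic plus synthetic pairs.
    bri_to_es = train_toy_model(parallel + synthetic)
    return bri_to_es, synthetic


# Placeholder data: "B_..." tokens stand in for Bribri words.
parallel = [("B_perro B_come", "perro come"), ("B_gato B_duerme", "gato duerme")]
mono_spanish = ["gato come", "perro duerme"]

bri_to_es, synthetic = iterative_back_translation(parallel, mono_spanish)
for bri, es in synthetic:
    print(f"{bri}  <-  {es}")
```

In this toy setting the monolingual Spanish sentences share their vocabulary with the parallel data, which loosely mirrors the abstract's observation that the gain appears only when the new Spanish sentences come from the same domain.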
Anthology ID: 2020.coling-main.351
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3965–3976
URL: https://aclanthology.org/2020.coling-main.351
DOI: 10.18653/v1/2020.coling-main.351
Cite (ACL): Isaac Feldman and Rolando Coto-Solano. 2020. Neural Machine Translation Models with Back-Translation for the Extremely Low-Resource Indigenous Language Bribri. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3965–3976, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): Neural Machine Translation Models with Back-Translation for the Extremely Low-Resource Indigenous Language Bribri (Feldman & Coto-Solano, COLING 2020)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.351.pdf
Code: rolandocoto/bribri-coling2020