Moses and the Character-Based Random Babbling Baseline: CoAStaL at AmericasNLP 2021 Shared Task

Marcel Bollmann, Rahul Aralikatte, Héctor Murrieta Bello, Daniel Hershcovich, Miryam de Lhoneux, Anders Søgaard


Abstract
We evaluated a range of neural machine translation techniques developed specifically for low-resource scenarios. Unsuccessfully. In the end, we submitted two runs: (i) a standard phrase-based model, and (ii) a random babbling baseline using character trigrams. We found that it was surprisingly hard to beat (i), in spite of this model being, in theory, a bad fit for polysynthetic languages; and more interestingly, that (ii) was better than several of the submitted systems, highlighting how difficult low-resource machine translation for polysynthetic languages is.
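To give a rough idea of what a character-trigram "random babbling" baseline like (ii) could look like, here is a minimal sketch. It is not the authors' implementation: the function names `train_char_trigrams` and `babble`, the padding symbols `BOS` and `EOS`, the `max_len` cutoff, and the choice to let a sampled end symbol decide output length are all assumptions made for illustration. The sketch counts which character follows each two-character context in some target-language text, then samples new characters from those counts while ignoring the source sentence entirely.

```python
import random
from collections import Counter, defaultdict

# Hypothetical padding symbols (assumed here, not taken from the paper).
BOS = "\x02"  # marks the start of a line
EOS = "\x03"  # marks the end of a line


def train_char_trigrams(lines):
    """Count which character follows each two-character context."""
    counts = defaultdict(Counter)
    for line in lines:
        padded = BOS * 2 + line + EOS
        for i in range(len(padded) - 2):
            counts[padded[i:i + 2]][padded[i + 2]] += 1
    return counts


def babble(counts, rng, max_len=200):
    """Sample characters from the trigram counts until EOS or max_len."""
    context = BOS * 2
    out = []
    while len(out) < max_len:
        followers = counts[context]
        if not followers:  # nothing observed after this context (e.g. no training data)
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        nxt = rng.choices(chars, weights=weights, k=1)[0]
        if nxt == EOS:
            break
        out.append(nxt)
        context = context[1] + nxt
    return "".join(out)


if __name__ == "__main__":
    # Placeholder strings standing in for target-side training text.
    model = train_char_trigrams(["abab", "abba", "baab"])
    print(babble(model, random.Random(0)))
```

Because the output depends only on character statistics of the training text, a baseline of this kind produces strings that look superficially like the target language without translating anything, which is what makes it a useful lower bound for comparison.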
Anthology ID:
2021.americasnlp-1.28
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
248–254
URL:
https://aclanthology.org/2021.americasnlp-1.28
DOI:
10.18653/v1/2021.americasnlp-1.28
Cite (ACL):
Marcel Bollmann, Rahul Aralikatte, Héctor Murrieta Bello, Daniel Hershcovich, Miryam de Lhoneux, and Anders Søgaard. 2021. Moses and the Character-Based Random Babbling Baseline: CoAStaL at AmericasNLP 2021 Shared Task. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 248–254, Online. Association for Computational Linguistics.
Cite (Informal):
Moses and the Character-Based Random Babbling Baseline: CoAStaL at AmericasNLP 2021 Shared Task (Bollmann et al., AmericasNLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.americasnlp-1.28.pdf