Few-Shot Pidgin Text Adaptation via Contrastive Fine-Tuning

Ernie Chang, Jesujoba O. Alabi, David Ifeoluwa Adelani, Vera Demberg


Abstract
The surging demand for multilingual dialogue systems often requires a costly labeling process whenever a new language is added. For low-resource languages, human annotators are continuously tasked with adapting resource-rich language utterances for each new domain. This prohibitive and impractical process can become a bottleneck for low-resource languages that still lack reliable translation systems or parallel corpora. In particular, it is difficult to obtain task-specific low-resource language annotations for English-derived creoles (e.g., Nigerian and Cameroonian Pidgin). To address this issue, we build on pretrained language models, specifically BART, which has shown great potential for language generation and understanding: we propose to fine-tune BART to generate utterances in Pidgin by leveraging the proximity of the source and target languages and by using positive and negative examples in a contrastive training objective. We collect and release the first parallel Pidgin-English conversation corpus in two dialogue domains and show that this simple and effective technique suffices to yield impressive results for generation between English and Pidgin, two closely related languages.
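
A minimal sketch of what contrastive fine-tuning of BART for English-to-Pidgin generation could look like, using the Hugging Face transformers library. The objective below (maximum-likelihood loss on the gold Pidgin target plus a hinge term that pushes down the likelihood of a negative target) is an illustrative assumption, not the authors' exact training objective, and the example sentences are hypothetical.

import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")


def masked_labels(ids, pad_id):
    """Replace padding tokens with -100 so the loss ignores them."""
    labels = ids.clone()
    labels[labels == pad_id] = -100
    return labels


def contrastive_step(src_texts, pos_texts, neg_texts, margin=1.0):
    """One training step: MLE on the positive target plus a margin-based contrast."""
    pad_id = tokenizer.pad_token_id
    src = tokenizer(src_texts, return_tensors="pt", padding=True, truncation=True)
    pos = tokenizer(pos_texts, return_tensors="pt", padding=True, truncation=True)
    neg = tokenizer(neg_texts, return_tensors="pt", padding=True, truncation=True)

    # Negative log-likelihood of the gold (positive) Pidgin utterance.
    pos_nll = model(input_ids=src.input_ids, attention_mask=src.attention_mask,
                    labels=masked_labels(pos.input_ids, pad_id)).loss
    # Negative log-likelihood of the negative utterance under the same source.
    neg_nll = model(input_ids=src.input_ids, attention_mask=src.attention_mask,
                    labels=masked_labels(neg.input_ids, pad_id)).loss

    # The negative should be at least `margin` nats less likely than the positive.
    contrast = torch.clamp(margin - (neg_nll - pos_nll), min=0.0)
    return pos_nll + contrast


# Hypothetical usage: the English source doubles as the negative example,
# so the model is pushed toward the adapted Pidgin form rather than copying.
loss = contrastive_step(
    ["Are you coming to the meeting tomorrow?"],
    ["You dey come the meeting tomorrow?"],
    ["Are you coming to the meeting tomorrow?"],
)
loss.backward()
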
Anthology ID:
2022.coling-1.377
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4286–4291
URL:
https://aclanthology.org/2022.coling-1.377
Cite (ACL):
Ernie Chang, Jesujoba O. Alabi, David Ifeoluwa Adelani, and Vera Demberg. 2022. Few-Shot Pidgin Text Adaptation via Contrastive Fine-Tuning. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4286–4291, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Few-Shot Pidgin Text Adaptation via Contrastive Fine-Tuning (Chang et al., COLING 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.coling-1.377.pdf