English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation

Loitongbam Sanayai Meetei, Thoudam Doren Singh, Sivaji Bandyopadhyay, Mihaela Vela, Josef van Genabith


Abstract
We present the first study on the post-editing (PE) effort required to build a parallel dataset for English-Manipuri and English-Mizo, in the context of a project on creating data for machine translation (MT). English source text from a local daily newspaper is machine translated into Manipuri and Mizo using PBSMT systems built in-house. A Computer Assisted Translation (CAT) tool is used to record the time, keystrokes and other indicators to measure PE effort in terms of temporal and technical effort. A positive correlation between the technical effort and the number of function words is seen for both English-Manipuri and English-Mizo, but a negative correlation between the technical effort and the number of nouns is seen for English-Mizo. However, the average time spent per token in post-editing English-Mizo text is negatively correlated with the temporal effort. These results are mainly due to (i) English and Mizo using the same script, while Manipuri uses a different script, and (ii) the agglutinative nature of Manipuri. Further, we check the impact of training an MT system in an incremental approach, by including the post-edited dataset as additional training data. The results show an increase in HBLEU of up to 4.6 for English-Manipuri.
Anthology ID:
2020.icon-main.7
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2020
Address:
Indian Institute of Technology Patna, Patna, India
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
50–59
URL:
https://aclanthology.org/2020.icon-main.7
Cite (ACL):
Loitongbam Sanayai Meetei, Thoudam Doren Singh, Sivaji Bandyopadhyay, Mihaela Vela, and Josef van Genabith. 2020. English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 50–59, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation (Sanayai Meetei et al., ICON 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.icon-main.7.pdf