Reference Free Domain Adaptation for Translation of Noisy Questions with Question Specific Rewards

Baban Gain, Ramakrishna Appicharla, Soumya Chennabasavaraj, Nikesh Garera, Asif Ekbal, Muthusamy Chelliah


Abstract
Community Question-Answering (CQA) portals serve as a valuable tool for helping users within an organization. However, making them accessible to non-English-speaking users continues to be a challenge. Translating questions can broaden the community’s reach, benefiting individuals with similar inquiries in various languages. Translating questions using Neural Machine Translation (NMT) poses additional challenges, especially in noisy environments where the grammatical correctness of the questions is not monitored. Such questions may be phrased as statements by non-native speakers, with incorrect subject-verb order and sometimes even missing question marks. Creating a synthetic parallel corpus from such data is also difficult due to its noisy nature. To address this issue, we propose a training methodology that fine-tunes the NMT system using only source-side data. Our approach balances adequacy and fluency by utilizing a loss function that combines BERTScore and a Masked Language Model (MLM) score. Our method surpasses the conventional Maximum Likelihood Estimation (MLE) based fine-tuning approach, which relies on synthetic target data, achieving a 1.9 BLEU score improvement. Our model remains robust when noise is added to the baseline, still achieving a 1.1 BLEU improvement along with large improvements on the TER and BLEURT metrics. The proposed methodology is model-agnostic and is needed only during the training phase. We make the code and datasets publicly available at https://www.iitp.ac.in/~ai-nlp-ml/resources.html#DomainAdapt to facilitate further research.
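The reward described in the abstract combines an adequacy term (BERTScore computed between the source question and its translation) with a fluency term (an MLM score of the translation). The sketch below is a minimal illustration of such a combined, reference-free reward, assuming a simple weighted sum with weight `alpha`, a multilingual BERT model for both terms, and a pseudo-log-likelihood formulation of the MLM score; these specifics are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch (not the paper's released code) of a reference-free reward
# combining adequacy (BERTScore between source question and translation)
# with fluency (masked-LM pseudo-log-likelihood of the translation).
# The model name and the weight `alpha` are assumptions for illustration.
import torch
from bert_score import score as bert_score
from transformers import AutoModelForMaskedLM, AutoTokenizer

MLM_NAME = "bert-base-multilingual-cased"  # assumed multilingual MLM
mlm_tokenizer = AutoTokenizer.from_pretrained(MLM_NAME)
mlm_model = AutoModelForMaskedLM.from_pretrained(MLM_NAME).eval()


def mlm_fluency(sentence: str) -> float:
    """Average pseudo-log-likelihood: mask each token in turn and score it."""
    ids = mlm_tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    log_probs = []
    for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = mlm_tokenizer.mask_token_id
        with torch.no_grad():
            logits = mlm_model(masked.unsqueeze(0)).logits[0, i]
        log_probs.append(torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return sum(log_probs) / max(len(log_probs), 1)


def question_reward(source: str, translation: str, alpha: float = 0.5) -> float:
    """Weighted sum of adequacy (BERTScore F1 vs. the source) and fluency."""
    _, _, f1 = bert_score([translation], [source], model_type=MLM_NAME)
    return alpha * f1.item() + (1.0 - alpha) * mlm_fluency(translation)
```

In a reinforcement-style fine-tuning loop, such a reward would be computed for sampled translations of source-side questions and used to scale the training loss, which is what removes the need for target-side references.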
Anthology ID:
2023.findings-emnlp.16
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
207–221
URL:
https://aclanthology.org/2023.findings-emnlp.16
DOI:
10.18653/v1/2023.findings-emnlp.16
Cite (ACL):
Baban Gain, Ramakrishna Appicharla, Soumya Chennabasavaraj, Nikesh Garera, Asif Ekbal, and Muthusamy Chelliah. 2023. Reference Free Domain Adaptation for Translation of Noisy Questions with Question Specific Rewards. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 207–221, Singapore. Association for Computational Linguistics.
Cite (Informal):
Reference Free Domain Adaptation for Translation of Noisy Questions with Question Specific Rewards (Gain et al., Findings 2023)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2023.findings-emnlp.16.pdf