Cross-Lingual Training for Automatic Question Generation

Vishwajeet Kumar, Nitish Joshi, Arijit Mukherjee, Ganesh Ramakrishnan, Preethi Jyothi


Abstract
Automatic question generation (QG) is a challenging problem in natural language understanding. QG systems are typically built assuming access to a large number of training instances where each instance is a question and its corresponding answer. For a new language, such training instances are hard to obtain, making the QG problem even more challenging. Using this as our motivation, we study the reuse of an available large QG dataset in a secondary language (e.g. English) to learn a QG model for a primary language (e.g. Hindi) of interest. For the primary language, we assume access to a large amount of monolingual text but only a small QG dataset. We propose a cross-lingual QG model which uses the following training regime: (i) unsupervised pretraining of language models in both primary and secondary languages and (ii) joint supervised training for QG in both languages. We demonstrate the efficacy of our proposed approach using two different primary languages, Hindi and Chinese. Our proposed framework clearly outperforms a number of baseline models, including a fully supervised transformer-based model trained on the QG datasets in the primary language. We also create and release a new question answering dataset for Hindi consisting of 6555 sentences.
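The two-stage training regime in the abstract can be pictured as a data schedule. The sketch below is a minimal, hypothetical illustration, not the authors' implementation, and it does not model the network at all. It assumes that batches alternate round-robin between the two languages in each stage and that each language's data is cycled, so the small primary-language QG set is revisited while the larger secondary-language set is consumed. The names `interleave` and `training_schedule`, the language tags, and the toy batches are all illustrative assumptions.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Sequence, Tuple

# One scheduled step: (stage, language tag, batch). Hypothetical structure.
Step = Tuple[str, str, object]


def interleave(sources: Dict[str, Sequence[object]], num_steps: int) -> Iterator[Tuple[str, object]]:
    """Yield (language, batch) pairs, alternating languages round-robin.

    Assumption: each language's batches are cycled, so a small dataset
    (e.g. the primary-language QG set) is revisited while a larger one
    is still being consumed.
    """
    if not sources or any(len(batches) == 0 for batches in sources.values()):
        raise ValueError("every language needs at least one batch")
    per_lang = {lang: cycle(batches) for lang, batches in sources.items()}
    langs = cycle(list(sources))
    for _ in range(num_steps):
        lang = next(langs)
        yield lang, next(per_lang[lang])


def training_schedule(
    monolingual: Dict[str, Sequence[object]],
    qg_pairs: Dict[str, Sequence[object]],
    pretrain_steps: int,
    joint_steps: int,
) -> List[Step]:
    """Stage (i): unsupervised pretraining on monolingual text in both languages.
    Stage (ii): joint supervised QG training on both languages' QG data."""
    schedule: List[Step] = []
    for lang, batch in interleave(monolingual, pretrain_steps):
        schedule.append(("pretrain", lang, batch))
    for lang, batch in interleave(qg_pairs, joint_steps):
        schedule.append(("qg", lang, batch))
    return schedule


if __name__ == "__main__":
    # Toy stand-ins: plenty of Hindi text, but only one Hindi QG batch.
    mono = {"hi": ["hi_text_0", "hi_text_1", "hi_text_2"], "en": ["en_text_0", "en_text_1"]}
    qg = {"hi": ["hi_qa_0"], "en": ["en_qa_0", "en_qa_1", "en_qa_2"]}
    for step in training_schedule(mono, qg, pretrain_steps=4, joint_steps=4):
        print(step)
```

With these toy inputs, the joint stage shows the single Hindi QG batch being reused while successive English batches are drawn. A real system might instead weight or sample languages differently; round-robin was chosen here only because it is the simplest way to show both languages contributing to each stage.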
Anthology ID:
P19-1481
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4863–4872
URL:
https://aclanthology.org/P19-1481
DOI:
10.18653/v1/P19-1481
Cite (ACL):
Vishwajeet Kumar, Nitish Joshi, Arijit Mukherjee, Ganesh Ramakrishnan, and Preethi Jyothi. 2019. Cross-Lingual Training for Automatic Question Generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4863–4872, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Training for Automatic Question Generation (Kumar et al., ACL 2019)
PDF:
https://preview.aclanthology.org/ingestion-script-update/P19-1481.pdf
Video:
https://vimeo.com/385264949
Code
vishwajeet93/clqg
Data
DuReader, SQuAD