Lidiya Murakhovs’ka


Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation
Philippe Laban | Chien-Sheng Wu | Lidiya Murakhovs’ka | Wenhao Liu | Caiming Xiong
Findings of the Association for Computational Linguistics: NAACL 2022

Question generation (QGen) models are often evaluated with standardized NLG metrics that are based on n-gram overlap. In this paper, we measure whether these metric improvements translate to gains in a practical setting, focusing on the use case of helping teachers automate the generation of reading comprehension quizzes. In our study, teachers building a quiz receive question suggestions, which they can either accept or refuse with a reason. Even though we find that recent progress in QGen leads to a significant increase in question acceptance rates, there is still large room for improvement, with the best model having only 68.4% of its questions accepted by the ten teachers who participated in our study. We then leverage the annotations we collected to analyze standard NLG metrics and find that model performance has reached projected upper-bounds, suggesting new automatic metrics are needed to guide QGen research forward.

MixQG: Neural Question Generation with Mixed Answer Types
Lidiya Murakhovs’ka | Chien-Sheng Wu | Philippe Laban | Tong Niu | Wenhao Liu | Caiming Xiong
Findings of the Association for Computational Linguistics: NAACL 2022

Asking good questions is an essential ability for both human and machine intelligence. However, existing neural question generation approaches mainly focus on short factoid type of answers. In this paper, we introduce a neural question generator, MixQG, to bridge this gap. We combine nine question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstractive answers, to train a single generative model. We show with empirical results that our model outperforms existing work in both seen and unseen domains, and can generate questions with different cognitive levels when conditioned on different answer types. We run a human evaluation study to assess the quality of generated questions and find that MixQG outperforms the next best model by 10%. Our code and model checkpoints will be released and integrated with the HuggingFace library to facilitate various downstream applications.

Discord Questions: A Computational Approach To Diversity Analysis in News Coverage
Philippe Laban | Chien-Sheng Wu | Lidiya Murakhovs’ka | Xiang Chen | Caiming Xiong
Findings of the Association for Computational Linguistics: EMNLP 2022

There are many potential benefits to news readers accessing diverse sources. Modern news aggregators do the hard work of organizing the news, offering readers a plethora of source options, but choosing which source to read remains challenging. We propose a new framework to assist readers in identifying source differences and gaining an understanding of news coverage diversity. The framework is based on the generation of Discord Questions: questions with a diverse answer pool, explicitly illustrating source differences. To assemble a prototype of the framework, we focus on two components: (1) discord question generation, the task of generating questions answered differently by sources, for which we propose an automatic scoring method, and create a model that improves performance from current question generation (QG) methods by 5%, (2) answer consolidation, the task of grouping answers to a question that are semantically similar, for which we collect data and repurpose a method that achieves 81% balanced accuracy on our realistic test set. We illustrate the framework’s feasibility through a prototype interface. Even though model performance at discord QG still lags human performance by more than 15%, generated questions are judged to be more interesting than factoid questions and can reveal differences in the level of detail, sentiment, and reasoning of sources in news coverage. Code is available at