Lingyu Gao


2022

pdf
How do we get there? Evaluating transformer neural networks as cognitive models for English past tense inflection
Xiaomeng Ma | Lingyu Gao
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

There is an ongoing debate of whether neural network can grasp the quasi-regularities in languages like humans. In a typical quasi-regularity task, English past tense inflections, the neural network model has long been criticized that it learns only to generalize the most frequent pattern, but not the regular pattern, thus can not learn the abstract categories of regular and irregular and is dissimilar to human performance. In this work, we train a set of transformer models with different settings to examine their behavior on this task. The models achieved high accuracy on unseen regular verbs and some accuracy on unseen irregular verbs. The models’ performance on the regulars are heavily affected by type frequency and ratio but not token frequency and ratio, and vice versa for the irregulars. The different behaviors on the regulars and irregulars suggest that the models have some degree of symbolic learning on the regularity of the verbs. In addition, the models are weakly correlated with human behavior on nonce verbs. Although the transformer model exhibits some level of learning on the abstract category of verb regularity, its performance does not fit human data well suggesting that it might not be a good cognitive model.

pdf
“What makes a question inquisitive?” A Study on Type-Controlled Inquisitive Question Generation
Lingyu Gao | Debanjan Ghosh | Kevin Gimpel
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

We propose a type-controlled framework for inquisitive question generation. We annotate an inquisitive question dataset with question types, train question type classifiers, and finetune models for type-controlled question generation. Empirical results demonstrate that we can generate a variety of questions that adhere to specific types while drawing from the source texts. We also investigate strategies for selecting a single question from a generated set, considering both an informative vs. inquisitive question classifier and a pairwise ranker trained from a small set of expert annotations. Question selection using the pairwise ranker yields strong results in automatic and manual evaluation. Our human evaluation assesses multiple aspects of the generated questions, finding that the ranker chooses questions with the best syntax (4.59), semantics (4.37), and inquisitiveness (3.92) on a scale of 1-5, even rivaling the performance of human-written questions.

2020

pdf
Distractor Analysis and Selection for Multiple-Choice Cloze Questions for Second-Language Learners
Lingyu Gao | Kevin Gimpel | Arnar Jensson
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

We consider the problem of automatically suggesting distractors for multiple-choice cloze questions designed for second-language learners. We describe the creation of a dataset including collecting manual annotations for distractor selection. We assess the relationship between the choices of the annotators and features based on distractors and the correct answers, both with and without the surrounding passage context in the cloze questions. Simple features of the distractor and correct answer correlate with the annotations, though we find substantial benefit to additionally using large-scale pretrained models to measure the fit of the distractor in the context. Based on these analyses, we propose and train models to automatically select distractors, and measure the importance of model components quantitatively.

pdf
A Cross-Task Analysis of Text Span Representations
Shubham Toshniwal | Haoyue Shi | Bowen Shi | Lingyu Gao | Karen Livescu | Kevin Gimpel
Proceedings of the 5th Workshop on Representation Learning for NLP

Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for representing words and sentences, there is less work on representing arbitrary spans of text within sentences. In this paper, we conduct a comprehensive empirical evaluation of six span representation methods using eight pretrained language representation models across six tasks, including two tasks that we introduce. We find that, although some simple span representations are fairly reliable across tasks, in general the optimal span representation varies by task, and can also vary within different facets of individual tasks. We also find that the choice of span representation has a bigger impact with a fixed pretrained encoder than with a fine-tuned encoder.