Ying Xu


Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension
Ying Xu | Dakuo Wang | Mo Yu | Daniel Ritchie | Bingsheng Yao | Tongshuang Wu | Zheng Zhang | Toby Li | Nora Bradford | Branda Sun | Tran Hoang | Yisi Sang | Yufang Hou | Xiaojuan Ma | Diyi Yang | Nanyun Peng | Zhou Yu | Mark Warschauer
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on the reading education research, we introduce FairytaleQA, a dataset focusing on narrative comprehension of kindergarten to eighth-grade students. Generated by educational experts based on an evidence-based theoretical framework, FairytaleQA consists of 10,580 explicit and implicit questions derived from 278 children-friendly stories, covering seven types of narrative elements or relations. Our dataset is valuable in two folds: First, we ran existing QA models on our dataset and confirmed that this annotation helps assess models’ fine-grained learning skills. Second, the dataset supports question generation (QG) task in the education domain. Through benchmarking with QG models, we show that the QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions.

It is AI’s Turn to Ask Humans a Question: Question-Answer Pair Generation for Children’s Story Books
Bingsheng Yao | Dakuo Wang | Tongshuang Wu | Zheng Zhang | Toby Li | Mo Yu | Ying Xu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing question answering (QA) techniques are created mainly to answer questions asked by humans. But in educational applications, teachers often need to decide what questions they should ask, in order to help students to improve their narrative understanding capabilities. We design an automated question-answer generation (QAG) system for this education scenario: given a story book at the kindergarten to eighth-grade level as input, our system can automatically generate QA pairs that are capable of testing a variety of dimensions of a student’s comprehension skills. Our proposed QAG model architecture is demonstrated using a new expert-annotated FairytaleQA dataset, which has 278 child-friendly storybooks with 10,580 QA pairs. Automatic and human evaluations show that our model outperforms state-of-the-art QAG baseline systems. On top of our QAG system, we also start to build an interactive story-telling application for the future real-world deployment in this educational scenario.


Grey-box Adversarial Attack And Defence For Sentiment Classification
Ying Xu | Xu Zhong | Antonio Jimeno Yepes | Jey Han Lau
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We introduce a grey-box adversarial attack and defence framework for sentiment classification. We address the issues of differentiability, label preservation and input reconstruction for adversarial attack and defence in one unified framework. Our results show that once trained, the attacking model is capable of generating high-quality adversarial examples substantially faster (one order of magnitude less in time) than state-of-the-art attacking methods. These examples also preserve the original sentiment according to human evaluation. Additionally, our framework produces an improved classifier that is robust in defending against multiple adversarial attacking methods. Code is available at: https://github.com/ibm-aur-nlp/adv-def-text-dist.


pdf bib
Decoupling Encoder and Decoder Networks for Abstractive Document Summarization
Ying Xu | Jey Han Lau | Timothy Baldwin | Trevor Cohn
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

Abstractive document summarization seeks to automatically generate a summary for a document, based on some abstract “understanding” of the original document. State-of-the-art techniques traditionally use attentive encoder–decoder architectures. However, due to the large number of parameters in these models, they require large training datasets and long training times. In this paper, we propose decoupling the encoder and decoder networks, and training them separately. We encode documents using an unsupervised document encoder, and then feed the document vector to a recurrent neural network decoder. With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time. Experiments show that the decoupled model achieves comparable performance with state-of-the-art models for in-domain documents, but less well for out-of-domain documents.


Paraphrase for Open Question Answering: New Dataset and Methods
Ying Xu | Pascual Martínez-Gómez | Yusuke Miyao | Randy Goebel
Proceedings of the Workshop on Human-Computer Question Answering


Multiple System Combination for Transliteration
Garrett Nicolai | Bradley Hauer | Mohammad Salameh | Adam St Arnaud | Ying Xu | Lei Yao | Grzegorz Kondrak
Proceedings of the Fifth Named Entity Workshop

A Lexicalized Tree Kernel for Open Information Extraction
Ying Xu | Christoph Ringlstetter | Mi-Young Kim | Grzegorz Kondrak | Randy Goebel | Yusuke Miyao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)


Open Information Extraction with Tree Kernels
Ying Xu | Mi-Young Kim | Kevin Quinn | Randy Goebel | Denilson Barbosa
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


Application of the Tightness Continuum Measure to Chinese Information Retrieval
Ying Xu | Randy Goebel | Christoph Ringlstetter | Grzegorz Kondrak
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications


A Continuum-Based Approach for Tightness Analysis of Chinese Semantic Units
Ying Xu | Christoph Ringlstetter | Randy Goebel
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2