Xinyu Xing


2021

pdf
Structure-Aware Pre-Training for Table-to-Text Generation
Xinyu Xing | Xiaojun Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf
Automatic Generation of Citation Texts in Scholarly Papers: A Pilot Study
Xinyu Xing | Xiaosheng Fan | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we study the challenging problem of automatic generation of citation texts in scholarly papers. Given the context of a citing paper A and a cited paper B, the task aims to generate a short text to describe B in the given context of A. One big challenge for addressing this task is the lack of training data. Usually, explicit citation texts are easy to extract, but it is not easy to extract implicit citation texts from scholarly papers. We thus first train an implicit citation extraction model based on BERT and leverage the model to construct a large training dataset for the citation text generation task. Then we propose and train a multi-source pointer-generator network with cross attention mechanism for citation text generation. Empirical evaluation results on a manually labeled test dataset verify the efficacy of our model. This pilot study confirms the feasibility of automatically generating citation texts in scholarly papers and the technique has the great potential to help researchers prepare their scientific papers.

2019

pdf
Asking the Crowd: Question Analysis, Evaluation and Generation for Open Discussion on Online Forums
Zi Chai | Xinyu Xing | Xiaojun Wan | Bo Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Teaching machines to ask questions is an important yet challenging task. Most prior work focused on generating questions with fixed answers. As contents are highly limited by given answers, these questions are often not worth discussing. In this paper, we take the first step on teaching machines to ask open-answered questions from real-world news for open discussion (openQG). To generate high-qualified questions, effective ways for question evaluation are required. We take the perspective that the more answers a question receives, the better it is for open discussion, and analyze how language use affects the number of answers. Compared with other factors, e.g. topic and post time, linguistic factors keep our evaluation from being domain-specific. We carefully perform variable control on 11.5M questions from online forums to get a dataset, OQRanD, and further perform question analysis. Based on these conclusions, several models are built for question evaluation. For openQG task, we construct OQGenD, the first dataset as far as we know, and propose a model based on conditional generative adversarial networks and our question evaluation model. Experiments show that our model can generate questions with higher quality compared with commonly-used text generation methods.