Yulong Chen


2022

Graph Pre-training for AMR Parsing and Generation
Xuefeng Bai | Yulong Chen | Yue Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Abstract meaning representation (AMR) highlights the core semantic information of text in a graph structure. Recently, pre-trained language models (PLMs) have respectively advanced the tasks of AMR parsing and AMR-to-text generation. However, PLMs are typically pre-trained on textual data and are therefore sub-optimal for modeling structural knowledge. To this end, we investigate graph self-supervised training to improve the structure awareness of PLMs over AMR graphs. In particular, we introduce two graph auto-encoding strategies for graph-to-graph pre-training and four tasks that integrate text and graph information during pre-training. We further design a unified framework to bridge the gap between pre-training and fine-tuning tasks. Experiments on both AMR parsing and AMR-to-text generation show the superiority of our model. To our knowledge, we are the first to consider pre-training on semantic graphs.
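
As a concrete illustration of the graph structure involved (not from the paper's code), the widely used penman library can parse an AMR string into triples; the example sentence and its graph below are our own:

```python
# A minimal illustration of AMR graph structure using the penman library.
# The sentence and its AMR are illustrative, not taken from the paper.
import penman

# AMR for "The boy wants to go."
amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""

graph = penman.decode(amr)
for source, role, target in graph.triples:
    print(source, role, target)
# ('w', ':instance', 'want-01'), ('w', ':ARG0', 'b'), ('b', ':instance', 'boy'), ...
```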

Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect
Naihao Deng | Yulong Chen | Yue Zhang
Proceedings of the 29th International Conference on Computational Linguistics

Text-to-SQL has attracted attention from both the natural language processing and database communities because of its ability to convert the semantics of natural language into SQL queries and its practical application in building natural language interfaces to database systems. The major challenges in text-to-SQL lie in encoding the meaning of natural utterances, decoding to SQL queries, and translating the semantics between these two forms. Recent advances have addressed these challenges to different extents. However, the task still lacks a comprehensive survey. To this end, we review recent progress on text-to-SQL in terms of datasets, methods, and evaluation, providing a systematic survey that addresses the aforementioned challenges and discusses potential future directions. We hope this survey can serve as a quick reference to existing work and motivate future research.
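
As a concrete illustration of the task the survey covers, a text-to-SQL system maps a natural language question to an executable SQL query; the schema, data, and query below are invented for this example, not taken from the survey:

```python
# A hypothetical text-to-SQL input/output pair, executed against a toy
# in-memory database. Schema, data, and query are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [(1, "Ann", 34), (2, "Bo", 28), (3, "Cy", 41)],
)

question = "How many singers are older than 30?"
# The SQL a text-to-SQL system should produce for this question:
sql = "SELECT COUNT(*) FROM singer WHERE age > 30"
print(conn.execute(sql).fetchone()[0])  # -> 2
```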

2021

DialogSum Challenge: Summarizing Real-Life Scenario Dialogues
Yulong Chen | Yang Liu | Yue Zhang
Proceedings of the 14th International Conference on Natural Language Generation

We propose a shared task on summarizing real-life scenario dialogues, the DialogSum Challenge, to encourage researchers to address challenges in dialogue summarization, which has been less studied by the summarization community. Real-life scenario dialogue summarization has wide potential applications in chatbots and personal assistants. It poses unique challenges, such as special discourse structure, coreference, pragmatics, and social common sense, which require specific representation learning techniques to handle. We carefully annotate a large-scale dialogue summarization dataset based on multiple public dialogue corpora, opening the door to a wide range of summarization models.

DialogSum: A Real-Life Scenario Dialogue Summarization Dataset
Yulong Chen | Yang Liu | Liang Chen | Yue Zhang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Semantic Representation for Dialogue Modeling
Xuefeng Bai | Yulong Chen | Linfeng Song | Yue Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Although neural models have achieved competitive results in dialogue systems, they have shown limited ability to represent core semantics, for example by ignoring important entities. To this end, we exploit Abstract Meaning Representation (AMR) to help dialogue modeling. Compared with textual input, AMR explicitly provides core semantic knowledge and reduces data sparsity. We develop an algorithm to construct dialogue-level AMR graphs from sentence-level AMRs and explore two ways to incorporate AMRs into dialogue systems. Experimental results on both dialogue understanding and response generation tasks show the superiority of our model. To our knowledge, we are the first to leverage a formal semantic representation in neural dialogue modeling.
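
A rough sketch of what merging sentence-level AMRs into one dialogue-level graph might look like is shown below; the dummy root and :snt-i edges are a simplification, and the paper's actual algorithm also handles speakers and coreference:

```python
# A minimal sketch, assuming each sentence-level AMR is given as
# (root_variable, triples). The dummy "dialogue" root and :snt-i edges
# simplify the paper's construction, which also links speakers and
# coreferent concepts.
def merge_sentence_amrs(sentence_amrs):
    merged = [("d", ":instance", "dialogue")]
    for i, (root, triples) in enumerate(sentence_amrs, start=1):
        rename = lambda v: f"s{i}_{v}"  # keep variables unique per sentence
        merged.append(("d", f":snt{i}", rename(root)))
        for src, role, tgt in triples:
            # Targets of :instance are concepts, not variables; for
            # simplicity we treat every other target as a variable.
            kept = tgt if role == ":instance" else rename(tgt)
            merged.append((rename(src), role, kept))
    return merged

amr1 = ("w", [("w", ":instance", "want-01"), ("w", ":ARG0", "b"),
              ("b", ":instance", "boy")])
print(merge_sentence_amrs([amr1]))
```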

On Compositional Generalization of Neural Machine Translation
Yafu Li | Yongjing Yin | Yulong Chen | Yue Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Modern neural machine translation (NMT) models have achieved competitive performance on standard benchmarks such as WMT. However, significant issues remain in areas such as robustness and domain generalization. In this paper, we study NMT models from the perspective of compositional generalization by building a benchmark dataset, CoGnition, consisting of 216k clean and consistent sentence pairs. We quantitatively analyze the effects of various factors using the compound translation error rate, and demonstrate that NMT models fail badly at compositional generalization, although they perform remarkably well under traditional metrics.
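
A compound translation error rate can be read as the share of test instances whose embedded compound is mistranslated; the checker in the sketch below is a placeholder, and CoGnition's exact matching protocol may differ:

```python
# A rough sketch of a compound translation error rate. The correctness
# check is a placeholder; CoGnition's actual protocol may differ.
def compound_error_rate(instances, compound_ok):
    errors = sum(0 if compound_ok(inst) else 1 for inst in instances)
    return errors / len(instances)

# Toy checker: the reference translation of the compound must appear
# in the system output (invented data for illustration).
instances = [
    {"output": "der rote Hund schläft", "compound_ref": "rote Hund"},
    {"output": "der Hund schläft", "compound_ref": "rote Hund"},
]
ok = lambda inst: inst["compound_ref"] in inst["output"]
print(compound_error_rate(instances, ok))  # -> 0.5
```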

2020

Investigating Rich Feature Sources for Conceptual Representation Encoding
Lu Cao | Yulong Chen | Dandan Huang | Yue Zhang
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

Functional Magnetic Resonance Imaging (fMRI) provides a means to investigate human conceptual representation in cognitive and neuroscience studies, where researchers predict fMRI activations from the stimulus inputs that elicit them. Previous work mainly uses a single source of features, particularly linguistic features, to predict fMRI activations. However, relatively little work has investigated rich-source features for conceptual representation. In this paper, we systematically compare linguistic, visual, and auditory input features for conceptual representation, and further introduce associative conceptual features, obtained from the Small World of Words game, to predict fMRI activations. Our experimental results show that these rich-source features can enhance performance in predicting fMRI activations. Our analysis indicates that information from rich sources is present in the conceptual representation of human brains. In particular, visual features carry the most weight in conceptual representation, which is consistent with a recent cognitive science study.
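
A common setup for this kind of encoding study is ridge regression from stimulus feature vectors to voxel activations; the sketch below assumes that setup with random placeholder data, and the paper's exact features and model may differ:

```python
# A minimal sketch of an fMRI encoding model: ridge regression from
# stimulus features to voxel activations. Shapes and data are placeholders;
# the paper's exact features and model may differ.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 300))   # 60 stimuli x 300-dim concept features
Y = rng.normal(size=(60, 1000))  # 60 stimuli x 1000 voxel activations

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, Y_tr)
print("held-out R^2:", model.score(X_te, Y_te))
```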