Zhi Chen


2022

UniDU: Towards A Unified Generative Dialogue Understanding Framework
Zhi Chen | Lu Chen | Bei Chen | Libo Qin | Yuncong Liu | Su Zhu | Jian-Guang Lou | Kai Yu
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

With the development of pre-trained language models, remarkable success has been witnessed in dialogue understanding (DU). However, current DU approaches usually employ independent models for each distinct DU task, without considering shared knowledge across different DU tasks. In this paper, we propose a unified generative dialogue understanding framework, named UniDU, to achieve effective information exchange across diverse DU tasks. Here, we reformulate all DU tasks into a unified prompt-based generative model paradigm. More importantly, a novel model-agnostic multi-task training strategy (MATS) is introduced to dynamically adapt the weights of diverse tasks for the best knowledge sharing during training, based on the nature and available data of each task. Experiments on ten DU datasets covering five fundamental DU tasks show that the proposed UniDU framework largely outperforms well-designed task-specific methods on all tasks. MATS also reveals the knowledge-sharing structure of these tasks. Finally, UniDU obtains promising performance on an unseen dialogue domain, showing great potential for generalization.

AdapterShare: Task Correlation Modeling with Adapter Differentiation
Zhi Chen | Bei Chen | Lu Chen | Kai Yu | Jian-Guang Lou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Thanks to the development of pre-trained language models, multitask learning (MTL) methods have achieved great success in natural language understanding. However, current MTL methods pay more attention to task selection or model design to fuse as much knowledge as possible, while intrinsic task correlation is often neglected. It is important to learn a sharing strategy among multiple tasks rather than sharing everything. In this paper, we propose AdapterShare, an adapter differentiation method to explicitly model the task correlation among multiple tasks. AdapterShare is automatically learned based on the gradients on tiny held-out validation data. Compared to single-task learning and fully shared MTL methods, our proposed method obtains a clear performance improvement. Compared to the existing MTL method AdapterFusion, AdapterShare achieves an absolute average improvement of 1.90 points on five dialogue understanding tasks and a gain of 2.33 points on NLU tasks.

2021

Affective Decoding for Empathetic Response Generation
Chengkun Zeng | Guanyi Chen | Chenghua Lin | Ruizhe Li | Zhi Chen
Proceedings of the 14th International Conference on Natural Language Generation

Understanding a speaker’s feelings and producing appropriate responses with an emotional connection is a key communicative skill for empathetic dialogue systems. In this paper, we propose a simple technique called Affective Decoding for empathetic response generation. Our method can effectively incorporate emotion signals during each decoding step, and can additionally be augmented with an auxiliary dual emotion encoder, which learns separate embeddings for the speaker and listener given the emotion base of the dialogue. Extensive empirical studies show that our models are perceived to be more empathetic in human evaluations than several strong mainstream methods for empathetic responding.

Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL
Zhi Chen | Lu Chen | Hanqi Li | Ruisheng Cao | Da Ma | Mengyue Wu | Kai Yu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
Ruisheng Cao | Lu Chen | Zhi Chen | Yanbin Zhao | Su Zhu | Kai Yu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This work aims to tackle the challenging heterogeneous graph encoding problem in the text-to-SQL task. Previous methods are typically node-centric and merely utilize different weight matrices to parameterize edge types, which 1) ignore the rich semantics embedded in the topological structure of edges, and 2) fail to distinguish local and non-local relations for each node. To this end, we propose a Line Graph Enhanced Text-to-SQL (LGESQL) model to mine the underlying relational features without constructing meta-paths. By virtue of the line graph, messages propagate more efficiently through not only connections between nodes, but also the topology of directed edges. Furthermore, both local and non-local relations are integrated distinctively during the graph iteration. We also design an auxiliary task called graph pruning to improve the discriminative capability of the encoder. At the time of writing, our framework achieves state-of-the-art results (62.8% with GloVe, 72.0% with ELECTRA) on the cross-domain text-to-SQL benchmark Spider.

ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser
Zhi Chen | Lu Chen | Yanbin Zhao | Ruisheng Cao | Zihan Xu | Su Zhu | Kai Yu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Given a database schema, Text-to-SQL aims to translate a natural language question into the corresponding SQL query. In the cross-domain setup, traditional semantic parsing models struggle to adapt to unseen database schemas. To improve the model's generalization capability for rare and unseen schemas, we propose a new architecture, ShadowGNN, which processes schemas at abstract and semantic levels. By ignoring the names of semantic items in databases, abstract schemas are exploited in a well-designed graph projection neural network to obtain delexicalized representations of the question and schema. Based on the domain-independent representations, a relation-aware transformer is utilized to further extract logical linking between question and schema. Finally, a SQL decoder with context-free grammar is applied. On the challenging Text-to-SQL benchmark Spider, empirical results show that ShadowGNN outperforms state-of-the-art models. When the annotated data is extremely limited (only 10% of the training set), ShadowGNN achieves an absolute performance gain of over 5%, which shows its powerful generalization ability. Our implementation will be open-sourced at https://github.com/WowCZ/shadowgnn

2020

Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks
Yanbin Zhao | Lu Chen | Zhi Chen | Ruisheng Cao | Su Zhu | Kai Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Efficient structure encoding for graphs with labeled edges is an important yet challenging point in many graph-based models. This work focuses on AMR-to-text generation, a graph-to-sequence task aiming to recover natural language from Abstract Meaning Representations (AMR). Existing graph-to-sequence approaches generally utilize graph neural networks as their encoders, which have two limitations: 1) the message propagation process in AMR graphs is only guided by first-order adjacency information; 2) the relationships between labeled edges are not fully considered. In this work, we propose a novel graph encoding framework which can effectively explore the edge relations. We also adopt graph attention networks with higher-order neighborhood information to encode the rich structure in AMR graphs. Experimental results show that our approach obtains new state-of-the-art performance on English AMR benchmark datasets. The ablation analyses also demonstrate that both edge relations and higher-order information are beneficial to graph-to-sequence modeling.

Neural Graph Matching Networks for Chinese Short Text Matching
Lu Chen | Yanbin Zhao | Boer Lyu | Lesheng Jin | Zhi Chen | Su Zhu | Kai Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Chinese short text matching usually employs word sequences rather than character sequences to get better performance. However, Chinese word segmentation can be erroneous, ambiguous or inconsistent, which consequently hurts the final matching performance. To address this problem, we propose neural graph matching networks, a novel sentence matching framework capable of dealing with multi-granular input information. Instead of a character sequence or a single word sequence, paired word lattices formed from multiple word segmentation hypotheses are used as input and the model learns a graph representation according to an attentive graph matching mechanism. Experiments on two Chinese datasets show that our models outperform the state-of-the-art short text matching models.