Yang Li


2023

pdf
History Semantic Graph Enhanced Conversational KBQA with Temporal Information Modeling
Hao Sun | Yang Li | Liwei Deng | Bowen Li | Binyuan Hui | Binhua Li | Yunshi Lan | Yan Zhang | Yongbin Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Context information modeling is an important task in conversational KBQA. However, existing methods usually assume the independence of utterances and model them in isolation. In this paper, we propose a History Semantic Graph Enhanced KBQA model (HSGE) that is able to effectively model long-range semantic dependencies in conversation history while maintaining low computational cost. The framework incorporates a context-aware encoder, which employs a dynamic memory decay mechanism and models context at different levels of granularity. We evaluate HSGE on a widely used benchmark dataset for complex sequential question answering. Experimental results demonstrate that it outperforms existing baselines averaged on all question types.

pdf
Multitask Pretraining with Structured Knowledge for Text-to-SQL Generation
Robert Giaquinto | Dejiao Zhang | Benjamin Kleiner | Yang Li | Ming Tan | Parminder Bhatia | Ramesh Nallapati | Xiaofei Ma
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Many machine learning-based low-code or no-code applications involve generating code that interacts with structured knowledge. For example, one of the most studied tasks in this area is generating SQL code from a natural language statement. Prior work shows that incorporating context information from the database schema, such as table and column names, is beneficial to model performance on this task. In this work we present a large pretraining dataset and strategy for learning representations of text, tables, and SQL code that leverages the entire context of the problem. Specifically, we build on existing encoder-decoder architecture by introducing a multitask pretraining framework that complements the unique attributes of our diverse pretraining data. Our work represents the first study on large-scale pretraining of encoder-decoder models for interacting with structured knowledge, and offers a new state-of-the-art foundation model in text-to-SQL generation. We validate our approach with experiments on two SQL tasks, showing improvement over existing methods, including a 1.7 and 2.2 percentage point improvement over prior state-of-the-arts on Spider and CoSQL.

pdf
A Cross-Modality Context Fusion and Semantic Refinement Network for Emotion Recognition in Conversation
Xiaoheng Zhang | Yang Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Emotion recognition in conversation (ERC) has attracted enormous attention for its applications in empathetic dialogue systems. However, most previous researches simply concatenate multimodal representations, leading to an accumulation of redundant information and a limited context interaction between modalities. Furthermore, they only consider simple contextual features ignoring semantic clues, resulting in an insufficient capture of the semantic coherence and consistency in conversations. To address these limitations, we propose a cross-modality context fusion and semantic refinement network (CMCF-SRNet). Specifically, we first design a cross-modal locality-constrained transformer to explore the multimodal interaction. Second, we investigate a graph-based semantic refinement transformer, which solves the limitation of insufficient semantic relationship information between utterances. Extensive experiments on two public benchmark datasets show the effectiveness of our proposed method compared with other state-of-the-art methods, indicating its potential application in emotion recognition. Our model will be available at https://github.com/zxiaohen/CMCF-SRNet.

pdf
Enhanced Training Methods for Multiple Languages
Hai Li | Yang Li
Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

Document-grounded dialogue generation based on multilingual is a challenging and realistic task. Unlike previous tasks, it need to tackle with multiple high-resource languages facilitating low-resource languages. This paper summarizes our research based on a three-stage pipeline that includes retrieval, re-rank and generation where each component is individually optimized. In different languages with limited data scenarios, we mainly improve the robustness of the pipeline through data augmentation and embedding perturbation with purpose of improving the performance designing three training methods: cross-language enhancement training, weighted training with neighborhood distribution augmentation, and ensemble adversarial training, all of that can be used as plug and play modules. Through experiments with different settings, it has been shown that our methods can effectively improve the generalization performance of pipeline with score ranking 6th among the public submissions on leaderboards.

pdf
CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction
Xinnan Guo | Wentao Deng | Yongrui Chen | Yang Li | Mengdi Zhou | Guilin Qi | Tianxing Wu | Dong Yang | Liubin Wang | Yong Pan
Findings of the Association for Computational Linguistics: ACL 2023

Attribute Value Extraction (AVE) aims to automatically obtain attribute value pairs from product descriptions to aid e-commerce. Despite the progressive performance of existing approaches in e-commerce platforms, they still suffer from two challenges: 1) difficulty in identifying values at different scales simultaneously; 2) easy confusion by some highly similar fine-grained attributes. This paper proposes a pre-training technique for AVE to address these issues. In particular, we first improve the conventional token-level masking strategy, guiding the language model to understand multi-scale values by recovering spans at the phrase and sentence level. Second, we apply clustering to build a challenging negative set for each example and design a pre-training objective based on contrastive learning to force the model to discriminate similar attributes. Comprehensive experiments show that our solution provides a significant improvement over traditional pre-trained models in the AVE task, and achieves state-of-the-art on four benchmarks.

pdf
Structure-Discourse Hierarchical Graph for Conditional Question Answering on Long Documents
Haowei Du | Yansong Feng | Chen Li | Yang Li | Yunshi Lan | Dongyan Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Conditional question answering on long documents aims to find probable answers and identify conditions that need to be satisfied to make the answers correct over long documents. Existing approaches solve this task by segmenting long documents into multiple sections, and attending information at global and local tokens to predict the answers and corresponding conditions. However, the natural structure of the document and discourse relations between sentences in each document section are ignored, which are crucial for condition retrieving across sections, as well as logical interaction over the question and conditions. To address this issue, this paper constructs a Structure-Discourse Hierarchical Graph (SDHG) and conducts bottom-up information propagation. Firstly we build the sentence-level discourse graphs for each section and encode the discourse relations by graph attention. Secondly, we construct a section-level structure graph based on natural structures, and conduct interactions over the question and contexts.Finally different levels of representations are integrated into jointly answer and condition decoding. The experiments on the benchmark ConditionalQA shows our approach gains over the prior state-of-the-art, by 3.0 EM score and 2.4 F1 score on answer measuring, as well as 2.2 EM score and 1.9 F1 score on jointly answer and condition measuring.

pdf
Enhancing Event Causality Identification with Event Causal Label and Event Pair Interaction Graph
Ruili Pu | Yang Li | Suge Wang | Deyu Li | Jianxing Zheng | Jian Liao
Findings of the Association for Computational Linguistics: ACL 2023

Most existing event causality identification (ECI) methods rarely consider the event causal label information and the interaction information between event pairs. In this paper, we propose a framework to enrich the representation of event pairs by introducing the event causal label information and the event pair interaction information. In particular, 1) we design an event-causal-label-aware module to model the event causal label information, in which we design the event causal label prediction task as an auxiliary task of ECI, aiming to predict which events are involved in the causal relationship (we call them causality-related events) by mining the dependencies between events. 2) We further design an event pair interaction graph module to model the interaction information between event pairs, in which we construct the interaction graph with event pairs as nodes and leverage graph attention mechanism to model the degree of dependency between event pairs. The experimental results show that our approach outperforms previous state-of-the-art methods on two benchmark datasets EventStoryLine and Causal-TimeBank.

2022

pdf
Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging
Houquan Zhou | Yang Li | Zhenghua Li | Min Zhang
Findings of the Association for Computational Linguistics: ACL 2022

In recent years, large-scale pre-trained language models (PLMs) have made extraordinary progress in most NLP tasks. But, in the unsupervised POS tagging task, works utilizing PLMs are few and fail to achieve state-of-the-art (SOTA) performance. The recent SOTA performance is yielded by a Guassian HMM variant proposed by He et al. (2018). However, as a generative model, HMM makes very strong independence assumptions, making it very challenging to incorporate contexualized word representations from PLMs. In this work, we for the first time propose a neural conditional random field autoencoder (CRF-AE) model for unsupervised POS tagging. The discriminative encoder of CRF-AE can straightforwardly incorporate ELMo word representations. Moreover, inspired by feature-rich HMM, we reintroduce hand-crafted features into the decoder of CRF-AE. Finally, experiments clearly show that our model outperforms previous state-of-the-art models by a large margin on Penn Treebank and multilingual Universal Dependencies treebank v2.0.

pdf
Hierarchical Relation-Guided Type-Sentence Alignment for Long-Tail Relation Extraction with Distant Supervision
Yang Li | Guodong Long | Tao Shen | Jing Jiang
Findings of the Association for Computational Linguistics: NAACL 2022

Distant supervision uses triple facts in knowledge graphs to label a corpus for relation extraction, leading to wrong labeling and long-tail problems. Some works use the hierarchy of relations for knowledge transfer to long-tail relations. However, a coarse-grained relation often implies only an attribute (e.g., domain or topic) of the distant fact, making it hard to discriminate relations based solely on sentence semantics. One solution is resorting to entity types, but open questions remain about how to fully leverage the information of entity types and how to align multi-granular entity types with sentences. In this work, we propose a novel model to enrich distantly-supervised sentences with entity types. It consists of (1) a pairwise type-enriched sentence encoding module injecting both context-free and -related backgrounds to alleviate sentence-level wrong labeling, and (2) a hierarchical type-sentence alignment module enriching a sentence with the triple fact’s basic attributes to support long-tail relations. Our model achieves new state-of-the-art results in overall and long-tail performance on benchmarks.

pdf
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Yang Li | Cheng Yu | Guangzhi Sun | Hua Jiang | Fanglei Sun | Weiqin Zu | Ying Wen | Yang Yang | Jun Wang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by VAE, CUC-VAE allows sampling from an utterance-specific prior distribution conditioned on cross-utterance information, which allows the prosody features generated by the TTS system to be related to the context and is more similar to how humans naturally produce prosody. The performance of CUC-VAE is evaluated via a qualitative listening test for naturalness, intelligibility and quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity with clear margins.

pdf
GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs
Dong Yang | Peijun Qing | Yang Li | Haonan Lu | Xiaodong Lin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Embedding knowledge graphs (KGs) for multi-hop logical reasoning is a challenging problem due to massive and complicated structures in many KGs. Recently, many promising works projected entities and queries into a geometric space to efficiently find answers. However, it remains challenging to model the negation and union operator. The negation operator has no strict boundaries, which generates overlapped embeddings and leads to obtaining ambiguous answers. An additional limitation is that the union operator is non-closure, which undermines the model to handle a series of union operators. To address these problems, we propose a novel probabilistic embedding model, namely Gamma Embeddings (GammaE), for encoding entities and queries to answer different types of FOL queries on KGs. We utilize the linear property and strong boundary support of the Gamma distribution to capture more features of entities and queries, which dramatically reduces model uncertainty. Furthermore, GammaE implements the Gamma mixture method to design the closed union operator. The performance of GammaE is validated on three large logical query datasets. Experimental results show that GammaE significantly outperforms state-of-the-art models on public benchmarks.

pdf
Generative Data Augmentation with Contrastive Learning for Zero-Shot Stance Detection
Yang Li | Jiawei Yuan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Stance detection aims to identify whether the author of an opinionated text is in favor of, against, or neutral towards a given target. Remarkable success has been achieved when sufficient labeled training data is available. However, it is labor-intensive to annotate sufficient data and train the model for every new target.Therefore, zero-shot stance detection, aiming at identifying stances of unseen targets with seen targets, has gradually attracted attention. Among them, one of the important challenges is to reduce the domain transfer between seen and unseen targets. To tackle this problem, we propose a generative data augmentation approach to generate training samples containing targets and stances for testing data, and map the real samples and generated synthetic samples into the same embedding space with contrastive learning, then perform the final classification based on the augmented data. We evaluate our proposed model on two benchmark datasets. Experimental results show that our approach achieves state-of-the-art performance on most topics in the task of zero-shot stance detection.

2021

pdf
Emotion Inference in Multi-Turn Conversations with Addressee-Aware Module and Ensemble Strategy
Dayu Li | Xiaodan Zhu | Yang Li | Suge Wang | Deyu Li | Jian Liao | Jianxing Zheng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Emotion inference in multi-turn conversations aims to predict the participant’s emotion in the next upcoming turn without knowing the participant’s response yet, and is a necessary step for applications such as dialogue planning. However, it is a severe challenge to perceive and reason about the future feelings of participants, due to the lack of utterance information from the future. Moreover, it is crucial for emotion inference to capture the characteristics of emotional propagation in conversations, such as persistence and contagiousness. In this study, we focus on investigating the task of emotion inference in multi-turn conversations by modeling the propagation of emotional states among participants in the conversation history, and propose an addressee-aware module to automatically learn whether the participant keeps the historical emotional state or is affected by others in the next upcoming turn. In addition, we propose an ensemble strategy to further enhance the model performance. Empirical studies on three different benchmark conversation datasets demonstrate the effectiveness of the proposed model over several strong baselines.

2020

pdf
Improving Long-Tail Relation Extraction with Collaborating Relation-Augmented Attention
Yang Li | Tao Shen | Guodong Long | Jing Jiang | Tianyi Zhou | Chengqi Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Wrong labeling problem and long-tail relations are two main challenges caused by distant supervision in relation extraction. Recent works alleviate the wrong labeling by selective attention via multi-instance learning, but cannot well handle long-tail relations even if hierarchies of the relations are introduced to share knowledge. In this work, we propose a novel neural network, Collaborating Relation-augmented Attention (CoRA), to handle both the wrong labeling and long-tail relations. Particularly, we first propose relation-augmented attention network as base model. It operates on sentence bag with a sentence-to-relation attention to minimize the effect of wrong labeling. Then, facilitated by the proposed base model, we introduce collaborating relation features shared among relations in the hierarchies to promote the relation-augmenting process and balance the training data for long-tail relations. Besides the main training objective to predict the relation of a sentence bag, an auxiliary objective is utilized to guide the relation-augmenting process for a more accurate bag-level representation. In the experiments on the popular benchmark dataset NYT, the proposed CoRA improves the prior state-of-the-art performance by a large margin in terms of Precision@N, AUC and Hits@K. Further analyses verify its superior capability in handling long-tail relations in contrast to the competitors.

pdf
Mapping Natural Language Instructions to Mobile UI Action Sequences
Yang Li | Jiacong He | Xin Zhou | Yuan Zhang | Jason Baldridge
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PixelHelp, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in How-To instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PixelHelp.

pdf
Public Sentiment Drift Analysis Based on Hierarchical Variational Auto-encoder
Wenyue Zhang | Xiaoli Li | Yang Li | Suge Wang | Deyu Li | Jian Liao | Jianxing Zheng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Detecting public sentiment drift is a challenging task due to sentiment change over time. Existing methods first build a classification model using historical data and subsequently detect drift if the model performs much worse on new data. In this paper, we focus on distribution learning by proposing a novel Hierarchical Variational Auto-Encoder (HVAE) model to learn better distribution representation, and design a new drift measure to directly evaluate distribution changes between historical data and new data.Our experimental results demonstrate that our proposed model achieves better results than three existing state-of-the-art methods.

pdf
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements
Yang Li | Gang Li | Luheng He | Jingjie Zheng | Hong Li | Zhiwei Guan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Natural language descriptions of user interface (UI) elements such as alternative text are crucial for accessibility and language-based interaction in general. Yet, these descriptions are constantly missing in mobile UIs. We propose widget captioning, a novel task for automatically generating language descriptions for UI elements from multimodal input including both the image and the structural representations of user interfaces. We collected a large-scale dataset for widget captioning with crowdsourcing. Our dataset contains 162,860 language phrases created by human workers for annotating 61,285 UI elements across 21,750 unique UI screens. We thoroughly analyze the dataset, and train and evaluate a set of deep model configurations to investigate how each feature modality as well as the choice of learning strategies impact the quality of predicted captions. The task formulation and the dataset as well as our benchmark models contribute a solid basis for this novel multimodal captioning task that connects language and user interfaces.

pdf
Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization
Rahul Jha | Keping Bi | Yang Li | Mahdi Pakdaman | Asli Celikyilmaz | Ivan Zhiboedov | Kieran McDonald
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

We describe Artemis (Annotation methodology for Rich, Tractable, Extractive, Multi-domain, Indicative Summarization), a novel hierarchical annotation process that produces indicative summaries for documents from multiple domains. Current summarization evaluation datasets are single-domain and focused on a few domains for which naturally occurring summaries can be easily found, such as news and scientific articles. These are not sufficient for training and evaluation of summarization models for use in document management and information retrieval systems, which need to deal with documents from multiple domains. Compared to other annotation methods such as Relative Utility and Pyramid, Artemis is more tractable because judges don’t need to look at all the sentences in a document when making an importance judgment for one of the sentences, while providing similarly rich sentence importance annotations. We describe the annotation process in detail and compare it with other similar evaluation systems. We also present analysis and experimental results over a sample set of 532 annotated documents.

2019

pdf
Event Detection without Triggers
Shulin Liu | Yang Li | Feng Zhang | Tao Yang | Xinpeng Zhou
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The goal of event detection (ED) is to detect the occurrences of events and categorize them. Previous work solved this task by recognizing and classifying event triggers, which is defined as the word or phrase that most clearly expresses an event occurrence. As a consequence, existing approaches required both annotated triggers and event types in training data. However, triggers are nonessential to event detection, and it is time-consuming for annotators to pick out the “most clearly” word from a given sentence, especially from a long sentence. The expensive annotation of training corpus limits the application of existing approaches. To reduce manual effort, we explore detecting events without triggers. In this work, we propose a novel framework dubbed as Type-aware Bias Neural Network with Attention Mechanisms (TBNNAM), which encodes the representation of a sentence based on target event types. Experimental results demonstrate the effectiveness. Remarkably, the proposed approach even achieves competitive performances compared with state-of-the-arts that used annotated triggers.

2018

pdf
Guess Me if You Can: Acronym Disambiguation for Enterprises
Yang Li | Bo Zhao | Ariel Fuxman | Fangbo Tao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Acronyms are abbreviations formed from the initial components of words or phrases. In enterprises, people often use acronyms to make communications more efficient. However, acronyms could be difficult to understand for people who are not familiar with the subject matter (new employees, etc.), thereby affecting productivity. To alleviate such troubles, we study how to automatically resolve the true meanings of acronyms in a given context. Acronym disambiguation for enterprises is challenging for several reasons. First, acronyms may be highly ambiguous since an acronym used in the enterprise could have multiple internal and external meanings. Second, there are usually no comprehensive knowledge bases such as Wikipedia available in enterprises. Finally, the system should be generic to work for any enterprise. In this work we propose an end-to-end framework to tackle all these challenges. The framework takes the enterprise corpus as input and produces a high-quality acronym disambiguation system as output. Our disambiguation models are trained via distant supervised learning, without requiring any manually labeled training examples. Therefore, our proposed framework can be deployed to any enterprise to support high-quality acronym disambiguation. Experimental results on real world data justified the effectiveness of our system.

2016

pdf
Hashtag Recommendation with Topical Attention-Based LSTM
Yang Li | Ting Liu | Jing Jiang | Liang Zhang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Microblogging services allow users to create hashtags to categorize their posts. In recent years, the task of recommending hashtags for microblogs has been given increasing attention. However, most of existing methods depend on hand-crafted features. Motivated by the successful use of long short-term memory (LSTM) for many natural language processing tasks, in this paper, we adopt LSTM to learn the representation of a microblog post. Observing that hashtags indicate the primary topics of microblog posts, we propose a novel attention-based LSTM model which incorporates topic modeling into the LSTM architecture through an attention mechanism. We evaluate our model using a large real-world dataset. Experimental results show that our model significantly outperforms various competitive baseline methods. Furthermore, the incorporation of topical attention mechanism gives more than 7.4% improvement in F1 score compared with standard LSTM method.

2015

pdf
Answering Elementary Science Questions by Constructing Coherent Scenes using Background Knowledge
Yang Li | Peter Clark
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2009

pdf
The ICT statistical machine translation system for the IWSLT 2009
Haitao Mi | Yang Li | Tian Xia | Xinyan Xiao | Yang Feng | Jun Xie | Hao Xiong | Zhaopeng Tu | Daqi Zheng | Yanjuan Lu | Qun Liu
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the ICT Statistical Machine Translation systems that used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2009. For this year’s evaluation, we participated in the Challenge Task (Chinese-English and English-Chinese) and BTEC Task (Chinese-English). And we mainly focus on one new method to improve single system’s translation quality. Specifically, we developed a sentence-similarity based development set selection technique. For each task, we finally submitted the single system who got the maximum BLEU scores on the selected development set. The four single translation systems are based on different techniques: a linguistically syntax-based system, two formally syntax-based systems and a phrase-based system. Typically, we didn’t use any rescoring or system combination techniques in this year’s evaluation.