Chengyu Wang


2022

pdf
HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction
Dongyang Li | Taolin Zhang | Nan Hu | Chengyu Wang | Xiaofeng He
Findings of the Association for Computational Linguistics: ACL 2022

Distant supervision assumes that any sentence containing the same entity pairs reflects identical relationships. Previous works of distantly supervised relation extraction (DSRE) task generally focus on sentence-level or bag-level de-noising techniques independently, neglecting the explicit interaction with cross levels. In this paper, we propose a hierarchical contrastive learning Framework for Distantly Supervised relation extraction (HiCLRE) to reduce noisy sentences, which integrate the global structural information and local fine-grained interaction. Specifically, we propose a three-level hierarchical learning framework to interact with cross levels, generating the de-noising context-aware representations via adapting the existing multi-head self-attention, named Multi-Granularity Recontextualization. Meanwhile, pseudo positive samples are also provided in the specific level for contrastive learning via a dynamic gradient-based data augmentation strategy, named Dynamic Gradient Adversarial Perturbation. Experiments demonstrate that HiCLRE significantly outperforms strong baselines in various mainstream DSRE datasets.

pdf
Towards Unified Prompt Tuning for Few-shot Text Classification
Jianing Wang | Chengyu Wang | Fuli Luo | Chuanqi Tan | Minghui Qiu | Fei Yang | Qiuhui Shi | Songfang Huang | Ming Gao
Findings of the Association for Computational Linguistics: EMNLP 2022

Prompt-based fine-tuning has boosted the performance of Pre-trained Language Models (PLMs) on few-shot text classification by employing task-specific prompts. Yet, PLMs are unfamiliar with prompt-style expressions during pre-training, which limits the few-shot learning performance on downstream tasks.It would be desirable if the models can acquire some prompting knowledge before adapting to specific NLP tasks. We present the Unified Prompt Tuning (UPT) framework, leading to better few-shot text classification for BERT-style models by explicitly capturing prompting semantics from non-target NLP datasets. In UPT, a novel paradigm Prompt-Options-Verbalizer is proposed for joint prompt learning across different NLP tasks, forcing PLMs to capture task-invariant prompting knowledge. We further design a self-supervised task named Knowledge-enhanced Selective Masked Language Modeling to improve the PLM’s generalization abilities for accurate adaptation to previously unseen tasks. After multi-task learning across multiple tasks, the PLM can be better prompt-tuned towards any dissimilar target tasks in low-resourced settings. Experiments over a variety of NLP tasks show that UPT consistently outperforms state-of-the-arts for prompt-based fine-tuning.

pdf
ARTIST: A Transformer-based Chinese Text-to-Image Synthesizer Digesting Linguistic and World Knowledge
Tingting Liu | Chengyu Wang | Xiangru Zhu | Lei Li | Minghui Qiu | Jun Huang | Ming Gao | Yanghua Xiao
Findings of the Association for Computational Linguistics: EMNLP 2022

Text-to-Image Synthesis (TIS) is a popular task to convert natural language texts into realistic images. Recently, transformer-based TIS models (such as DALL-E) have been proposed using the encoder-decoder architectures. Yet, these billion-scale TIS models are difficult to tune and deploy in resource-constrained environments. In addition, there is a lack of language-specific TIS benchmarks for Chinese, together with high-performing models with moderate sizes. In this work, we present ARTIST, A tRansformer-based Chinese Text-to-Image SynThesizer for high-resolution image generation. In ARTIST, the rich linguistic and relational knowledge facts are injected into the model to ensure better model performance without the usage of ultra-large models. We further establish a large-scale Chinese TIS benchmark with the re-production results of state-of-the-art transformer-based TIS models.Results show ARTIST outperforms previous approaches.

pdf
KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering
Jianing Wang | Chengyu Wang | Minghui Qiu | Qiuhui Shi | Hongbin Wang | Jun Huang | Ming Gao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Extractive Question Answering (EQA) is one of the most essential tasks in Machine Reading Comprehension (MRC), which can be solved by fine-tuning the span selecting heads of Pre-trained Language Models (PLMs). However, most existing approaches for MRC may perform poorly in the few-shot learning scenario. To solve this issue, we propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP). Instead of adding pointer heads to PLMs, we introduce a seminal paradigm for EQA that transforms the task into a non-autoregressive Masked Language Modeling (MLM) generation problem. Simultaneously, rich semantics from the external knowledge base (KB) and the passage context support enhancing the query’s representations. In addition, to boost the performance of PLMs, we jointly train the model by the MLM and contrastive learning objectives. Experiments on multiple benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.

pdf
SpanProto: A Two-stage Span-based Prototypical Network for Few-shot Named Entity Recognition
Jianing Wang | Chengyu Wang | Chuanqi Tan | Minghui Qiu | Songfang Huang | Jun Huang | Ming Gao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Few-shot Named Entity Recognition (NER) aims to identify named entities with very little annotated data. Previous methods solve this problem based on token-wise classification, which ignores the information of entity boundaries, and inevitably the performance is affected by the massive non-entity tokens. To this end, we propose a seminal span-based prototypical network (SpanProto) that tackles few-shot NER via a two-stage approach, including span extraction and mention classification. In the span extraction stage, we transform the sequential tags into a global boundary matrix, enabling the model to focus on the explicit boundary information. For mention classification, we leverage prototypical learning to capture the semantic representations for each labeled span and make the model better adapt to novel-class entities. To further improve the model performance, we split out the false positives generated by the span extractor but not labeled in the current episode set, and then present a margin-based loss to separate them from each prototype region. Experiments over multiple benchmarks demonstrate that our model outperforms strong baselines by a large margin.

2021

pdf
Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources
Taolin Zhang | Chengyu Wang | Minghui Qiu | Bite Yang | Zerui Cai | Xiaofeng He | Jun Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
UnClE: Explicitly Leveraging Semantic Similarity to Reduce the Parameters of Word Embeddings
Zhi Li | Yuchen Zhai | Chengyu Wang | Minghui Qiu | Kailiang Li | Yin Zhang
Findings of the Association for Computational Linguistics: EMNLP 2021

Natural language processing (NLP) models often require a massive number of parameters for word embeddings, which limits their application on mobile devices. Researchers have employed many approaches, e.g. adaptive inputs, to reduce the parameters of word embeddings. However, existing methods rarely pay attention to semantic information. In this paper, we propose a novel method called Unique and Class Embeddings (UnClE), which explicitly leverages semantic similarity with weight sharing to reduce the dimensionality of word embeddings. Inspired by the fact that words with similar semantic can share a part of weights, we divide the embeddings of words into two parts: unique embedding and class embedding. The former is one-to-one mapping like traditional embedding, while the latter is many-to-one mapping and learn the representation of class information. Our method is suitable for both word-level and sub-word level models and can be used to reduce both input and output embeddings. Experimental results on the standard WMT 2014 English-German dataset show that our method is able to reduce the parameters of word embeddings by more than 11x, with about 93% performance retaining in BLEU metrics. For language modeling task, our model can reduce word embeddings by 6x or 11x on PTB/WT2 dataset at the cost of a certain degree of performance degradation.

pdf
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
Haojie Pan | Chengyu Wang | Minghui Qiu | Yichang Zhang | Yaliang Li | Jun Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Pre-trained language models have been applied to various NLP tasks with considerable performance gains. However, the large model sizes, together with the long inference time, limit the deployment of such models in real-time applications. One line of model compression approaches considers knowledge distillation to distill large teacher models into small student models. Most of these studies focus on single-domain only, which ignores the transferable knowledge from other domains. We notice that training a teacher with transferable knowledge digested across domains can achieve better generalization capability to help knowledge distillation. Hence we propose a Meta-Knowledge Distillation (Meta-KD) framework to build a meta-teacher model that captures transferable knowledge across domains and passes such knowledge to students. Specifically, we explicitly force the meta-teacher to capture transferable knowledge at both instance-level and feature-level from multiple domains, and then propose a meta-distillation algorithm to learn single-domain student models with guidance from the meta-teacher. Experiments on public multi-domain NLP tasks show the effectiveness and superiority of the proposed Meta-KD framework. Further, we also demonstrate the capability of Meta-KD in the settings where the training data is scarce.

pdf
SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining
Taolin Zhang | Zerui Cai | Chengyu Wang | Minghui Qiu | Bite Yang | Xiaofeng He
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, the performance of Pre-trained Language Models (PLMs) has been significantly improved by injecting knowledge facts to enhance their abilities of language understanding. For medical domains, the background knowledge sources are especially useful, due to the massive medical terms and their complicated relations are difficult to understand in text. In this work, we introduce SMedBERT, a medical PLM trained on large-scale medical corpora, incorporating deep structured semantic knowledge from neighbours of linked-entity. In SMedBERT, the mention-neighbour hybrid attention is proposed to learn heterogeneous-entity information, which infuses the semantic representations of entity types into the homogeneous neighbouring entity structure. Apart from knowledge integration as external features, we propose to employ the neighbors of linked-entities in the knowledge graph as additional global contexts of text mentions, allowing them to communicate via shared neighbors, thus enrich their semantic representations. Experiments demonstrate that SMedBERT significantly outperforms strong baselines in various knowledge-intensive Chinese medical tasks. It also improves the performance of other tasks such as question answering, question matching and natural language inference.

pdf
TransPrompt: Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification
Chengyu Wang | Jianing Wang | Minghui Qiu | Jun Huang | Ming Gao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent studies have shown that prompts improve the performance of large pre-trained language models for few-shot text classification. Yet, it is unclear how the prompting knowledge can be transferred across similar NLP tasks for the purpose of mutual reinforcement. Based on continuous prompt embeddings, we propose TransPrompt, a transferable prompting framework for few-shot learning across similar tasks. In TransPrompt, we employ a multi-task meta-knowledge acquisition procedure to train a meta-learner that captures cross-task transferable knowledge. Two de-biasing techniques are further designed to make it more task-agnostic and unbiased towards any tasks. After that, the meta-learner can be adapted to target tasks with high accuracy. Extensive experiments show that TransPrompt outperforms single-task and cross-task strong baselines over multiple NLP tasks and datasets. We further show that the meta-learner can effectively improve the performance on previously unseen tasks; and TransPrompt also outperforms strong fine-tuning baselines when learning with full training sets.

pdf
Meta Distant Transfer Learning for Pre-trained Language Models
Chengyu Wang | Haojie Pan | Minghui Qiu | Jun Huang | Fei Yang | Yin Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

With the wide availability of Pre-trained Language Models (PLMs), multi-task fine-tuning across domains has been extensively applied. For tasks related to distant domains with different class label sets, PLMs may memorize non-transferable knowledge for the target domain and suffer from negative transfer. Inspired by meta-learning, we propose the Meta Distant Transfer Learning (Meta-DTL) framework to learn the cross-task knowledge for PLM-based methods. Meta-DTL first employs task representation learning to mine implicit relations among multiple tasks and classes. Based on the results, it trains a PLM-based meta-learner to capture the transferable knowledge across tasks. The weighted maximum entropy regularizers are proposed to make meta-learner more task-agnostic and unbiased. Finally, the meta-learner can be fine-tuned to fit each task with better parameter initialization. We evaluate Meta-DTL using both BERT and ALBERT on seven public datasets. Experiment results confirm the superiority of Meta-DTL as it consistently outperforms strong baselines. We find that Meta-DTL is highly effective when very few data is available for the target task.

2020

pdf
Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Chengyu Wang | Minghui Qiu | Jun Huang | Xiaofeng He
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Pre-trained neural language models bring significant improvement for various NLP tasks, by fine-tuning the models on task-specific training sets. During fine-tuning, the parameters are initialized from pre-trained models directly, which ignores how the learning process of similar NLP tasks in different domains is correlated and mutually reinforced. In this paper, we propose an effective learning procedure named Meta Fine-Tuning (MFT), serving as a meta-learner to solve a group of similar NLP tasks for neural language models. Instead of simply multi-task training over all the datasets, MFT only learns from typical instances of various domains to acquire highly transferable knowledge. It further encourages the language model to encode domain-invariant representations by optimizing a series of novel domain corruption loss functions. After MFT, the model can be fine-tuned for each domain with better parameter initializations and higher generalization ability. We implement MFT upon BERT to solve several multi-domain text mining tasks. Experimental results confirm the effectiveness of MFT and its usefulness for few-shot learning.

pdf
BiRRE: Learning Bidirectional Residual Relation Embeddings for Supervised Hypernymy Detection
Chengyu Wang | Xiaofeng He
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The hypernymy detection task has been addressed under various frameworks. Previously, the design of unsupervised hypernymy scores has been extensively studied. In contrast, supervised classifiers, especially distributional models, leverage the global contexts of terms to make predictions, but are more likely to suffer from “lexical memorization”. In this work, we revisit supervised distributional models for hypernymy detection. Rather than taking embeddings of two terms as classification inputs, we introduce a representation learning framework named Bidirectional Residual Relation Embeddings (BiRRE). In this model, a term pair is represented by a BiRRE vector as features for hypernymy classification, which models the possibility of a term being mapped to another in the embedding space by hypernymy relations. A Latent Projection Model with Negative Regularization (LPMNR) is proposed to simulate how hypernyms and hyponyms are generated by neural language models, and to generate BiRRE vectors based on bidirectional residuals of projections. Experiments verify BiRRE outperforms strong baselines over various evaluation frameworks.

2019

pdf
SphereRE: Distinguishing Lexical Relations with Hyperspherical Relation Embeddings
Chengyu Wang | Xiaofeng He | Aoying Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Lexical relations describe how meanings of terms relate to each other. Typical examples include hypernymy, synonymy, meronymy, etc. Automatic distinction of lexical relations is vital for NLP applications, and also challenging due to the lack of contextual signals to discriminate between such relations. In this work, we present a neural representation learning model to distinguish lexical relations among term pairs based on Hyperspherical Relation Embeddings (SphereRE). Rather than learning embeddings for individual terms, the model learns representations of relation triples by mapping them to the hyperspherical embedding space, where relation triples of different lexical relations are well separated. Experiments over several benchmarks confirm SphereRE outperforms state-of-the-arts.

2018

pdf
Exploratory Neural Relation Classification for Domain Knowledge Acquisition
Yan Fan | Chengyu Wang | Xiaofeng He
Proceedings of the 27th International Conference on Computational Linguistics

The state-of-the-art methods for relation classification are primarily based on deep neural net- works. This kind of supervised learning method suffers from not only limited training data, but also the large number of low-frequency relations in specific domains. In this paper, we propose the task of exploratory relation classification for domain knowledge harvesting. The goal is to learn a classifier on pre-defined relations and discover new relations expressed in texts. A dynamically structured neural network is introduced to classify entity pairs to a continuously expanded relation set. We further propose the similarity sensitive Chinese restaurant process to discover new relations. Experiments conducted on a large corpus show the effectiveness of our neural network, while new relations are discovered with high precision and recall.

2017

pdf
A Short Survey on Taxonomy Learning from Text Corpora: Issues, Resources and Recent Advances
Chengyu Wang | Xiaofeng He | Aoying Zhou
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

A taxonomy is a semantic hierarchy, consisting of concepts linked by is-a relations. While a large number of taxonomies have been constructed from human-compiled resources (e.g., Wikipedia), learning taxonomies from text corpora has received a growing interest and is essential for long-tailed and domain-specific knowledge acquisition. In this paper, we overview recent advances on taxonomy construction from free texts, reorganizing relevant subtasks into a complete framework. We also overview resources for evaluation and discuss challenges for future research.

pdf
Learning Fine-grained Relations from Chinese User Generated Categories
Chengyu Wang | Yan Fan | Xiaofeng He | Aoying Zhou
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

User generated categories (UGCs) are short texts that reflect how people describe and organize entities, expressing rich semantic relations implicitly. While most methods on UGC relation extraction are based on pattern matching in English circumstances, learning relations from Chinese UGCs poses different challenges due to the flexibility of expressions. In this paper, we present a weakly supervised learning framework to harvest relations from Chinese UGCs. We identify is-a relations via word embedding based projection and inference, extract non-taxonomic relations and their category patterns by graph mining. We conduct experiments on Chinese Wikipedia and achieve high accuracy, outperforming state-of-the-art methods.

pdf
Transductive Non-linear Learning for Chinese Hypernym Prediction
Chengyu Wang | Junchi Yan | Aoying Zhou | Xiaofeng He
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Finding the correct hypernyms for entities is essential for taxonomy learning, fine-grained entity categorization, query understanding, etc. Due to the flexibility of the Chinese language, it is challenging to identify hypernyms in Chinese accurately. Rather than extracting hypernyms from texts, in this paper, we present a transductive learning approach to establish mappings from entities to hypernyms in the embedding space directly. It combines linear and non-linear embedding projection models, with the capacity of encoding arbitrary language-specific rules. Experiments on real-world datasets illustrate that our approach outperforms previous methods for Chinese hypernym prediction.

2016

pdf
Chinese Hypernym-Hyponym Extraction from User Generated Categories
Chengyu Wang | Xiaofeng He
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Hypernym-hyponym (“is-a”) relations are key components in taxonomies, object hierarchies and knowledge graphs. While there is abundant research on is-a relation extraction in English, it still remains a challenge to identify such relations from Chinese knowledge sources accurately due to the flexibility of language expression. In this paper, we introduce a weakly supervised framework to extract Chinese is-a relations from user generated categories. It employs piecewise linear projection models trained on a Chinese taxonomy and an iterative learning algorithm to update models incrementally. A pattern-based relation selection method is proposed to prevent “semantic drift” in the learning process using bi-criteria optimization. Experimental results illustrate that the proposed approach outperforms state-of-the-art methods.