Wenliang Chen


2022

SelfMix: Robust Learning against Textual Label Noise with Self-Mixup Training
Dan Qiao | Chenchen Dai | Yuyang Ding | Juntao Li | Qiang Chen | Wenliang Chen | Min Zhang
Proceedings of the 29th International Conference on Computational Linguistics

The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires some labeled data for downstream tasks. However, in real-world applications, label noise inevitably exists in training data, damaging the effectiveness, robustness, and generalization of the models constructed on such data. Recently, remarkable achievements have been made to mitigate this dilemma in visual data, while only a few works explore textual data. To fill this gap, we present SelfMix, a simple yet effective method to handle label noise in text classification tasks. SelfMix uses the Gaussian Mixture Model to separate samples and leverages semi-supervised learning. Unlike previous works requiring multiple models, our method utilizes the dropout mechanism on a single model to reduce the confirmation bias in self-training and introduces a textual-level mixup training strategy. Experimental results on three text classification benchmarks with different types of text show that our proposed method outperforms strong baselines designed for both textual and visual data under different noise ratios and noise types. Our anonymous code is available at https://github.com/noise-learning/SelfMix.

STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction
Junjie Yu | Xing Wang | Jiangjiang Zhao | Chunjie Yang | Wenliang Chen
Proceedings of the 29th International Conference on Computational Linguistics

We present a simple yet effective self-training approach, named STAD, for low-resource relation extraction. The approach first classifies the auto-annotated instances into two groups, confident instances and uncertain instances, according to the probabilities predicted by a teacher model. In contrast to most previous studies, which mainly use only the confident instances for self-training, we make use of the uncertain instances. To this end, we propose a method to identify ambiguous but useful instances from the uncertain instances and then divide the relations into a candidate-label set and a negative-label set for each ambiguous instance. Next, we propose a set-negative training method on the negative-label sets for the ambiguous instances and a positive training method for the confident instances. Finally, a joint-training method is proposed to build the final relation extraction system on all data. Experimental results on two widely used datasets, SemEval2010 Task-8 and Re-TACRED, with low-resource settings demonstrate that this new self-training approach achieves significant and consistent improvements compared to several competitive self-training systems.
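
As a rough illustration of the grouping step described in this abstract, the sketch below splits teacher predictions into confident and uncertain instances and builds a candidate-label set and a negative-label set for one instance. The fixed probability threshold, the top-k rule for the candidate set, and the names `split_by_confidence` and `label_sets` are assumptions made for illustration; the abstract does not specify how STAD makes these decisions, and the selection of "ambiguous but useful" instances is not modeled here.

```python
from typing import Dict, List, Set, Tuple


def split_by_confidence(
    probs: List[Dict[str, float]], threshold: float = 0.9
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Split teacher predictions into confident and uncertain instances.

    Assumption: an instance is "confident" when its top probability
    reaches a fixed threshold; the paper may use a different rule.
    """
    confident, uncertain = [], []
    for idx, dist in enumerate(probs):
        label, p = max(dist.items(), key=lambda kv: kv[1])
        if p >= threshold:
            confident.append((idx, label))  # keep the predicted label
        else:
            uncertain.append(idx)
    return confident, uncertain


def label_sets(dist: Dict[str, float], top_k: int = 2) -> Tuple[Set[str], Set[str]]:
    """Divide relations into a candidate-label set and a negative-label set.

    Assumption: the top-k relations by probability are candidates and
    all remaining relations are treated as negatives.
    """
    ranked = sorted(dist, key=dist.get, reverse=True)
    return set(ranked[:top_k]), set(ranked[top_k:])


# Hypothetical teacher outputs over three relation labels.
teacher_probs = [
    {"A": 0.95, "B": 0.03, "C": 0.02},
    {"A": 0.50, "B": 0.40, "C": 0.10},
]
confident, uncertain = split_by_confidence(teacher_probs)
candidates, negatives = label_sets(teacher_probs[1])
```

Under these assumptions, the second instance would be trained only on the knowledge that its relation is not in `negatives`, which is one plausible reading of training on negative-label sets.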

2020

Improving Relation Extraction with Relational Paraphrase Sentences
Junjie Yu | Tong Zhu | Wenliang Chen | Wei Zhang | Min Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Supervised models for Relation Extraction (RE) typically require human-annotated training data. Due to its limited size, human-annotated data is usually incapable of covering diverse relation expressions, which could limit the performance of RE. To increase the coverage of relation expressions, we may enlarge the labeled data by hiring annotators or applying Distant Supervision (DS). However, human-annotated data is costly and non-scalable, while distantly supervised data contains much noise. In this paper, we propose an alternative approach to improve RE systems by enriching diverse expressions with relational paraphrase sentences. Based on existing labeled data, we first automatically build a task-specific paraphrase dataset. Then, we propose a novel model to learn the information of diverse relation expressions. In our model, we try to capture this information from the paraphrases via a joint learning framework. Finally, we conduct experiments on a widely used dataset, and the experimental results show that our approach is effective in improving the performance of relation extraction, even compared with a strong baseline.

Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction
Tong Zhu | Haitao Wang | Junjie Yu | Xiabing Zhou | Wenliang Chen | Wei Zhang | Min Zhang
Proceedings of the 28th International Conference on Computational Linguistics

In recent years, distantly-supervised relation extraction has achieved a certain success by using deep neural networks. Distant Supervision (DS) can automatically generate large-scale annotated data by aligning entity pairs from Knowledge Bases (KB) to sentences. However, these DS-generated datasets inevitably have wrong labels that result in incorrect evaluation scores during testing, which may mislead researchers. To solve this problem, we build a new dataset, NYT-H, where we use the DS-generated data as training data and hire annotators to label test data. Compared with the previous datasets, NYT-H has a much larger test set, so we can perform more accurate and consistent evaluation. Finally, we present the experimental results of several widely used systems on NYT-H. The experimental results show that the ranking lists of the comparison systems on the DS-labelled test data and human-annotated test data are different. This indicates that our human-annotated data is necessary for evaluation of distantly-supervised relation extraction.

2018

M-CNER: A Corpus for Chinese Named Entity Recognition in Multi-Domains
Qi Lu | YaoSheng Yang | Zhenghua Li | Wenliang Chen | Min Zhang
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
Yaosheng Yang | Wenliang Chen | Zhenghua Li | Zhengqiu He | Min Zhang
Proceedings of the 27th International Conference on Computational Linguistics

A bottleneck problem with Chinese named entity recognition (NER) in new domains is the lack of annotated data. One solution is to utilize the method of distant supervision, which has been widely used in relation extraction, to automatically populate annotated training data without human cost. The distant supervision assumption here is that if a string in text is included in a predefined dictionary of entities, the string might be an entity. However, this kind of auto-generated data suffers from two main problems: incomplete and noisy annotations, which affect the performance of NER models. In this paper, we propose a novel approach which can partially solve the above problems of distant supervision for NER. In our approach, to handle the incomplete annotation problem, we apply partial annotation learning to reduce the effect of unknown labels of characters. As for noisy annotation, we design an instance selector based on reinforcement learning to distinguish positive sentences from auto-generated annotations. In experiments, we create two datasets for Chinese named entity recognition in two domains with the help of distant supervision. The experimental results show that the proposed approach obtains better performance than the comparison systems on both datasets.
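
The distant supervision assumption stated in this abstract (a string found in an entity dictionary might be an entity) can be sketched as a simple dictionary matcher. This is a minimal illustration, not the authors' pipeline: the forward maximum matching strategy, the `max_len` limit, the `B`/`I`/`UNK` tag names, and the functions `dictionary_match` and `to_partial_labels` are all assumptions. Unmatched characters are marked `UNK` rather than `O` to reflect that their labels are unknown, which is the situation partial annotation learning is meant to handle.

```python
from typing import List, Set, Tuple


def dictionary_match(
    text: str, entity_dict: Set[str], max_len: int = 6
) -> List[Tuple[int, int, str]]:
    """Find dictionary strings in text using forward maximum matching.

    Returns (start, end, surface) spans with an exclusive end index.
    """
    spans = []
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in entity_dict:
                spans.append((i, j, text[i:j]))
                i = j
                matched = True
                break
        if not matched:
            i += 1
    return spans


def to_partial_labels(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn matched spans into character-level partial annotations.

    Characters outside any match stay "UNK": the dictionary is incomplete,
    so they cannot safely be labeled as non-entities.
    """
    labels = ["UNK"] * len(text)
    for start, end, _ in spans:
        labels[start] = "B"
        for k in range(start + 1, end):
            labels[k] = "I"
    return labels


# Hypothetical dictionary and sentence, for illustration only.
entity_dict = {"苏州大学", "苏州"}
sentence = "苏州大学在苏州"
spans = dictionary_match(sentence, entity_dict)
labels = to_partial_labels(sentence, spans)
```

Matches produced this way can still be wrong in context, which is the noisy-annotation problem the abstract addresses separately with an instance selector.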

2016

Active Learning for Dependency Parsing with Partial Annotation
Zhenghua Li | Min Zhang | Yue Zhang | Zhanyi Liu | Wenliang Chen | Hua Wu | Haifeng Wang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Distributed Representations for Building Profiles of Users and Items from Text Reviews
Wenliang Chen | Zhenjie Zhang | Zhenghua Li | Min Zhang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we propose an approach to learn distributed representations of users and items from text comments for recommendation systems. Traditional recommendation algorithms, e.g. collaborative filtering and matrix completion, are not designed to exploit the key information hidden in the text comments, while existing opinion mining methods do not provide direct support to recommendation systems with useful features on users and items. Our approach attempts to construct vectors to represent profiles of users and items under a unified framework to maximize word appearance likelihood. Then, the vector representations are used for a recommendation task in which we predict scores on unobserved user-item pairs without given texts. The recommendation-aware distributed representation approach is fully supported by effective and efficient learning algorithms over massive text archives. Our empirical evaluations on real datasets show that our system outperforms the state-of-the-art baseline systems.

2015

Coupled Sequence Labeling on Heterogeneous Annotations: POS Tagging as a Case Study
Zhenghua Li | Jiayuan Chao | Min Zhang | Wenliang Chen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Soft Cross-lingual Syntax Projection for Dependency Parsing
Zhenghua Li | Min Zhang | Wenliang Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Feature Embedding for Dependency Parsing
Wenliang Chen | Yue Zhang | Min Zhang
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Dependency Parsing: Past, Present, and Future
Wenliang Chen | Zhenghua Li | Min Zhang
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts

Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
Zhenghua Li | Min Zhang | Wenliang Chen
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Semi-Supervised Feature Transformation for Dependency Parsing
Wenliang Chen | Min Zhang | Yue Zhang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Fast and Accurate Shift-Reduce Constituent Parsing
Muhua Zhu | Yue Zhang | Wenliang Chen | Min Zhang | Jingbo Zhu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Wenliang Chen | Min Zhang | Haizhou Li
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data
Yiou Wang | Jun’ichi Kazama | Yoshimasa Tsuruoka | Wenliang Chen | Yujie Zhang | Kentaro Torisawa
Proceedings of 5th International Joint Conference on Natural Language Processing

SMT Helps Bitext Dependency Parsing
Wenliang Chen | Jun’ichi Kazama | Min Zhang | Yoshimasa Tsuruoka | Yujie Zhang | Yiou Wang | Kentaro Torisawa | Haizhou Li
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Joint Models for Chinese POS Tagging and Dependency Parsing
Zhenghua Li | Min Zhang | Wanxiang Che | Ting Liu | Wenliang Chen | Haizhou Li
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Bitext Dependency Parsing with Bilingual Subtree Constraints
Wenliang Chen | Jun’ichi Kazama | Kentaro Torisawa
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Improving Graph-based Dependency Parsing with Decision History
Wenliang Chen | Jun’ichi Kazama | Yoshimasa Tsuruoka | Kentaro Torisawa
Coling 2010: Posters

2009

Semantic Dependency Parsing of NomBank and PropBank: An Efficient Integrated Approach via a Large-scale Feature Selection
Hai Zhao | Wenliang Chen | Chunyu Kit
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Improving Dependency Parsing with Subtrees from Auto-Parsed Data
Wenliang Chen | Jun’ichi Kazama | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Multilingual Dependency Learning: A Huge Feature Engineering Method to Semantic Dependency Parsing
Hai Zhao | Wenliang Chen | Chunyu Kit | Guodong Zhou
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

Multilingual Dependency Learning: Exploiting Rich Features for Tagging Syntactic and Semantic Dependencies
Hai Zhao | Wenliang Chen | Jun’ichi Kazama | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

2008

Learning Reliable Information for Dependency Parsing Adaptation
Wenliang Chen | Youzheng Wu | Hitoshi Isahara
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Dependency Parsing with Short Dependency Relations in Unlabeled Data
Wenliang Chen | Daisuke Kawahara | Kiyotaka Uchimoto | Yujie Zhang | Hitoshi Isahara
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2007

A Two-Stage Parser for Multilingual Dependency Parsing
Wenliang Chen | Yujie Zhang | Hitoshi Isahara
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

An Empirical Study of Chinese Chunking
Wenliang Chen | Yujie Zhang | Hitoshi Isahara
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Chinese Named Entity Recognition with Conditional Random Fields
Wenliang Chen | Yujie Zhang | Hitoshi Isahara
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Using Multiple Discriminant Analysis Approach for Linear Text Segmentation
Jingbo Zhu | Na Ye | Xinzhi Chang | Wenliang Chen | Benjamin K Tsou
Second International Joint Conference on Natural Language Processing: Full Papers

Some Studies on Chinese Domain Knowledge Dictionary and Its Application to Text Classification
Jingbo Zhu | Wenliang Chen
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing