Yang Xiang


2022

CLLE: A Benchmark for Continual Language Learning Evaluation in Multilingual Machine Translation
Han Zhang | Sheng Zhang | Yang Xiang | Bin Liang | Jinsong Su | Zhongjian Miao | Hui Wang | Ruifeng Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Continual Language Learning (CLL) in multilingual translation is inevitable when new languages are required to be translated. Due to the lack of unified and generalized benchmarks, the evaluation of existing methods is greatly influenced by experimental design, which usually differs substantially from industrial demands. In this work, we propose CLLE, the first Continual Language Learning Evaluation benchmark in multilingual translation. CLLE consists of a Chinese-centric corpus, CN-25, and two CLL tasks, the close-distance language continual learning task and the language family continual learning task, designed for real and disparate demands. Different from existing translation benchmarks, CLLE considers several restrictions for CLL, including domain distribution alignment, content overlap, language diversity, and the balance of the corpus. Furthermore, we propose a novel framework, COMETA, based on Constrained Optimization and META-learning, to alleviate catastrophic forgetting and dependency on historical training data by using a meta-model to retain the important parameters for old languages. Our experiments prove that CLLE is a challenging CLL benchmark and that our proposed method is effective when compared with other strong baselines. Because the construction of the corpus, the task design, and the evaluation method are independent of the centric language, we also construct and release the English-centric corpus EN-25 to facilitate academic research.

2020

Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text
Shengbin Jia | Ling Ding | Xiaojun Chen | Shijia E | Yang Xiang
Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media

Chinese word segmentation is necessary to provide word-level information for Chinese named entity recognition (NER) systems. However, segmentation error propagation is a challenge for Chinese NER when processing colloquial data such as social media text. In this paper, we propose a model (UIcwsNN) that specializes in identifying entities in Chinese social media text, especially by leveraging uncertain word segmentation information. Such ambiguous information contains all the potential segmentation states of a sentence, which provides a channel for the model to infer deep word-level characteristics. We propose a trilogy (i.e., Candidate Position Embedding => Position Selective Attention => Adaptive Word Convolution) to encode uncertain word segmentation information and acquire appropriate word-level representations. Experimental results on the social media corpus show that our model effectively alleviates the cascading of segmentation errors and achieves a significant performance improvement of 2% over previous state-of-the-art methods.

2019

Naive Bayes and BiLSTM Ensemble for Discriminating between Mainland and Taiwan Variation of Mandarin Chinese
Li Yang | Yang Xiang
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

Automatic dialect identification is a more challenging task than language identification, as it requires the ability to discriminate between varieties of one language. In this paper, we propose an ensemble-based system, which combines traditional machine learning models trained on bag-of-n-gram features with deep learning models trained on word embeddings, to solve the Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT) shared task at VarDial 2019. Our experiments show that a Naive Bayes model based on a combination of character bigrams and trigrams is a very strong model for identifying varieties of Mandarin Chinese. Through a further ensemble of Naive Bayes and BiLSTM, our system (team: itsalexyang) achieved macro-averaged F1 scores of 0.8530 and 0.8687 in the two tracks.

2016

Incorporating Label Dependency for Answer Quality Tagging in Community Question Answering via CNN-LSTM-CRF
Yang Xiang | Xiaoqiang Zhou | Qingcai Chen | Zhihui Zheng | Buzhou Tang | Xiaolong Wang | Yang Qin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In community question answering (cQA), the quality of answers is determined by the matching degree between question-answer pairs and the correlation among the answers. In this paper, we show that the dependency between the answer quality labels also plays a pivotal role. To validate the effectiveness of label dependency, we propose two neural network-based models, with different combination modes of Convolutional Neural Networks, Long Short-Term Memory and Conditional Random Fields. Extensive experiments are conducted on the dataset released by the SemEval-2015 cQA shared task. The first model is a stacked ensemble of the networks. It achieves 58.96% on macro-averaged F1, which improves on the state-of-the-art neural network-based method by 2.82% and outperforms the Top-1 system in the shared task by 1.77%. The second is a simple attention-based model whose input is the connection of the question and its corresponding answers. It produces promising results with 58.29% on overall F1 and achieves the best performance on the Good and Bad categories.

2015

ICRC-HIT: A Deep Learning based Comment Sequence Labeling System for Answer Selection Challenge
Xiaoqiang Zhou | Baotian Hu | Jiaxin Lin | Yang Xiang | Xiaolong Wang
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Chinese Grammatical Error Diagnosis Using Ensemble Learning
Yang Xiang | Xiaolong Wang | Wenying Han | Qinghua Hong
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

2014

Problematic Situation Analysis and Automatic Recognition for Chinese Online Conversational System
Yang Xiang | Yaoyun Zhang | Xiaoqiang Zhou | Xiaolong Wang | Yang Qin
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

2013

A Hybrid Model For Grammatical Error Correction
Yang Xiang | Bo Yuan | Yaoyun Zhang | Xiaolong Wang | Wen Zheng | Chongqiang Wei
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

Grammatical Error Correction Using Feature Selection and Confidence Tuning
Yang Xiang | Yaoyun Zhang | Xiaolong Wang | Chongqiang Wei | Wen Zheng | Xiaoqiang Zhou | Yuxiu Hu | Yang Qin
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

A Mixed Deterministic Model for Coreference Resolution
Bo Yuan | Qingcai Chen | Yang Xiang | Xiaolong Wang | Liping Ge | Zengjian Liu | Meng Liao | Xianbo Si
Joint Conference on EMNLP and CoNLL - Shared Task