Muyun Yang

Also published as: Mu-yun Yang, MuYun Yang


2023

pdf
Improving Translation Quality Estimation with Bias Mitigation
Hui Huang | Shuangzhi Wu | Kehai Chen | Hui Di | Muyun Yang | Tiejun Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

State-of-the-art translation Quality Estimation (QE) models are proven to be biased. More specifically, they over-rely on monolingual features while ignoring the bilingual semantic alignment. In this work, we propose a novel method to mitigate the bias of the QE model and improve estimation performance. Our method is based on the contrastive learning between clean and noisy sentence pairs. We first introduce noise to the target side of the parallel sentence pair, forming the negative samples. With the original parallel pairs as the positive sample, the QE model is contrastively trained to distinguish the positive samples from the negative ones. This objective is jointly trained with the regression-style quality estimation, so as to prevent the QE model from overfitting to monolingual features. Experiments on WMT QE evaluation datasets demonstrate that our method improves the estimation performance by a large margin while mitigating the bias.

pdf
Iterative Nearest Neighbour Machine Translation for Unsupervised Domain Adaptation
Hui Huang | Shuangzhi Wu | Xinnian Liang | Zefan Zhou | Muyun Yang | Tiejun Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Unsupervised domain adaptation of machine translation, which adapts a pre-trained translation model to a specific domain without in-domain parallel data, has drawn extensive attention in recent years. However, most existing methods focus on the fine-tuning based techniques, which is non-extensible. In this paper, we propose a new method to perform unsupervised domain adaptation in a non-parametric manner. Our method only resorts to in-domain monolingual data, and we jointly perform nearest neighbour inference on both forward and backward translation directions. The forward translation model creates nearest neighbour datastore for the backward direction, and vice versa, strengthening each other in an iterative style. Experiments on multi-domain datasets demonstrate that our method significantly improves the in-domain translation performance and achieves state-of-the-art results among non-parametric methods.

2022

pdf
中文专利关键信息语料库的构建研究(Research on the construction of Chinese patent key information corpus)
Wenting Zhang (张文婷) | Meihan Zhao (赵美含) | Yixuan Ma (马翊轩) | Wenrui Wang (王文瑞) | Yuzhe Liu (刘宇哲) | Muyun Yang (杨沐昀)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“专利文献是一种重要的技术文献,是知识产权强国的重要工作内容。目前专利语料库多集中于信息检索、机器翻译以及文本文分类等领域,尚缺乏更细粒度的标注,不足以支持问答、阅读理解等新形态的人工智能技术研发。本文面向专利智能分析的需要,提出了从解决问题、技术手段、效果三个角度对发明专利进行专利标注,并最终构建了包含313篇的中文专利关键信息语料库。利用命名实体识别技术对语料库关键信息进行识别和验证,表明专利关键信息的识别是不同于领域命名实体识别的更大粒度的信息抽取难题。”

2021

pdf
Grammar-Based Patches Generation for Automated Program Repair
Yu Tang | Long Zhou | Ambrosio Blanco | Shujie Liu | Furu Wei | Ming Zhou | Muyun Yang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
End-to-End Speech Translation with Adversarial Training
Xuancai Li | Chen Kehai | Tiejun Zhao | Muyun Yang
Proceedings of the First Workshop on Automatic Simultaneous Translation

End-to-End speech translation usually leverages audio-to-text parallel data to train an available speech translation model which has shown impressive results on various speech translation tasks. Due to the artificial cost of collecting audio-to-text parallel data, the speech translation is a natural low-resource translation scenario, which greatly hinders its improvement. In this paper, we proposed a new adversarial training method to leverage target monolingual data to relieve the low-resource shortcoming of speech translation. In our method, the existing speech translation model is considered as a Generator to gain a target language output, and another neural Discriminator is used to guide the distinction between outputs of speech translation model and true target monolingual sentences. Experimental results on the CCMT 2019-BSTC dataset speech translation task demonstrate that the proposed methods can significantly improve the performance of the End-to-End speech translation system.

pdf
Robust Machine Reading Comprehension by Learning Soft labels
Zhenyu Zhao | Shuangzhi Wu | Muyun Yang | Kehai Chen | Tiejun Zhao
Proceedings of the 28th International Conference on Computational Linguistics

Neural models have achieved great success on the task of machine reading comprehension (MRC), which are typically trained on hard labels. We argue that hard labels limit the model capability on generalization due to the label sparseness problem. In this paper, we propose a robust training method for MRC models to address this problem. Our method consists of three strategies, 1) label smoothing, 2) word overlapping, 3) distribution prediction. All of them help to train models on soft labels. We validate our approach on the representative architecture - ALBERT. Experimental results show that our method can greatly boost the baseline with 1% improvement in average, and achieve state-of-the-art performance on NewsQA and QUOREF.

2017

pdf
Investigating the content and form of referring expressions in Mandarin: introducing the Mtuna corpus
Kees van Deemter | Le Sun | Rint Sybesma | Xiao Li | Bo Chen | Muyun Yang
Proceedings of the 10th International Conference on Natural Language Generation

East Asian languages are thought to handle reference differently from languages such as English, particularly in terms of the marking of definiteness and number. We present the first Data-Text corpus for Referring Expressions in Mandarin, and we use this corpus to test some initial hypotheses inspired by the theoretical linguistics literature. Our findings suggest that function words deserve more attention in Referring Expressions Generation than they have so far received, and they have a bearing on the debate about whether different languages make different trade-offs between clarity and brevity.

2015

pdf
Hierarchical Recurrent Neural Network for Document Modeling
Rui Lin | Shujie Liu | Muyun Yang | Mu Li | Ming Zhou | Sheng Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf
Learning Topic Representation for SMT with Neural Networks
Lei Cui | Dongdong Zhang | Shujie Liu | Qiming Chen | Mu Li | Ming Zhou | Muyun Yang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf
A Hierarchical Semantics-Aware Distributional Similarity Scheme
Shuqi Sun | Ke Sun | Shiqi Zhao | Haifeng Wang | Muyun Yang | Sheng Li
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf
Repairing Incorrect Translation with Examples
Junguo Zhu | Muyun Yang | Sheng Li | Tiejun Zhao
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2011

pdf
Harvesting Related Entities with a Search Engine
Shuqi Sun | Shiqi Zhao | Muyun Yang | Haifeng Wang | Sheng Li
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf
Reexamination on Potential for Personalization in Web Search
Daren Li | Muyun Yang | HaoLiang Qi | Sheng Li | Tiejun Zhao
Coling 2010: Posters

pdf
Utilizing Variability of Time and Term Content, within and across Users in Session Detection
Shuqi Sun | Sheng Li | Muyun Yang | Haoliang Qi | Tiejun Zhao
Coling 2010: Posters

pdf
All in Strings: a Powerful String-based Automatic MT Evaluation Metric with Multiple Granularities
Junguo Zhu | Muyun Yang | Bo Wang | Sheng Li | Tiejun Zhao
Coling 2010: Posters

2009

pdf
A Statistical Machine Translation Model Based on a Synthetic Synchronous Grammar
Hongfei Jiang | Muyun Yang | Tiejun Zhao | Sheng Li | Bo Wang
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf
References Extension for the Automatic Evaluation of MT by Syntactic Hybridization
Bo Wang | Tiejun Zhao | Muyun Yang | Sheng Li
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

pdf
A Study of Translation Rule Classification for Syntax-based Statistical Machine Translation
Hongfei Jiang | Sheng Li | Muyun Yang | Tiejun Zhao
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

2007

pdf
HIT-WSD: Using Search Engine for Multilingual Chinese-English Lexical Sample Task
PengYuan Liu | TieJun Zhao | MuYun Yang
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2002

pdf
Learning Chinese Bracketing Knowledge Based on a Bilingual Language Model
Yajuan Lü | Sheng Li | Tiejun Zhao | Muyun Yang
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf
Automatic Detection of Prosody Phrase Boundaries for Text-to-Speech System
Xin Lv | Tie-jun Zhao | Zhan-yi Liu | Mu-yun Yang
Proceedings of the Seventh International Workshop on Parsing Technologies

2000

pdf
Statistics Based Hybrid Approach to Chinese Base Phrase Identification
Tie-jun Zhao | Mu-yun Yang | Fang Liu | Jian-min Yao | Hao Yu
Second Chinese Language Processing Workshop