Kun Wang


2021

A Comparison between Pre-training and Large-scale Back-translation for Neural Machine Translation
Dandan Huang | Kun Wang | Yue Zhang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

What Have We Achieved on Text Summarization?
Dandan Huang | Leyang Cui | Sen Yang | Guangsheng Bao | Kun Wang | Jun Xie | Yue Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Deep learning has led to significant improvement in text summarization, with various methods investigated and improved ROUGE scores reported over the years. However, gaps still exist between summaries produced by automatic summarizers and those written by human professionals. To better understand the strengths and limits of summarization systems at a fine-grained syntactic and semantic level, we consult the Multidimensional Quality Metric (MQM) and manually quantify 8 major sources of errors across 10 representative summarization models. We find that 1) under similar settings, extractive summarizers are generally better than their abstractive counterparts, thanks to strengths in faithfulness and factual consistency; 2) milestone techniques such as copy, coverage, and hybrid extractive/abstractive methods do bring specific improvements but also demonstrate limitations; 3) pre-training techniques, and in particular sequence-to-sequence pre-training, are highly effective for improving text summarization, with BART giving the best results.

2019

Code-Switching for Enhancing NMT with Pre-Specified Translation
Kai Song | Yue Zhang | Heng Yu | Weihua Luo | Kun Wang | Min Zhang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Leveraging user-provided translations to constrain NMT output has practical significance. Existing methods fall into two main categories: the use of placeholder tags for lexicon words and the use of hard constraints during decoding. Both can hurt translation fidelity for various reasons. We investigate a data augmentation method that builds code-switched training data by replacing source phrases with their target translations. Our method does not change the NMT model or the decoding algorithm, allowing the model to learn lexicon translations by copying source-side target words. Extensive experiments show that our method achieves consistent improvements over existing approaches, improving the translation of constrained words without hurting unconstrained words.
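
A minimal sketch of the code-switched data augmentation this abstract describes, assuming the pre-specified translations arrive as a source-phrase-to-target dictionary; the function name, greedy longest-match strategy, and data format are illustrative assumptions, not the authors' implementation:

# Sketch only: build code-switched source sentences by splicing in
# pre-specified target translations (illustrative, not the paper's code).
def code_switch(source_tokens, lexicon, max_phrase_len=4):
    out, i = [], 0
    while i < len(source_tokens):
        replaced = False
        # Try the longest matching source phrase first (greedy match: an assumption).
        for n in range(min(max_phrase_len, len(source_tokens) - i), 0, -1):
            phrase = " ".join(source_tokens[i:i + n])
            if phrase in lexicon:
                out.extend(lexicon[phrase].split())  # splice in target-side words
                i += n
                replaced = True
                break
        if not replaced:
            out.append(source_tokens[i])
            i += 1
    return out

# Example: the source phrase "New York" is constrained to a target translation.
lexicon = {"New York": "Niuyue"}
print(code_switch("I love New York".split(), lexicon))
# -> ['I', 'love', 'Niuyue']

The target side of each training pair stays the normal reference translation, so the model learns to copy the spliced-in target words through to its output.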

2015

Well-Formed Dependency to String Translation with BTG Grammar
Xiaoqing Li | Kun Wang | Dakun Zhang | Jie Hao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

Dynamically Integrating Cross-Domain Translation Memory into Phrase-Based Machine Translation during Decoding
Kun Wang | Chengqing Zong | Keh-Yih Su
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Knowledge Sharing via Social Login: Exploiting Microblogging Service for Warming up Social Question Answering Websites
Yang Xiao | Wayne Xin Zhao | Kun Wang | Zhen Xiao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
Kun Wang | Chengqing Zong | Keh-Yih Su
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Integrating Surface and Abstract Features for Robust Cross-Domain Chinese Word Segmentation
Xiaoqing Li | Kun Wang | Chengqing Zong | Keh-Yih Su
Proceedings of COLING 2012

2010

A Character-Based Joint Model for Chinese Word Segmentation
Kun Wang | Chengqing Zong | Keh-Yih Su
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

A Character-Based Joint Model for CIPS-SIGHAN Word Segmentation Bakeoff 2010
Kun Wang | Chengqing Zong | Keh-Yih Su
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2009

Which is More Suitable for Chinese Word Segmentation, the Generative Model or the Discriminative One?
Kun Wang | Chengqing Zong | Keh-Yih Su
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2