Xuancong Wang


2018

pdf
Named-Entity Tagging and Domain adaptation for Better Customized Translation
Zhongwei Li | Xuancong Wang | Ai Ti Aw | Eng Siong Chng | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Customized translation need pay spe-cial attention to the target domain ter-minology especially the named-entities for the domain. Adding linguistic features to neural machine translation (NMT) has been shown to benefit translation in many studies. In this paper, we further demonstrate that adding named-entity (NE) feature with named-entity recognition (NER) into the source language produces better translation with NMT. Our experiments show that by just including the different NE classes and boundary tags, we can increase the BLEU score by around 1 to 2 points using the standard test sets from WMT2017. We also show that adding NE tags using NER and applying in-domain adaptation can be combined to further improve customized machine translation.

2016

pdf
A Word Labeling Approach to Thai Sentence Boundary Detection and POS Tagging
Nina Zhou | AiTi Aw | Nattadaporn Lertcheva | Xuancong Wang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Previous studies on Thai Sentence Boundary Detection (SBD) mostly assumed sentence ends at a space disambiguation problem, which classified space either as an indicator for Sentence Boundary (SB) or non-Sentence Boundary (nSB). In this paper, we propose a word labeling approach which treats space as a normal word, and detects SB between any two words. This removes the restriction for SB to be oc-curred only at space and makes our system more robust for modern Thai writing. It is because in modern Thai writing, space is not consistently used to indicate SB. As syntactic information contributes to better SBD, we further propose a joint Part-Of-Speech (POS) tagging and SBD framework based on Factorial Conditional Random Field (FCRF) model. We compare the performance of our proposed ap-proach with reported methods on ORCHID corpus. We also performed experiments of FCRF model on the TaLAPi corpus. The results show that the word labelling approach has better performance than pre-vious space-based classification approaches and FCRF joint model outperforms LCRF model in terms of SBD in all experiments.

2014

pdf
Combining Punctuation and Disfluency Prediction: An Empirical Study
Xuancong Wang | Khe Chai Sim | Hwee Tou Ng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
A Beam-Search Decoder for Disfluency Detection
Xuancong Wang | Hwee Tou Ng | Khe Chai Sim
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers