Xiaolin Wang


2019

Online Sentence Segmentation for Simultaneous Interpretation using Multi-Shifted Recurrent Neural Network
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of Machine Translation Summit XVII: Research Track

2018

CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

This paper presents an open-source neural machine translation toolkit named CytonMT. The toolkit is built from scratch using only C++ and NVIDIA’s GPU-accelerated libraries. The toolkit features training efficiency, code simplicity and translation quality. Benchmarks show that CytonMT accelerates the training speed by 64.5% to 110.8% on neural networks of various sizes, and achieves competitive translation quality.

2016

Target-Bidirectional Neural Models for Machine Transliteration
Andrew Finch | Lemao Liu | Xiaolin Wang | Eiichiro Sumita
Proceedings of the Sixth Named Entity Workshop

An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation
Xiaolin Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

Simultaneous interpretation is a very challenging application of machine translation in which the input is a stream of words from a speech recognition engine. The key problem is how to segment the stream in an online manner into units suitable for translation. The segmentation process proceeds by calculating a confidence score for each word that indicates the soundness of placing a sentence boundary after it, and then heuristics are employed to determine the position of the boundaries. Multiple variants of the confidence scoring method and segmentation heuristics were studied. Experimental results show that the best-performing strategy is not only efficient in terms of average latency per word, but also achieves end-to-end translation quality close to both an offline baseline and oracle segmentation.
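The score-then-heuristic idea in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the per-word confidence scores are assumed to be supplied by some external model, and the function name `segment_stream`, the `threshold` parameter, and the `max_len` cap are hypothetical choices made here for illustration only.

```python
from typing import Iterable, Iterator, List, Tuple


def segment_stream(
    scored_words: Iterable[Tuple[str, float]],
    threshold: float = 0.5,  # hypothetical cut-off for accepting a boundary
    max_len: int = 20,       # hypothetical cap on segment length, bounds latency
) -> Iterator[List[str]]:
    """Yield segments from a stream of (word, boundary confidence) pairs.

    A boundary is placed after a word when its confidence reaches the
    threshold, or when the buffered segment has grown to max_len words.
    """
    buffer: List[str] = []
    for word, score in scored_words:
        buffer.append(word)
        if score >= threshold or len(buffer) >= max_len:
            yield buffer
            buffer = []
    if buffer:
        # Flush whatever remains when the stream ends.
        yield buffer


stream = [("hello", 0.1), ("world", 0.9), ("how", 0.2),
          ("are", 0.1), ("you", 0.8), ("fine", 0.3)]
segments = list(segment_stream(stream))
```

Here a fixed threshold stands in for the boundary heuristics, and the length cap is one simple way to keep latency bounded; the paper studies several variants of both the scoring and the heuristics.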

A Prototype Automatic Simultaneous Interpretation System
Xiaolin Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Simultaneous interpretation allows people to communicate spontaneously across language boundaries, but such services are prohibitively expensive for the general public. This paper presents a fully automatic simultaneous interpretation system to address this problem. Though the development is still at an early stage, the system is capable of keeping up with the fastest of the TED speakers while at the same time delivering high-quality translations. We believe that the system will become an effective tool for facilitating cross-lingual communication in the future.

2015

Neural Network Transduction Models in Transliteration Generation
Andrew Finch | Lemao Liu | Xiaolin Wang | Eiichiro Sumita
Proceedings of the Fifth Named Entity Workshop

Hierarchical Phrase-based Stream Decoding
Andrew Finch | Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Leave-one-out Word Alignment without Garbage Collector Effects
Xiaolin Wang | Masao Utiyama | Andrew Finch | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

The NICT translation system for IWSLT 2014
Xiaolin Wang | Andrew Finch | Masao Utiyama | Taro Watanabe | Eiichiro Sumita
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes NICT’s participation in the IWSLT 2014 evaluation campaign for the TED Chinese-English translation shared task. Our approach used a combination of phrase-based and hierarchical statistical machine translation (SMT) systems. We focused on several areas, specifically system combination, word alignment, and various language modeling techniques, including the use of neural network joint models. Our experiments on the test set from the 2013 shared task showed that all of these techniques improve the BLEU score, with the largest improvements coming from using large data sizes to train the language model.

An exploration of segmentation strategies in stream decoding
Andrew Finch | Xiaolin Wang | Eiichiro Sumita
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

In this paper we explore segmentation strategies for the stream decoder, a method for decoding from a continuous stream of input tokens rather than the traditional method of decoding from sentence-segmented text. The behavior of the decoder is analyzed, and modifications to the decoding algorithm are proposed to improve its performance. The experimental results show our proposed decoding strategies to be effective, and add support to the original findings that this approach is capable of approaching the performance of the underlying phrase-based machine translation decoder at useful levels of latency. Our experiments evaluated the stream decoder on a broader set of language pairs than in previous work. We found that most European language pairs were similar in character, and report results on English-Chinese and English-German pairs, which are of interest due to the reordering required.

Empirical Study of Unsupervised Chinese Word Segmentation Methods for SMT on Large-scale Corpora
Xiaolin Wang | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation
Xiaolin Wang | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Labeled Alignment for Recognizing Textual Entailment
Xiaolin Wang | Hai Zhao | Bao-Liang Lu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Spell Checking for Chinese
Shaohua Yang | Hai Zhao | Xiaolin Wang | Bao-liang Lu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents some novel results on Chinese spell checking. A concise algorithm based on minimized-path segmentation is proposed to reduce the cost and suit the needs of current Chinese input systems. The proposed algorithm is derived from the simple assumption that spelling errors often make the number of segments larger. The experimental results are quite positive and implicitly verify the effectiveness of the proposed assumption. Finally, all approaches work together to output a result much better than the baseline, with a 12% performance improvement.
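The assumption that spelling errors increase the number of segments can be illustrated with a small sketch. It is not the paper's algorithm: it is a generic fewest-segments dynamic program over a toy lexicon, in which any single character is assumed to be a valid fallback segment. The function name `min_path_segment`, the `max_word_len` parameter, and the two-word lexicon are hypothetical.

```python
from typing import List, Set


def min_path_segment(text: str, lexicon: Set[str], max_word_len: int = 4) -> List[str]:
    """Split text into the fewest segments, where a segment is either a
    lexicon word or a single character (fallback for unknown material)."""
    n = len(text)
    # best[i] holds (segment count, segments) for the prefix text[:i].
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            piece = text[j:i]
            if len(piece) == 1 or piece in lexicon:
                candidate = (best[j][0] + 1, best[j][1] + [piece])
                if best[i] is None or candidate[0] < best[i][0]:
                    best[i] = candidate
    return best[n][1]


toy_lexicon = {"天气", "很好"}  # toy lexicon, for illustration only
correct = min_path_segment("天气很好", toy_lexicon)
misspelled = min_path_segment("天汽很好", toy_lexicon)  # 汽 typed in place of 气
```

With this toy lexicon the correctly spelled input splits into two segments, while the misspelled one splits into three because the damaged word falls back to single characters. A checker built on the stated assumption could treat such an increase in segment count as a hint that an error is present.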

2008

Cross Language Text Categorization Using a Bilingual Lexicon
Ke Wu | Xiaolin Wang | Bao-Liang Lu
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I