2013
The NICT ASR system for IWSLT 2013
Chien-Lin Huang | Paul R. Dixon | Shigeki Matsuda | Youzheng Wu | Xugang Lu | Masahiro Saiko | Chiori Hori
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
This study presents the NICT automatic speech recognition (ASR) system submitted for the IWSLT 2013 ASR evaluation. We apply two types of acoustic features and three types of acoustic models to the NICT ASR system, which comprises six subsystems with different acoustic features and models. This study reports the individual results and the fusion of the systems, and highlights the improvements made by our proposed methods, which include automatic segmentation of audio data, language model adaptation, speaker adaptive training of deep neural network models, and the NICT SprinTra decoder. Our experimental results indicate that the proposed methods offer good performance improvements on lecture speech recognition tasks. The system achieved a 13.5% word error rate on the IWSLT 2013 ASR English test data set.
2012
Rescoring a Phrase-based Machine Transliteration System with Recurrent Neural Network Language Models
Andrew Finch | Paul Dixon | Eiichiro Sumita
Proceedings of the 4th Named Entity Workshop (NEWS) 2012
The NICT ASR system for IWSLT2012
Hitoshi Yamamoto | Youzheng Wu | Chien-Lin Huang | Xugang Lu | Paul R. Dixon | Shigeki Matsuda | Chiori Hori | Hideki Kashioka
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes our automatic speech recognition (ASR) system for the IWSLT 2012 evaluation campaign. The target data of the campaign is selected from the TED talks, a collection of public speeches on a variety of topics spoken in English. Our ASR system is based on weighted finite-state transducers and exploits a combination of acoustic models for spontaneous speech, language models based on n-grams and a factored recurrent neural network trained with effectively selected corpora, and an unsupervised topic adaptation framework utilizing ASR results. The system achieved word error rates of 10.6% and 12.0% on the tst2011 and tst2012 evaluation sets, respectively.
2011
The NICT ASR system for IWSLT2011
Kazuhiko Abe | Youzheng Wu | Chien-lin Huang | Paul R. Dixon | Shigeki Matsuda | Chiori Hori | Hideki Kashioka
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we describe NICT’s participation in the IWSLT 2011 evaluation campaign for the ASR Track. The target speech was selected from talks on the TED (Technology, Entertainment, Design) program. To recognize spontaneous speech, we prepared an acoustic model trained on additional spontaneous speech corpora and a language model constructed from text corpora distributed by the organizer. We built a multi-pass ASR system that adapts the acoustic and language models using the results of previous ASR passes. A large reduction in word error rate (WER) was obtained by speaker adaptation of the acoustic model with MLLR. Additional improvement was achieved both by adaptation of the language model and by parallel use of the baseline and speaker-dependent acoustic models. As a result, the final WER was reduced by 30% from the baseline ASR on the distributed test set.
Investigation of the effects of ASR tuning on speech translation performance
Paul R. Dixon | Andrew Finch | Chiori Hori | Hideki Kashioka
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper we describe some of our recent investigations into ASR and SMT coupling issues from an ASR perspective. Our study was motivated by several questions: firstly, to understand how standard ASR tuning procedures affect SMT performance and whether it is safe to perform this tuning in isolation; secondly, to investigate how vocabulary and segmentation mismatches between the ASR and SMT systems affect performance; thirdly, to uncover any practical issues that arise when using a WFST-based speech decoder for tight coupling as opposed to a more traditional tree-search decoding architecture. On the IWSLT07 Japanese-English task we found that larger language model weights only helped SMT performance when the ASR decoder was tuned in a sub-optimal manner. When we considered the performance with suitably wide beams that ensured the ASR accuracy had converged, we observed that the language model weight had little influence on the SMT BLEU scores. After the construction of the phrase table, the actual SMT vocabulary can be smaller than the training data vocabulary. Reducing the ASR lexicon to cover only the words the SMT system could accept led to an increase in the ASR error rates, but the SMT BLEU scores were nearly unchanged. From a practical point of view this is a useful result, as it means we can significantly reduce the memory footprint of the ASR system. We also investigated coupling WFST-based ASR to a simple WFST-based translation decoder and found it was crucial to perform phrase table expansion to avoid OOV problems. For the WFST translation decoder we describe a semiring-based approach for optimizing the log-linear weights.
Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT
Michael Paul | Andrew Finch | Paul R. Dixon | Eiichiro Sumita
Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
Integrating Models Derived from non-Parametric Bayesian Co-segmentation into a Statistical Machine Transliteration System
Andrew Finch | Paul Dixon | Eiichiro Sumita
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)
2010
Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm
Dong Yang | Paul Dixon | Sadaoki Furui
Proceedings of the ACL 2010 Conference Short Papers
2009
Combining a Two-step Conditional Random Field Model and a Joint Source Channel Model for Machine Transliteration
Dong Yang | Paul Dixon | Yi-Cheng Pan | Tasuku Oonishi | Masanobu Nakamura | Sadaoki Furui
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)