Hai-Son Le

Also published as: Hai-son Le, Hai Son Le

2013

The speech recognition and machine translation system of IOIT for IWSLT 2013
Ngoc-Quan Pham | Hai-Son Le | Tat-Thang Vu | Chi-Mai Luong
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the Automatic Speech Recognition (ASR) and Machine Translation (MT) systems developed by IOIT for the IWSLT 2013 evaluation campaign. For the ASR task, we used the Kaldi toolkit to develop a system based on weighted finite-state transducers. The system applies several techniques, notably subspace Gaussian mixture models, speaker adaptation, discriminative training, system combination, and SOUL, a neural network language model. The techniques used for automatic segmentation are also described. In addition, we compared different types of SOUL models to study the impact of words from previous sentences on word prediction in language modeling. For the MT task, the baseline system was built with the open-source toolkit N-code and then augmented by applying SOUL on top, i.e., in the N-best rescoring phase.

2012

Continuous Space Translation Models with Neural Networks
Hai Son Le | Alexandre Allauzen | François Yvon
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Measuring the Influence of Long Range Dependencies with Neural Network Language Models
Hai Son Le | Alexandre Allauzen | François Yvon
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT

2011

LIMSI’s experiments in domain adaptation for IWSLT11
Thomas Lavergne | Alexandre Allauzen | Hai-Son Le | François Yvon
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

LIMSI took part in the IWSLT 2011 TED task in the MT track for English to French using the in-house n-code system, which implements the n-gram based approach to Machine Translation. This framework not only achieves state-of-the-art results for this language pair, but is also appealing due to its conceptual simplicity and its use of well-understood statistical language models. Using this approach, we compare several ways to adapt our existing systems and resources to the TED task with a mixture of language models, and try to provide an analysis of the modest gains obtained by training a log-linear combination of in- and out-of-domain models.
