Loic Dugast

Also published as: Loïc Dugast


Investigating automatic and manual filtering methods to produce MT-ready glossaries from existing ones
Maria Afara | Randy Scansani | Loïc Dugast
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Commercial Machine Translation (MT) providers offer functionalities that allow users to leverage bilingual glossaries. This poses the question of how to turn glossaries that were intended to be used by a human translator into MT-ready ones, removing entries that could harm the MT output. We present two automatic filtering approaches (one based on rules and the second relying on a translation memory) and a manual filtering procedure carried out by a linguist. The resulting glossaries are added to the MT model. The outputs are compared against a baseline where no glossary is used and an output produced using the original glossary. The present work aims at investigating whether any of these filtering methods can yield higher terminology accuracy without negative effects on the overall quality. Results are measured with terminology accuracy and Translation Edit Rate. We test our filters on two language pairs, En-Fr and De-En. Results show that some of the automatically filtered glossaries improve the output when compared to the baseline, and they may help reach a better balance between accuracy and overall quality, replacing the costly manual process without quality loss.


Glossary functionality in commercial machine translation: does it help? A first step to identify best practices for a language service provider
Randy Scansani | Loïc Dugast
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

Recently, a number of commercial Machine Translation (MT) providers have started to offer glossary features allowing users to enforce terminology into the output of a generic model. However, to the best of our knowledge it is not clear how such features would impact terminology accuracy and the overall quality of the output. The present contribution aims at providing a first insight into the performance of the glossary-enhanced generic models offered by four providers. Our tests involve two different domains and language pairs, i.e. Sportswear En–Fr and Industrial Equipment De–En. The output of each generic model and of the glossary-enhanced one will be evaluated relying on Translation Error Rate (TER) to take into account the overall output quality and on accuracy to assess the compliance with the glossary. This is followed by a manual evaluation. The present contribution mainly focuses on understanding how these glossary features can be fruitfully exploited by language service providers (LSPs), especially in a scenario in which a customer glossary is already available and is added to the generic model as is.


Building a Better Bitext for Structurally Different Languages through Self-training
Jungyeul Park | Loïc Dugast | Jeen-Pyo Hong | Chang-Uk Shin | Jeong-Won Cha
Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora

We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining the use of a pivot language and self-training. A pivot language enables the use of existing translation models to bootstrap the alignment, and a self-training procedure makes it possible to achieve better alignment, both at the document and sentence level. We also propose several evaluation methods for the resulting alignment.


Selective addition of corpus-extracted phrasal lexical rules to a rule-based machine translation system
Loic Dugast | Jean Senellart | Philipp Koehn
Proceedings of Machine Translation Summit XII: Posters

Statistical Post Editing and Dictionary Extraction: Systran/Edinburgh Submissions for ACL-WMT2009
Loic Dugast | Jean Senellart | Philipp Koehn
Proceedings of the Fourth Workshop on Statistical Machine Translation


Tighter Integration of Rule-Based and Statistical MT in Serial System Combination
Nicola Ueffing | Jens Stephan | Evgeny Matusov | Loïc Dugast | George Foster | Roland Kuhn | Jean Senellart | Jin Yang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Can we Relearn an RBMT System?
Loïc Dugast | Jean Senellart | Philipp Koehn
Proceedings of the Third Workshop on Statistical Machine Translation


Statistical Post-Editing on SYSTRAN’s Rule-Based Translation System
Loïc Dugast | Jean Senellart | Philipp Koehn
Proceedings of the Second Workshop on Statistical Machine Translation