Loïc Barrault

Also published as: Loic Barrault

2021

pdf bib abs
In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering
Peter Vickers | Nikolaos Aletras | Emilio Monti | Loïc Barrault
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Visual Question Answering (VQA) methods aim at leveraging visual input to answer questions that may require complex reasoning over entities. Current models are trained on labelled data that may be insufficient to learn complex knowledge representations. In this paper, we propose a new method to enhance the reasoning capabilities of a multi-modal pretrained model (Vision+Language BERT) by integrating facts extracted from an external knowledge base. Evaluation on the KVQA dataset benchmark demonstrates that our method outperforms competitive baselines by 19%, achieving new state-of-the-art results. We also perform an extensive analysis highlighting the limitations of our best performing model through an ablation study.

pdf bib abs
Active Learning by Acquiring Contrastive Examples
Katerina Margatina | Giorgos Vernikos | Loïc Barrault | Nikolaos Aletras
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better or equal than the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and we further analyze all actively acquired datasets showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.

2020

pdf bib abs
Évaluation de systèmes apprenant tout au long de la vie (Evaluation of lifelong learning systems )
Yevhenii Prokopalo | Sylvain Meignier | Olivier Galibert | Loïc Barrault | Anthony Larcher
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole

Aujourd’hui les systèmes intelligents obtiennent d’excellentes performances dans de nombreux domaines lorsqu’ils sont entraînés par des experts en apprentissage automatique. Lorsque ces systèmes sont mis en production, leurs performances se dégradent au cours du temps du fait de l’évolution de leur environnement réel. Une adaptation de leur modèle par des experts en apprentissage automatique est possible mais très coûteuse alors que les sociétés utilisant ces systèmes disposent d’experts du domaine qui pourraient accompagner ces systèmes dans un apprentissage tout au long de la vie. Dans cet article nous proposons un cadre d’évaluation générique pour des systèmes apprenant tout au long de la vie (SATLV). Nous proposons d’évaluer l’apprentissage assisté par l’humain (actif ou interactif) et l’apprentissage au cours du temps.

pdf bib
Traduction automatique pour la normalisation du français du XVIIe siècle ()
Simon Gabay | Loïc Barrault
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

pdf bib abs
Simultaneous Machine Translation with Visual Context
Ozan Caglayan | Julia Ive | Veneta Haralampieva | Pranava Madhyastha | Loïc Barrault | Lucia Specia
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. The translation thus has to start with an incomplete source text, which is read progressively, creating the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the missing source context. To this end, we analyse the impact of different multimodal approaches and visual features on state-of-the-art SiMT frameworks. Our results show that visual context is helpful and that visually-grounded models based on explicit object region information are much better than commonly used global features, reaching up to 3 BLEU points improvement under low latency scenarios. Our qualitative analysis illustrates cases where only the multimodal systems are able to translate correctly from English into gender-marked languages, as well as deal with differences in word order, such as adjective-noun placement between English and French.

This paper presents the results of the news translation task and the similar language translation task, both organised alongside the Conference on Machine Translation (WMT) 2020. In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories. The task was also opened up to additional test suites to probe specific aspects of translation. In the similar language translation task, participants built machine translation systems for translating between closely related pairs of languages.

pdf bib abs
Findings of the First Shared Task on Lifelong Learning Machine Translation
Loïc Barrault | Magdalena Biesialska | Marta R. Costa-jussà | Fethi Bougares | Olivier Galibert
Proceedings of the Fifth Conference on Machine Translation

A lifelong learning system can adapt to new data without forgetting previously acquired knowledge. In this paper, we introduce the first benchmark for lifelong learning machine translation. For this purpose, we provide training, lifelong and test data sets for two language pairs: English-German and English-French. Additionally, we report the results of our baseline systems, which we make available to the public. The goal of this shared task is to encourage research on the emerging topic of lifelong learning machine translation.

pdf bib abs
Evaluation of Lifelong Learning Systems
Yevhenii Prokopalo | Sylvain Meignier | Olivier Galibert | Loic Barrault | Anthony Larcher
Proceedings of the 12th Language Resources and Evaluation Conference

Current intelligent systems need the expensive support of machine learning experts to sustain their performance level when used on a daily basis. To reduce this cost, i.e. remaining free from any machine learning expert, it is reasonable to implement lifelong (or continuous) learning intelligent systems that will continuously adapt their model when facing changing execution conditions. In this work, the systems are allowed to refer to human domain experts who can provide the system with relevant knowledge about the task. Nowadays, the fast growth of lifelong learning systems development rises the question of their evaluation. In this article we propose a generic evaluation methodology for the specific case of lifelong learning systems. Two steps will be considered. First, the evaluation of human-assisted learning (including active and/or interactive learning) outside the context of lifelong learning. Second, the system evaluation across time, with propositions of how a lifelong learning intelligent system should be evaluated when including human assisted learning or not.

2019

This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.

pdf bib abs
LIUM’s Contributions to the WMT2019 News Translation Task: Data and Systems for German-French Language Pairs
Fethi Bougares | Jane Wottawa | Anne Baillot | Loïc Barrault | Adrien Bardet
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the neural machine translation (NMT) systems of the LIUM Laboratory developed for the French↔German news translation task of the Fourth Conference onMachine Translation (WMT 2019). The chosen language pair is included for the first time in the WMT news translation task. We de-scribe how the training and the evaluation data was created. We also present our participation in the French↔German translation directions using self-attentional Transformer networks with small and big architectures.

pdf bib abs
Étude de l’apprentissage par transfert de systèmes de traduction automatique neuronaux (Study on transfer learning in neural machine translation )
Adrien Bardet | Fethi Bougares | Loïc Barrault
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

L’apprentissage par transfert est une solution au problème de l’apprentissage de systèmes de traduction automatique neuronaux pour des paires de langues peu dotées. Dans cet article, nous proposons une analyse de cette méthode. Nous souhaitons évaluer l’impact de la quantité de données et celui de la proximité des langues impliquées pour obtenir le meilleur transfert possible. Nous prenons en compte ces deux paramètres non seulement pour une tâche de traduction “classique” mais également lorsque les corpus de données font défaut. Enfin, il s’agit de proposer une approche où volume de données et proximité des langues sont combinées afin de ne plus avoir à trancher entre ces deux éléments.

The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech. For the first two tasks we encouraged submissions of end- to-end speech-to-text systems, and for the second task participants could also use the video as additional input. We received submissions by 12 research teams. This overview provides detailed descriptions of the data and evaluation conditions of each task and reports results of the participating systems.

pdf bib abs
Probing the Need for Visual Context in Multimodal Machine Translation
Ozan Caglayan | Pranava Madhyastha | Lucia Specia | Loïc Barrault
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Current work on multimodal machine translation (MMT) has suggested that the visual modality is either unnecessary or only marginally beneficial. We posit that this is a consequence of the very simple, short and repetitive sentences used in the only available dataset for the task (Multi30K), rendering the source text sufficient as context. In the general case, however, we believe that it is possible to combine visual and textual information in order to ground translations. In this paper we probe the contribution of the visual modality to state-of-the-art MMT models by conducting a systematic analysis where we partially deprive the models from source-side textual context. Our results show that under limited textual context, models are capable of leveraging the visual input to generate better translations. This contradicts the current belief that MMT models disregard the visual modality because of either the quality of the image features or the way they are integrated into the model.

2018

pdf bib abs
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties
Alexis Conneau | German Kruszewski | Guillaume Lample | Loïc Barrault | Marco Baroni
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. “Downstream” tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the representations. We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods.

We present the results from the third shared task on multimodal machine translation. In this task a source sentence in English is supplemented by an image and participating systems are required to generate a translation for such a sentence into German, French or Czech. The image can be used in addition to (or instead of) the source sentence. This year the task was extended with a third target language (Czech) and a new test set. In addition, a variant of this task was introduced with its own test set where the source sentence is given in multiple languages: English, French and German, and participating systems are required to generate a translation in Czech. Seven teams submitted 45 different systems to the two variants of the task. Compared to last year, the performance of the multimodal submissions improved, but text-only systems remain competitive.

This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English→French and second for English→German language pairs among the constrained submissions according to the automatic evaluation metric METEOR.

2017

pdf bib abs
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau | Douwe Kiela | Holger Schwenk | Loïc Barrault | Antoine Bordes
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been so successful. Several attempts at learning unsupervised representations of sentences have not reached satisfactory enough performance to be widely adopted. In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks. Much like how computer vision uses ImageNet to obtain features, which can then be transferred to other tasks, our work tends to indicate the suitability of natural language inference for transfer learning to other NLP tasks. Our encoder is publicly available.

pdf bib
Word Representations in Factored Neural Machine Translation
Franck Burlot | Mercedes García-Martínez | Loïc Barrault | Fethi Bougares | François Yvon
Proceedings of the Second Conference on Machine Translation

pdf bib
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
Desmond Elliott | Stella Frank | Loïc Barrault | Fethi Bougares | Lucia Specia
Proceedings of the Second Conference on Machine Translation

pdf bib abs
Very Deep Convolutional Networks for Text Classification
Alexis Conneau | Holger Schwenk | Loïc Barrault | Yann Lecun
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The dominant approach for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state-of-the-art in computer vision. We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses only small convolutions and pooling operations. We are able to show that the performance of this model increases with the depth: using up to 29 convolutional layers, we report improvements over the state-of-the-art on several public text classification tasks. To the best of our knowledge, this is the first time that very deep convolutional nets have been applied to text processing.

2016

pdf bib
SHEF-LIUM-NN: Sentence level Quality Estimation with Neural Network Features
Kashif Shah | Fethi Bougares | Loïc Barrault | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib abs
Factored Neural Machine Translation Architectures
Mercedes García-Martínez | Loïc Barrault | Fethi Bougares
Proceedings of the 13th International Conference on Spoken Language Translation

In this paper we investigate the potential of the neural machine translation (NMT) when taking into consideration the linguistic aspect of target language. From this standpoint, the NMT approach with attention mechanism [1] is extended in order to produce several linguistically derived outputs. We train our model to simultaneously output the lemma and its corresponding factors (e.g. part-of-speech, gender, number). The word level translation is built with a mapping function using a priori linguistic information. Compared to the standard NMT system, factored architecture increases significantly the vocabulary coverage while decreasing the number of unknown words. With its richer architecture, the Factored NMT approach allows us to implement several training setup that will be discussed in detail along this paper. On the IWSLT’15 English-to-French task, FNMT model outperforms NMT model in terms of BLEU score. A qualitative analysis of the output on a set of test sentences shows the effectiveness of the FNMT model.

2015

pdf bib
Incremental Adaptation Strategies for Neural Network Language Models
Alex Ter-Sarkisov | Holger Schwenk | Fethi Bougares | Loïc Barrault
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality

pdf bib
The LIUM ASR and SLT systems for IWSLT 2015
Mercedes Garcia Martínez | Loïc Barrault | Anthony Rousseau | Paul Deléglise | Yannick Estève
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Improving continuous space language models auxiliary features
Walid Aransa | Holger Schwenk | Loïc Barrault
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

pdf bib
Continuous Adaptation to User Feedback for Statistical Machine Translation
Frédéric Blain | Fethi Bougares | Amir Hazem | Loïc Barrault | Holger Schwenk
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Using Hypothesis Selection Based Features for Confusion Network MT System Combination
Sahar Ghannay | Loïc Barrault
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

pdf bib abs
LIUM English-to-French spoken language translation system and the Vecsys/LIUM automatic speech recognition system for Italian language for IWSLT 2014
Anthony Rousseau | Loïc Barrault | Paul Deléglise | Yannick Estève | Holger Schwenk | Samir Bennacef | Armando Muscariello | Stephan Vanni
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the Spoken Language Translation system developed by the LIUM for the IWSLT 2014 evaluation campaign. We participated in two of the proposed tasks: (i) the Automatic Speech Recognition task (ASR) in two languages, Italian with the Vecsys company, and English alone, (ii) the English to French Spoken Language Translation task (SLT). We present the approaches and specificities found in our systems, as well as the results from the evaluation campaign.

2013

pdf bib
Issues in incremental adaptation of statistical MT from human post-edits
Mauro Cettolo | Christophe Servan | Nicola Bertoldi | Marcello Federico | Loïc Barrault | Holger Schwenk
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

pdf bib
Multimodal Comparable Corpora as Resources for Extracting Parallel Data: Parallel Phrases Extraction
Haithem Afli | Loïc Barrault | Holger Schwenk
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Proceedings of RECITAL 2013
Florian Boudin | Loïc Barrault
Proceedings of RECITAL 2013

2012

pdf bib abs
A General Framework to Weight Heterogeneous Parallel Data for Model Adaptation in Statistical MT
Kashif Shah | Loïc Barrault | Holger Schwenk
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

The standard procedure to train the translation model of a phrase-based SMT system is to concatenate all available parallel data, to perform word alignment, to extract phrase pairs and to calculate translation probabilities by simple relative frequency. However, parallel data is quite inhomogeneous in many practical applications with respect to several factors like data source, alignment quality, appropriateness to the task, etc. We propose a general framework to take into account these factors during the calculation of the phrase-table, e.g. by better distributing the probability mass of the individual phrase pairs. No additional feature functions are needed. We report results on two well-known tasks: the IWSLT’11 and WMT’11 evaluations, in both conditions translating from English to French. We give detailed results for different functions to weight the bitexts. Our best systems improve a strong baseline by up to one BLEU point without any impact on the computational complexity during training or decoding.

pdf bib
Traduction automatique à partir de corpus comparables: extraction de phrases parallèles à partir de données comparables multimodales (Automatic Translation from Comparable corpora : extracting parallel sentences from multimodal comparable corpora) [in French]
Haithem Afli | Loïc Barrault | Holger Schwenk
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib
LIUM’s SMT Machine Translation Systems for WMT 2012
Christophe Servan | Patrik Lambert | Anthony Rousseau | Holger Schwenk | Loïc Barrault
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib abs
Semi-supervised transliteration mining from parallel and comparable corpora
Walid Aransa | Holger Schwenk | Loic Barrault
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

Transliteration is the process of writing a word (mainly proper noun) from one language in the alphabet of another language. This process requires mapping the pronunciation of the word from the source language to the closest possible pronunciation in the target language. In this paper we introduce a new semi-supervised transliteration mining method for parallel and comparable corpora. The method is mainly based on a new suggested Three Levels of Similarity (TLS) scores to extract the transliteration pairs. The first level calculates the similarity of of all vowel letters and consonants letters. The second level calculates the similarity of long vowels and vowel letters at beginning and end position of the words and consonants letters. The third level calculates the similarity consonants letters only. We applied our method on Arabic-English parallel and comparable corpora. We evaluated the extracted transliteration pairs using a statistical based transliteration system. This system is built using letters instead or words as tokens. The transliteration system achieves an accuracy of 0.50 and a mean F-score 0.8958 when trained on transliteration pairs extracted from a parallel corpus. The accuracy is 0.30 and the mean F-score 0.84 when we used instead a comparable corpus to automatically extract the transliteration pairs. This shows that the proposed semi-supervised transliteration mining algorithm is effective and can be applied to other language pairs. We also evaluated two segmentation techniques and reported the impact on the transliteration performance.

2011

pdf bib
Parametric Weighting of Parallel Data for Statistical Machine Translation
Kashif Shah | Loïc Barrault | Holger Schwenk
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
MANY improvements for WMT’11
Loïc Barrault
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

pdf bib abs
LIUM’s statistical machine translation system for IWSLT 2010
Anthony Rousseau | Loïc Barrault | Paul Deléglise | Yannick Estève
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the two systems developed by the LIUM laboratory for the 2010 IWSLT evaluation campaign. We participated to the new English to French TALK task. We developed two systems, one for each evaluation condition, both being statistical phrase-based systems using the the Moses toolkit. Several approaches were investigated.

pdf bib
MANY: Open Source MT System Combination at WMT’10
Loïc Barrault
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Translation Model Adaptation by Resampling
Kashif Shah | Loïc Barrault | Holger Schwenk
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

pdf bib
SMT and SPE Machine Translation Systems for WMT‘09
Holger Schwenk | Sadaf Abdul-Rauf | Loïc Barrault | Jean Senellart
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib abs
LIUM’s statistical machine translation system for IWSLT 2009
Holger Schwenk | Loïc Barrault | Yannick Estève | Patrik Lambert
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the systems developed by the LIUM laboratory for the 2009 IWSLT evaluation. We participated in the Arabic and Chinese to English BTEC tasks. We developed three different systems: a statistical phrase-based system using the Moses toolkit, an Statistical Post-Editing system and a hierarchical phrase-based system based on Joshua. A continuous space language model was deployed to improve the modeling of the target language. These systems are combined by a confusion network based approach.