Andrei Popescu-Belis

Also published as: A. Popescu-Belis, Andrei Popescu Belis

2024

pdf abs
Don’t Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation
Giorgos Vernikos | Andrei Popescu-Belis
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural machine translation systems estimate probabilities of target sentences given source sentences, yet these estimates may not align with human preferences. This work introduces QE-fusion, a method that synthesizes translations using a quality estimation metric (QE), which correlates better with human judgments. QE-fusion leverages a pool of candidates sampled from a model, combining spans from different candidates using a QE metric such as CometKiwi. We compare QE-fusion against beam search and recent reranking techniques, such as Minimum Bayes Risk decoding or QE-reranking. Our method consistently improves translation quality in terms of COMET and BLEURT scores when applied to large language models (LLMs) used for translation (PolyLM, XGLM, Llama2, Mistral, ALMA, and Tower) and to multilingual translation models (NLLB), over five language pairs. Notably, QE-fusion exhibits larger improvements for LLMs due to their ability to generate diverse outputs. We demonstrate that our approach generates novel translations in over half of the cases and consistently outperforms other methods across varying numbers of candidates (5–200). Furthermore, we empirically establish that QE-fusion scales linearly with the number of candidates in the pool.

2023

pdf abs
A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation
Àlex R. Atrio | Alexis Allemann | Ljiljana Dolamic | Andrei Popescu-Belis
Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)

Training neural MT systems for low-resource language pairs or in unsupervised settings (i.e. with no parallel data) often involves a large number of auxiliary systems. These may include parent systems trained on higher-resource pairs and used for initializing the parameters of child systems, multilingual systems for neighboring languages, and several stages of systems trained on pseudo-parallel data obtained through back-translation. We propose here a simplified pipeline, which we compare to the best submissions to the WMT 2021 Shared Task on Unsupervised MT and Very Low Resource Supervised MT. Our pipeline only needs two parents, two children, one round of back-translation for low-resource directions and two for unsupervised ones and obtains better or similar scores when compared to more complex alternatives.

pdf bib abs
GPoeT: a Language Model Trained for Rhyme Generation on Synthetic Data
Andrei Popescu-Belis | Àlex R. Atrio | Bastien Bernath | Etienne Boisson | Teo Ferrari | Xavier Theimer-Lienhard | Giorgos Vernikos
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Poem generation with language models requires the modeling of rhyming patterns. We propose a novel solution for learning to rhyme, based on synthetic data generated with a rule-based rhyming algorithm. The algorithm and an evaluation metric use a phonetic dictionary and the definitions of perfect and assonant rhymes. We fine-tune a GPT-2 English model with 124M parameters on 142 MB of natural poems and find that this model generates consecutive rhymes infrequently (11%). We then fine-tune the model on 6 MB of synthetic quatrains with consecutive rhymes (AABB) and obtain nearly 60% of rhyming lines in samples generated by the model. Alternating rhymes (ABAB) are more difficult to model because of longer-range dependencies, but they are still learnable from synthetic data, reaching 45% of rhyming lines in generated samples.

pdf abs
Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT
Benoist Wolleb | Romain Silvestri | Georgios Vernikos | Ljiljana Dolamic | Andrei Popescu-Belis
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

Subword tokenization is the de-facto standard for tokenization in neural language models and machine translation systems. Three advantages are frequently put forward in favor of subwords: shorter encoding of frequent tokens, compositionality of subwords, and ability to deal with unknown words. As their relative importance is not entirely clear yet, we propose a tokenization approach that enables us to separate frequency (the first advantage) from compositionality, thanks to the use of Huffman coding, which tokenizes words using a fixed amount of symbols. Experiments with CS-DE, EN-FR and EN-DE NMT show that frequency alone accounts for approximately 90% of the BLEU scores reached by BPE, hence compositionality has less importance than previously thought.

2022

pdf abs
On the Interaction of Regularization Factors in Low-resource Neural Machine Translation
Àlex R. Atrio | Andrei Popescu-Belis
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

We explore the roles and interactions of the hyper-parameters governing regularization, and propose a range of values applicable to low-resource neural machine translation. We demonstrate that default or recommended values for high-resource settings are not optimal for low-resource ones, and that more aggressive regularization is needed when resources are scarce, in proportion to their scarcity. We explain our observations by the generalization abilities of sharp vs. flat basins in the loss landscape of a neural network. Results for four regularization factors corroborate our claim: batch size, learning rate, dropout rate, and gradient clipping. Moreover, we show that optimal results are obtained when using several of these factors, and that our findings generalize across datasets of different sizes and languages.

This paper describes a system for interactive poem generation, which combines neural language models (LMs) for poem generation with explicit constraints that can be set by users on form, topic, emotion, and rhyming scheme. LMs cannot learn such constraints from the data, which is scarce with respect to their needs even for a well-resourced language such as French. We propose a method to generate verses and stanzas by combining LMs with rule-based algorithms, and compare several approaches for adjusting the words of a poem to a desired combination of topics or emotions. An approach to automatic rhyme setting using a phonetic dictionary is proposed as well. Our system has been demonstrated at public events, and log analysis shows that users found it engaging.

2021

pdf abs
Subword Mapping and Anchoring across Languages
Giorgos Vernikos | Andrei Popescu-Belis
Findings of the Association for Computational Linguistics: EMNLP 2021

State-of-the-art multilingual systems rely on shared vocabularies that sufficiently cover all considered languages. To this end, a simple and frequently used approach makes use of subword vocabularies constructed jointly over several languages. We hypothesize that such vocabularies are suboptimal due to false positives (identical subwords with different meanings across languages) and false negatives (different subwords with similar meanings). To address these issues, we propose Subword Mapping and Anchoring across Languages (SMALA), a method to construct bilingual subword vocabularies. SMALA extracts subword alignments using an unsupervised state-of-the-art mapping technique and uses them to create cross-lingual anchors based on subword similarities. We demonstrate the benefits of SMALA for cross-lingual natural language inference (XNLI), where it improves zero-shot transfer to an unseen language without task-specific data, but only by sharing subword embeddings. Moreover, in neural machine translation, we show that joint subword vocabularies obtained with SMALA lead to higher BLEU scores on sentences that contain many false positives and false negatives.

pdf abs
The IICT-Yverdon System for the WMT 2021 Unsupervised MT and Very Low Resource Supervised MT Task
Àlex R. Atrio | Gabriel Luthier | Axel Fahy | Giorgos Vernikos | Andrei Popescu-Belis | Ljiljana Dolamic
Proceedings of the Sixth Conference on Machine Translation

In this paper, we present the systems submitted by our team from the Institute of ICT (HEIG-VD / HES-SO) to the Unsupervised MT and Very Low Resource Supervised MT task. We first study the improvements brought to a baseline system by techniques such as back-translation and initialization from a parent model. We find that both techniques are beneficial and suffice to reach performance that compares with more sophisticated systems from the 2020 task. We then present the application of this system to the 2021 task for low-resource supervised Upper Sorbian (HSB) to German translation, in both directions. Finally, we present a contrastive system for HSB-DE in both directions, and for unsupervised German to Lower Sorbian (DSB) translation, which uses multi-task training with various training schedules to improve over the baseline.

pdf abs
Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex Atrio | Andrei Popescu-Belis
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

We study the role of an essential hyper-parameter that governs the training of Transformers for neural machine translation in a low-resource setting: the batch size. Using theoretical insights and experimental evidence, we argue against the widespread belief that batch size should be set as large as allowed by the memory of the GPUs. We show that in a low-resource setting, a smaller batch size leads to higher scores in a shorter training time, and argue that this is due to better regularization of the gradients during training.

2020

pdf abs
Chat or Learn: a Data-Driven Robust Question-Answering System
Gabriel Luthier | Andrei Popescu-Belis
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a voice-based conversational agent which combines the robustness of chatbots and the utility of question answering (QA) systems. Indeed, while data-driven chatbots are typically user-friendly but not goal-oriented, QA systems tend to perform poorly at chitchat. The proposed chatbot relies on a controller which performs dialogue act classification and feeds user input either to a sequence-to-sequence chatbot or to a QA system. The resulting chatbot is a spoken QA application for the Google Home smart speaker. The system is endowed with general-domain knowledge from Wikipedia articles and uses coreference resolution to detect relatedness between questions. We present our choices of data sets for training and testing the components, and present the experimental results that helped us optimize the parameters of the chatbot. In particular, we discuss the appropriateness of using the SQuAD dataset for evaluating end-to-end QA, in the light of our system’s behavior.

pdf abs
A Consolidated Dataset for Knowledge-based Question Generation using Predicate Mapping of Linked Data
Johanna Melly | Gabriel Luthier | Andrei Popescu-Belis
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

In this paper, we present the ForwardQuestions data set, made of human-generated questions related to knowledge triples. This data set results from the conversion and merger of the existing SimpleDBPediaQA and SimpleQuestionsWikidata data sets, including the mapping of predicates from DBPedia to Wikidata, and the selection of ‘forward’ questions as opposed to ‘backward’ ones. The new data set can be used to generate novel questions given an unseen Wikidata triple, by replacing the subjects of existing questions with the new one and then selecting the best candidate questions using semantic and syntactic criteria. Evaluation results indicate that the question generation method using ForwardQuestions improves the quality of questions by about 20% with respect to a baseline not using ranking criteria.

2019

pdf bib
Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)
Andrei Popescu-Belis | Sharid Loáiciga | Christian Hardmeier | Deyi Xiong
Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)

2018

pdf abs
Self-Attentive Residual Decoder for Neural Machine Translation
Lesly Miculicich Werlen | Nikolaos Pappas | Dhananjay Ram | Andrei Popescu-Belis
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation. One of the reasons for their effectiveness is their ability to capture relevant source-side contextual information at each time-step prediction through an attention mechanism. However, the target-side context is solely based on the sequence model which, in practice, is prone to a recency bias and lacks the ability to capture effectively non-sequential dependencies among words. To address this limitation, we propose a target-side-attentive residual recurrent network for decoding, where attention over previous words contributes directly to the prediction of the next word. The residual learning facilitates the flow of information from the distant past and is able to emphasize any of the previously translated words, hence it gains access to a wider context. The proposed model outperforms a neural MT baseline as well as a memory and self-attention network on three language pairs. The analysis of the attention learned by the decoder confirms that it emphasizes a wider context, and that it captures syntactic-like structures.

pdf abs
Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation
Xiao Pu | Nikolaos Pappas | James Henderson | Andrei Popescu-Belis
Transactions of the Association for Computational Linguistics, Volume 6

This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are more than 1 BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words.

pdf
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
Pierre-Edouard Honnet | Andrei Popescu-Belis | Claudiu Musat | Michael Baeriswyl
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf abs
Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues
Xiao Pu | Laura Mascarell | Andrei Popescu-Belis
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We propose a method to decide whether two occurrences of the same noun in a source text should be translated consistently, i.e. using the same noun in the target text as well. We train and test classifiers that predict consistent translations based on lexical, syntactic, and semantic features. We first evaluate the accuracy of our classifiers intrinsically, in terms of the accuracy of consistency predictions, over a subset of the UN Corpus. Then, we also evaluate them in combination with phrase-based statistical MT systems for Chinese-to-English and German-to-English. We compare the automatic post-editing of noun translations with the re-ranking of the translation hypotheses based on the classifiers’ output, and also use these methods in combination. This improves over the baseline and closes up to 50% of the gap in BLEU scores between the baseline and an oracle classifier.

pdf abs
Machine Translation of Spanish Personal and Possessive Pronouns Using Anaphora Probabilities
Ngoc Quang Luong | Andrei Popescu-Belis | Annette Rios Gonzales | Don Tuggener
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We implement a fully probabilistic model to combine the hypotheses of a Spanish anaphora resolution system with those of a Spanish-English machine translation system. The probabilities over antecedents are converted into probabilities for the features of translated pronouns, and are integrated with phrase-based MT using an additional translation model for pronouns. The system improves the translation of several Spanish personal and possessive pronouns into English, by solving translation divergencies such as ‘ella’ vs. ‘she’/‘it’ or ‘su’ vs. ‘his’/‘her’/‘its’/‘their’. On a test set with 2,286 pronouns, a baseline system correctly translates 1,055 of them, while ours improves this by 41. Moreover, with oracle antecedents, possessives are translated with an accuracy of 83%.

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

pdf abs
Using Coreference Links to Improve Spanish-to-English Machine Translation
Lesly Miculicich Werlen | Andrei Popescu-Belis
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

In this paper, we present a proof-of-concept implementation of a coreference-aware decoder for document-level machine translation. We consider that better translations should have coreference links that are closer to those in the source text, and implement this criterion in two ways. First, we define a similarity measure between source and target coreference structures, by projecting the target ones onto the source and reusing existing coreference metrics. Based on this similarity measure, we re-rank the translation hypotheses of a baseline system for each sentence. Alternatively, to address the lack of diversity of mentions in the MT hypotheses, we focus on mention pairs and integrate their coreference scores with MT ones, resulting in post-editing decisions for mentions. The experimental results for Spanish to English MT on the AnCora-ES corpus show that the second approach yields a substantial increase in the accuracy of pronoun translation, with BLEU scores remaining constant.

pdf bib
Sense-Aware Statistical Machine Translation using Adaptive Context-Dependent Clustering
Xiao Pu | Nikolaos Pappas | Andrei Popescu-Belis
Proceedings of the Second Conference on Machine Translation

pdf bib
Proceedings of the Third Workshop on Discourse in Machine Translation
Bonnie Webber | Andrei Popescu-Belis | Jörg Tiedemann
Proceedings of the Third Workshop on Discourse in Machine Translation

pdf bib abs
Validation of an Automatic Metric for the Accuracy of Pronoun Translation (APT)
Lesly Miculicich Werlen | Andrei Popescu-Belis
Proceedings of the Third Workshop on Discourse in Machine Translation

In this paper, we define and assess a reference-based metric to evaluate the accuracy of pronoun translation (APT). The metric automatically aligns a candidate and a reference translation using GIZA++ augmented with specific heuristics, and then counts the number of identical or different pronouns, with provision for legitimate variations and omitted pronouns. All counts are then combined into one score. The metric is applied to the results of seven systems (including the baseline) that participated in the DiscoMT 2015 shared task on pronoun translation from English to French. The APT metric reaches around 0.993-0.999 Pearson correlation with human judges (depending on the parameters of APT), while other automatic metrics such as BLEU, METEOR, or those specific to pronouns used at DiscoMT 2015 reach only 0.972-0.986 Pearson correlation.

pdf abs
Multilingual Hierarchical Attention Networks for Document Classification
Nikolaos Pappas | Andrei Popescu-Belis
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language. However, when multilingual document collections are considered, training such models separately for each language entails linear parameter growth and lack of cross-language transfer. Learning a single multilingual model with fewer parameters is therefore a challenging but potentially beneficial objective. To this end, we propose multilingual hierarchical attention networks for learning document structures, with shared encoders and/or shared attention mechanisms across languages, using multi-task learning and an aligned semantic space as input. We evaluate the proposed models on multilingual document classification with disjoint label sets, on a large dataset which we provide, with 600k news documents in 8 languages, and 5k labels. The multilingual models outperform monolingual ones in low-resource as well as full-resource settings, and use fewer parameters, thus confirming their computational efficiency and the utility of cross-language transfer.

2016

pdf bib
Improving Pronoun Translation by Modeling Coreference Uncertainty
Ngoc Quang Luong | Andrei Popescu-Belis
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

pdf
Pronoun Language Model and Grammatical Heuristics for Aiding Pronoun Prediction
Ngoc Quang Luong | Andrei Popescu-Belis
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
A Contextual Language Model to Improve Machine Translation of Pronouns by Re-ranking Translation Hypotheses
Ngoc Quang Luong | Andrei Popescu-Belis
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf
Human versus Machine Attention in Document Classification: A Dataset with Crowdsourced Annotations
Nikolaos Pappas | Andrei Popescu-Belis
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media

pdf abs
Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
Jeevanthi Liyanapathirana | Andrei Popescu-Belis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a solution to evaluate spoken post-editing of imperfect machine translation output by a human translator. We compare two approaches to the combination of machine translation (MT) and automatic speech recognition (ASR): a heuristic algorithm and a machine learning method. To obtain a data set with spoken post-editing information, we use the French version of TED talks as the source texts submitted to MT, and the spoken English counterparts as their corrections, which are submitted to an ASR system. We experiment with various levels of artificial ASR noise and also with a state-of-the-art ASR system. The results show that the combination of MT with ASR improves over both individual outputs of MT and ASR in terms of BLEU scores, especially when ASR performance is low.

2015

pdf bib
Proceedings of the Second Workshop on Discourse in Machine Translation
Bonnie Webber | Marine Carpuat | Andrei Popescu-Belis | Christian Hardmeier
Proceedings of the Second Workshop on Discourse in Machine Translation

pdf
Pronoun Translation and Prediction with or without Coreference Links
Ngoc Quang Luong | Lesly Miculicich Werlen | Andrei Popescu-Belis
Proceedings of the Second Workshop on Discourse in Machine Translation

2014

pdf abs
English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling
Sharid Loáiciga | Thomas Meyer | Andrei Popescu-Belis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a method for verb phrase (VP) alignment in an English-French parallel corpus and its use for improving statistical machine translation (SMT) of verb tenses. The method starts from automatic word alignment performed with GIZA++, and relies on a POS tagger and a parser, in combination with several heuristics, in order to identify non-contiguous components of VPs, and to label the aligned VPs with their tense and voice on each side. This procedure is applied to the Europarl corpus, leading to the creation of a smaller, high-precision parallel corpus with about 320,000 pairs of finite VPs, which is made publicly available. This resource is used to train a tense predictor for translation from English into French, based on a large number of surface features. Three MT systems are compared: (1) a baseline phrase-based SMT; (2) a tense-aware SMT system using the above predictions within a factored translation model; and (3) a system using oracle predictions from the aligned VPs. For several tenses, such as the French “imparfait”, the tense-aware SMT system improves significantly over the baseline and is closer to the oracle system.

pdf
Explaining the Stars: Weighted Multiple-Instance Learning for Aspect-Based Sentiment Analysis
Nikolaos Pappas | Andrei Popescu-Belis
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Enforcing Topic Diversity in a Document Recommender for Conversations
Maryam Habibi | Andrei Popescu-Belis
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Proceedings of the Workshop on Discourse in Machine Translation
Bonnie Webber | Andrei Popescu-Belis | Katja Markert | Jörg Tiedemann
Proceedings of the Workshop on Discourse in Machine Translation

pdf
Detecting Narrativity to Improve English to French Translation of Simple Past Verbs
Thomas Meyer | Cristina Grisot | Andrei Popescu-Belis
Proceedings of the Workshop on Discourse in Machine Translation

pdf
Diverse Keyword Extraction from Conversations
Maryam Habibi | Andrei Popescu-Belis
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf abs
Machine Translation of Labeled Discourse Connectives
Thomas Meyer | Andrei Popescu-Belis | Najeh Hajlaoui | Andrea Gesmundo
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper shows how the disambiguation of discourse connectives can improve their automatic translation, while preserving the overall performance of statistical MT as measured by BLEU. State-of-the-art automatic classifiers for rhetorical relations are used prior to MT to label discourse connectives that signal those relations. These labels are used for MT in two ways: (1) by augmenting factored translation models; and (2) by using the probability distributions of labels in order to train and tune SMT. The improvement of translation quality is demonstrated using a new semi-automated metric for discourse connectives, on the English/French WMT10 data, while BLEU scores remain comparable to non-discourse-aware systems, due to the low frequency of discourse connectives.

pdf bib abs
Translating English Discourse Connectives into Arabic: a Corpus-based Analysis and an Evaluation Metric
Najeh Hajlaoui | Andrei Popescu-Belis
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages

Discourse connectives can often signal multiple discourse relations, depending on their context. The automatic identification of the Arabic translations of seven English discourse connectives shows how these connectives are differently translated depending on their actual senses. Automatic labelling of English source connectives can help a machine translation system to translate them more correctly. The corpus-based analysis of Arabic translations also enables the definition of a connective-specific evaluation metric for machine translation, which is here validated by human judges on sample English/Arabic translation data.

pdf
Using Sense-labeled Discourse Connectives for Statistical Machine Translation
Thomas Meyer | Andrei Popescu-Belis
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

pdf abs
Discourse-level Annotation over Europarl for Machine Translation: Connectives and Pronouns
Andrei Popescu-Belis | Thomas Meyer | Jeevanthi Liyanapathirana | Bruno Cartoni | Sandrine Zufferey
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3600 tokens. This data is then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and overviews the first systems trained on the annotated resources and their use for machine translation.

This paper summarizes the latest, final version of ISO standard 24617-2 ``Semantic annotation framework, Part 2: Dialogue acts"""". Compared to the preliminary version ISO DIS 24617-2:2010, described in Bunt et al. (2010), the final version additionally includes concepts for annotating rhetorical relations between dialogue units, defines a full-blown compositional semantics for the Dialogue Act Markup Language DiAML (resulting, as a side-effect, in a different treatment of functional dependence relations among dialogue acts and feedback dependence relations); and specifies an optimally transparent XML-based reference format for the representation of DiAML annotations, based on the systematic application of the notion of `ideal concrete syntax'. We describe these differences and briefly discuss the design and implementation of an incremental method for dialogue act recognition, which proves the usability of the ISO standard for automatic dialogue annotation.

2011

pdf
Using a Wikipedia-based Semantic Relatedness Measure for Document Clustering
Majid Yazdani | Andrei Popescu-Belis
Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing

pdf
How Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives
Bruno Cartoni | Sandrine Zufferey | Thomas Meyer | Andrei Popescu-Belis
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

pdf
Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation
Thomas Meyer | Andrei Popescu-Belis | Sandrine Zufferey | Bruno Cartoni
Proceedings of the SIGDIAL 2011 Conference

pdf
A Just-in-Time Document Retrieval System for Dialogues or Monologues
Andrei Popescu-Belis | Majid Yazdani | Alexandre Nanchen | Philip N. Garner
Proceedings of the SIGDIAL 2011 Conference

pdf
A Speech-based Just-in-Time Retrieval System using Semantic Search
Andrei Popescu-Belis | Majid Yazdani | Alexandre Nanchen | Philip N. Garner
Proceedings of the ACL-HLT 2011 System Demonstrations

2010

This paper describes an ISO project which aims at developing a standard for annotating spoken and multimodal dialogue with semantic information concerning the communicative functions of utterances, the kind of semantic content they address, and their relations with what was said and done earlier in the dialogue. The project, ISO 24617-2 ""Semantic annotation framework, Part 2: Dialogue acts"", is currently at DIS stage. The proposed annotation schema distinguishes 9 orthogonal dimensions, allowing each functional segment in dialogue to have a function in each of these dimensions, thus accounting for the multifunctionality that utterances in dialogue often have. A number of core communicative functions is defined in the form of ISO data categories, available at http://semantic-annotation.uvt.nl/dialogue-acts/iso-datcats.pdf; they are divided into ""dimension-specific"" functions, which can be used only in a particular dimension, such as Turn Accept in the Turn Management dimension, and ""general-purpose"" functions, which can be used in any dimension, such as Inform and Request. An XML-based annotation language, ""DiAML"" is defined, with an abstract syntax, a semantics, and a concrete syntax.

2008

pdf abs
Task-Based Evaluation of Meeting Browsers: from Task Elicitation to User Behavior Analysis
Andrei Popescu-Belis | Mike Flynn | Pierre Wellner | Philippe Baudrion
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents recent results of the application of the task-based Browser Evaluation Test (BET) to meeting browsers, that is, interfaces to multimodal databases of meeting recordings. The tasks were defined by browser-neutral BET observers. Two groups of human subjects used the Transcript-based Query and Browsing interface (TQB), and attempted to solve as many BET tasks - pairs of true/false statements to disambiguate - as possible in a fixed amount of time. Their performance was measured in terms of precision and speed. Results indicate that the browsers annotation-based search functionality is frequently used, in particular the keyword search. A more detailed analysis of each test question for each participant confirms that despite considerable variation across strategies, the use of queries is correlated to successful performance.

pdf abs
Improving Contextual Quality Models for MT Evaluation Based on Evaluators’ Feedback
Paula Estrella | Andrei Popescu-Belis | Maghi King
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The Framework for the Evaluation for Machine Translation (FEMTI) contains guidelines for building a quality model that is used to evaluate MT systems in relation to the purpose and intended context of use of the systems. Contextual quality models can thus be constructed, but entering into FEMTI the knowledge required for this operation is a complex task. An experiment has been set up in order to transfer knowledge from MT evaluation experts into the FEMTI guidelines, by polling experts about the evaluation methods they would use in a particular context, then inferring from the results generic relations between characteristics of the context of use and quality characteristics. The results of this hands-on exercise, carried out as part of a conference tutorial, have served to refine FEMTIs generic contextual quality model and to obtain feedback on the FEMTI guidelines in general.

2007

pdf
A new method for the study of correlations between MT evaluation metrics
Paula Estrella | Andrei Popescu-Belis | Maghi King
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

pdf bib
Le rôle des métriques d’évaluation dans le processus de recherche en TAL [The role of evaluation metrics in the NLP research process]
Andrei Popescu-Belis
Traitement Automatique des Langues, Volume 48, Numéro 1 : Principes de l'évaluation en Traitement Automatique des Langues [Principles of Evaluation in Natural Language Processing]

pdf
Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus
Andrei Popescu-Belis | Paula Estrella
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf
How much data is needed for reliable MT evaluation? Using bootstrapping to study human and automatic metrics
Paula Estrella | Olivier Hamon | Andrei Popescu-Belis
Proceedings of Machine Translation Summit XI: Papers

pdf
Assessing human and automated quality judgments in the French MT evaluation campaign CESTA
Olivier Hamon | Anthony Hartley | Andrei Popescu-Belis | Khalid Choukri
Proceedings of Machine Translation Summit XI: Papers

pdf bib
Context-based evaluation of MT systems: principles and tools
Maghi King | Andrei Popescu-Belis | Paula Estrella
Proceedings of Machine Translation Summit XI: Tutorials

pdf
Evaluation of NLG: some analogies and differences with machine translation and reference resolution
Andrei Popescu-Belis
Proceedings of the Workshop on Using corpora for natural language generation

bib
The place of automatic evaluation metrics in external quality models for machine translation
Andrei Popescu-Belis
Proceedings of the Workshop on Automatic procedures in MT evaluation

pdf
Contrasting the Automatic Identification of Two Discourse Markers in Multiparty Dialogues
Andrei Popescu-Belis | Sandrine Zufferey
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2006

pdf abs
A Model for Context-Based Evaluation of Language Processing Systems and its Application to Machine Translation Evaluation
Andrei Popescu-Belis | Paula Estrella | Margaret King | Nancy Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we propose a formal framework that takes into account the influence of the intended context of use of an NLP system on the procedure and the metrics used to evaluate the system. We introduce in particular the notion of a context-dependent quality model and explain how it can be adapted to a given context of use. More specifically, we define vector-space representations of contexts of use and of quality models, which are connected by a generic contextual quality model (GCQM). For each domain, experts in evaluation are needed to build a GCQM based on analytic knowledge and on previous evaluations, using the mechanism proposed here. The main inspiration source for this work is the FEMTI framework for the evaluation of machine translation, which implements partly the present model, and which is described briefly along with insights from other domains.

pdf abs
TQB: Accessing Multimodal Data Using a Transcript-based Query and Browsing Interface
Andrei Popescu-Belis | Maria Georgescul
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This article describes an interface for searching and browsing multimodal recordings of group meetings. We provide first an overall perspective of meeting processing and retrieval applications, and distinguish between the media/modalities that are recorded and the ones that are used for browsing. We then proceed to describe the data and the annotations that are stored in a meeting database. Two scenarios of use for the transcript-based query and browsing interface (TQB) are then outlined: search and browse vs. overview and browse. The main functionalities of TQB, namely the database backend and the multimedia rendering solutions are described. An outline of evaluation perspectives is finally provided, with a description of the user interaction features that will be monitored.

This article outlines the evaluation protocol and provides the main results of the French Evaluation Campaign for Machine Translation Systems, CESTA. Following the initial objectives and evaluation plans, the evaluation metrics are briefly described: along with fluency and adequacy assessed by human judges, a number of recently proposed automated metrics are used. Two evaluation campaigns were organized, the first one in the general domain, and the second one in the medical domain. Up to six systems translating from English into French, and two systems translating from Arabic into French, took part in the campaign. The numerical results illustrate the differences between classes of systems, and provide interesting indications about the reliability of the automated metrics for French as a target language, both by comparison to human judges and using correlations between metrics. The corpora that were produced, as well as the information about the reliability of metrics, constitute reusable resources for MT evaluation.

pdf abs
Résolution des références aux documents dans un corpus de dialogues humains
Andrei Popescu-Belis
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article étudie la résolution des références à des entités lorsqu’une représentation informatique de ces entités est disponible. Nous nous intéressons à un corpus de dialogues entre humains, portant sur les grands titres de la presse francophone du jour, et proposons une méthode pour détecter et résoudre les références faites par les locuteurs aux articles des journaux. La détection des expressions nominales qui réfèrent à ces documents est réalisée grâce à une grammaire, alors que le problème de la détection des pronoms qui réfèrent aux documents est abordé par des moyens statistiques. La résolution de ces expressions, à savoir l’attribution des référents, fait quant à elle l’objet d’un algorithme inspiré de la résolution des coréférences. Ces propositions sont évaluées par le biais de mesures quantitatives spécifiques.

2005

pdf bib
Finding the System that Suits You Best: Towards the Normalization of MT Evaluation
Paula Estrella | Andrei Popescu-Belis | Nancy Underwood
Translating and the Computer 27

In this paper, we report on the results of a full-size evaluation campaign of various MT systems. This campaign is novel compared to the classical DARPA/NIST MT evaluation campaigns in the sense that French is the target language, and that it includes an experiment of meta-evaluation of various metrics claiming to better predict different attributes of translation quality. We first describe the campaign, its context, its protocol and the data we used. Then we summarise the results obtained by the participating systems and discuss the meta-evaluation of the metrics used.

2004

pdf
Reference Resolution over a Restricted Domain: References to Documents
Andrei Popescu-Belis | Denis Lalanne
Proceedings of the Conference on Reference Resolution and Its Applications

pdf bib
CESTA: Machine Translation Evaluation Campaign [Work-in-Progress Project Report]
Widad Mustafa El Hadi | Marianne Dabbadie | Ismaïl Timimi | Martin Rajman | Philippe Langlais | Antony Hartley | Andrei Popescu Belis
Proceedings of the Second International Workshop on Language Resources for Translation Work, Research and Training

pdf
Towards Automatic Identification of Discourse Markers in Dialogs: The Case of Like
Sandrine Zufferey | Andrei Popescu-Belis
Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004

pdf
Multi-level Dialogue Act Tags
Alexander Clark | Andrei Popescu-Belis
Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004

pdf
User Query Analysis for the Specification and Evaluation of a Dialogue Processing and Retrieval System
Agnes Lisowska | Andrei Popescu-Belis | Susan Armstrong
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf
Abstracting a Dialog Act Tagset for Meeting Processing
Andrei Popescu-Belis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf
Online Evaluation of Coreference Resolution
Andrei Popescu-Belis | Loïs Rigouste | Susanne Salmon-Alt | Laurent Romary
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf
Building and Using a Corpus of Shallow Dialogue Annotated Meetings
Andrei Popescu-Belis | Maria Georgescul | Alexander Clark | Susan Armstrong
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf abs
FEMTI: creating and using a framework for MT evaluation
Margaret King | Andrei Popescu-Belis | Eduard Hovy
Proceedings of Machine Translation Summit IX: Papers

This paper presents FEMTI, a web-based Framework for the Evaluation of Machine Translation in ISLE. FEMTI offers structured descriptions of potential user needs, linked to an overview of technical characteristics of MT systems. The description of possible systems is mainly articulated around the quality characteristics for software product set out in ISO/IEC standard 9126. Following the philosophy set out there and in the related 14598 series of standards, each quality characteristic bottoms out in metrics which may be applied to a particular instance of a system in order to judge how satisfactory the system is with respect to that characteristic. An evaluator can use the description of user needs to help identify the specific needs of his evaluation and the relations between them. He can then follow the pointers to system description to determine what metrics should be applied and how. In the current state of the framework, emphasis is on being exhaustive, including as much as possible of the information available in the literature on machine translation evaluation. Future work will aim at being more analytic, looking at characteristics and metrics to see how they relate to one another, validating metrics and investigating the correlation between particular metrics and human judgement.

pdf abs
An experiment in comparative evaluation: humans vs. computers
Andrei Popescu-Belis
Proceedings of Machine Translation Summit IX: Papers

This paper reports results from an experiment that was aimed at comparing evaluation metrics for machine translation. Implemented as a workshop at a major conference in 2002, the experiment defined an evaluation task, description of the metrics, as well as test data consisting of human and machine translations of two texts. Several metrics, either applicable by human judges or automated, were used, and the overall results were analyzed. It appeared that most human metrics and automated metrics provided in general consistent rankings of the various candidate translations; the ranking of the human translations matched the one provided by translation professionals; and human translations were distinguished from machine translations.