Fabienne Cap


2018

An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
Gongbo Tang | Fabienne Cap | Eva Pettersson | Joakim Nivre
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The NMT models operate at different levels and use different attention mechanisms and neural network architectures. Our results show that NMT models are much better than SMT models in terms of character error rate. Vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization. Transformer models perform better only when provided with more training data. We also find that subword-level models with a small subword vocabulary are better than character-level models. In addition, we propose a hybrid method which further improves the performance of historical spelling normalization.

2017

The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have rarely been modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.

Show Me Your Variance and I Tell You Who You Are - Deriving Compound Compositionality from Word Alignments
Fabienne Cap
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

We use word alignment variance as an indicator for the non-compositionality of German and English noun compounds. Our work-in-progress results are not, on their own, competitive with state-of-the-art approaches, but they show that alignment variance is correlated with compositionality and thus worth a closer look in the future.

A BiLSTM-based System for Cross-lingual Pronoun Prediction
Sara Stymne | Sharid Loáiciga | Fabienne Cap
Proceedings of the Third Workshop on Discourse in Machine Translation

We describe the Uppsala system for the 2017 DiscoMT shared task on cross-lingual pronoun prediction. The system is based on a lower layer of BiLSTMs reading the source and target sentences respectively. Classification is based on the BiLSTM representation of the source and target positions for the pronouns. In addition, we enrich our system with dependency representations from an external parser and character representations of the source sentence. We show that these additions perform well for German and Spanish as source languages. Our system is competitive and is in first or second place for all language pairs.

Did you ever read about Frogs drinking Coffee? Investigating the Compositionality of Multi-Emoji Expressions
Rebeca Padilla López | Fabienne Cap
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In this work, we present a first attempt to investigate multi-emoji expressions and whether they behave similarly to multiword expressions in terms of non-compositionality. We focus on the combination of the frog and the hot beverage emoji, but also show some preliminary results for other non-compositional emoji combinations. We use off-the-shelf sentiment analysers as well as manual classifications to approach the compositionality of these emoji combinations.

2016

Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools
Jörg Tiedemann | Fabienne Cap | Jenna Kanerva | Filip Ginter | Sara Stymne | Robert Östling | Marion Weller-Di Marco
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

A Pilot Experiment on Exploiting Translations for Literary Studies on Kafka’s “Verwandlung”
Fabienne Cap | Ina Rösiger | Jonas Kuhn
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation
Fabienne Cap | Manju Nirmal | Marion Weller | Sabine Schulte im Walde
Proceedings of the 11th Workshop on Multiword Expressions

CimS - The CIS and IMS Joint Submission to WMT 2015 addressing morphological and syntactic differences in English to German SMT
Fabienne Cap | Marion Weller | Anita Ramm | Alexander Fraser
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

CimS – The CIS and IMS joint submission to WMT 2014 translating from English into German
Fabienne Cap | Marion Weller | Anita Ramm | Alexander Fraser
Proceedings of the Ninth Workshop on Statistical Machine Translation

Large-scale Exact Decoding: The IMS-TTT submission to WMT14
Daniel Quernheim | Fabienne Cap
Proceedings of the Ninth Workshop on Statistical Machine Translation

Distinguishing Degrees of Compositionality in Compound Splitting for Statistical Machine Translation
Marion Weller | Fabienne Cap | Stefan Müller | Sabine Schulte im Walde | Alexander Fraser
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)

How to Produce Unseen Teddy Bears: Improved Morphological Processing of Compounds in SMT
Fabienne Cap | Alexander Fraser | Marion Weller | Aoife Cahill
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Using a rich feature set for the identification of German MWEs
Fabienne Cap | Marion Weller | Ulrich Heid
Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technologies

2012

Modeling Inflection and Word-Formation in SMT
Alexander Fraser | Marion Weller | Aoife Cahill | Fabienne Cap
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics