Silvio Cordeiro
Also published as: Silvio Ricardo Cordeiro
2019
Unsupervised Compositionality Prediction of Nominal Compounds
Silvio Cordeiro | Aline Villavicencio | Marco Idiart | Carlos Ramisch
Computational Linguistics, Volume 45, Issue 1 - March 2019
Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results obtained reveal a high agreement between the models and human predictions, suggesting that they are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size in the ability of the model to predict compositionality, and of a uniform combination of the components for best results.
Without lexicons, multiword expression identification will never fly: A position statement
Agata Savary | Silvio Cordeiro | Carlos Ramisch
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Because most multiword expressions (MWEs), especially verbal ones, are semantically non-compositional, their automatic identification in running text is a prerequisite for semantically-oriented downstream applications. However, recent developments, driven notably by the PARSEME shared task on automatic identification of verbal MWEs, show that this task is harder than related tasks, despite recent contributions both in multilingual corpus annotation and in computational models. In this paper, we analyse possible reasons for this state of affairs. They lie in the nature of the MWE phenomenon, as well as in its distributional properties. We also offer a comparative analysis of the state-of-the-art systems, which exhibit particularly strong sensitivity to unseen data. On this basis, we claim that, in order to make strong headway in MWE identification, the community should bend its mind into coupling identification of MWEs with their discovery, via syntactic MWE lexicons. Such lexicons need not necessarily achieve a linguistically complete modelling of MWEs’ behavior, but they should provide minimal morphosyntactic information to cover some potential uses, so as to complement existing MWE-annotated corpora. We define requirements for such a minimal NLP-oriented lexicon, and we propose a roadmap for the MWE community driven by these requirements.
Syntax-based identification of light-verb constructions
Silvio Ricardo Cordeiro | Marie Candito
Proceedings of the 22nd Nordic Conference on Computational Linguistics
This paper analyzes results on light-verb construction identification from the PARSEME shared task, distinguishing simple cases that can be learned directly from training data from more complex cases that require an extra level of semantic processing. We propose a simple baseline that beats the state of the art for the simple cases, and couple it with another simple baseline to handle the complex cases. We additionally present two other classifiers based on a richer set of features, with results surpassing the state of the art by 8 percentage points.
2018
Advances in Multiword Expression Identification for the Italian language: The PARSEME Shared Task Edition 1.1
Johanna Monti | Silvio Ricardo Cordeiro | Carlos Ramisch | Federico Sangati | Agata Savary | Veronika Vincze
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Silvio Ricardo Cordeiro | Shereen Oraby | Umashanthi Pavalanathan | Kyeongmin Rim
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.
2017
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds
Rodrigo Wilkens | Leonardo Zilio | Silvio Ricardo Cordeiro | Felipe Paula | Carlos Ramisch | Marco Idiart | Aline Villavicencio
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers
Literal readings of multiword expressions: as scarce as hen’s teeth
Agata Savary | Silvio Ricardo Cordeiro
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories
2016
mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
Silvio Cordeiro | Carlos Ramisch | Aline Villavicencio
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents mwetoolkit+sem: an extension of the mwetoolkit that estimates semantic compositionality scores for multiword expressions (MWEs) based on word embeddings. First, we describe our implementation of vector-space operations working on distributional vectors. The compositionality score is based on the cosine distance between the MWE vector and the composition of the vectors of its member words. Our generic system can handle several types of word embeddings and MWE lists, and may combine individual word representations using several composition techniques. We evaluate our implementation on a dataset of 1042 English noun compounds, comparing different configurations of the underlying word embeddings and word-composition models. We show that our vector-based scores model non-compositionality better than standard association measures such as log-likelihood.
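The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the mwetoolkit+sem implementation: the function names, the toy 2-dimensional vectors, and the choice of an additive composition over unit-normalized member vectors are all assumptions made for illustration, and the score is computed as cosine similarity (higher means more compositional).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def unit(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return list(v) if norm == 0.0 else [a / norm for a in v]


def compose_additive(member_vectors):
    """Assumed composition: sum of the unit-normalized member word vectors."""
    units = [unit(v) for v in member_vectors]
    return [sum(components) for components in zip(*units)]


def compositionality_score(mwe_vector, member_vectors):
    """Similarity between the vector of the whole expression and the composed vector."""
    return cosine(mwe_vector, compose_additive(member_vectors))


# Toy vectors, invented for illustration only (real embeddings have many dimensions).
head = [1.0, 0.0]
modifier = [0.0, 1.0]
literal_compound = [1.0, 1.0]     # points the same way as head + modifier
idiomatic_compound = [1.0, -1.0]  # orthogonal to head + modifier

literal_score = compositionality_score(literal_compound, [head, modifier])
idiomatic_score = compositionality_score(idiomatic_compound, [head, modifier])
```

In this toy setting the literal compound scores close to 1 and the idiomatic one scores 0, matching the intuition that an idiomatic expression's vector drifts away from the combination of its parts. Other composition functions could be swapped in for compose_additive without changing the rest of the sketch.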
Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time
Silvio Cordeiro | Carlos Ramisch | Marco Idiart | Aline Villavicencio
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality
Carlos Ramisch | Silvio Cordeiro | Leonardo Zilio | Marco Idiart | Aline Villavicencio
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Co-authors
- Carlos Ramisch 11
- Aline Villavicencio 7
- Agata Savary 5
- Marco Idiart 4
- Marie Candito 3
- Veronika Vincze 3
- Voula Giouli 2
- Johanna Monti 2
- Behrang QasemiZadeh 2
- Federico Sangati 2
- Ivelina Stoyanova 2
- Leonardo Zilio 2
- Verginica Barbu Mititelu 1
- Archna Bhatia 1
- Maja Buljan 1
- Fabienne Cap 1
- Antoine Doucet 1
- Polona Gantar 1
- Tunga Güngör 1
- Abdelati Hawwari 1
- Uxoa Iñurrieta 1
- Jolanta Kovalevskaitė 1
- Simon Krek 1
- Timm Lichte 1
- Chaya Liebeskind 1
- Shereen Oraby 1
- Carla Parra Escartín 1
- Felipe Paula 1
- Umashanthi Pavalanathan 1
- Renata Ramisch 1
- Kyeongmin Rim 1
- Nathan Schneider 1
- Ashwini Vaidya 1
- Abigail Walsh 1
- Rodrigo Wilkens 1