Eleni Metheniti

2023

“Chère maison” or “maison chère”? Transformer-based prediction of adjective placement in French
Eleni Metheniti | Tim Van de Cruys | Wissam Kerkri | Juliette Thuilier | Nabil Hathout
Findings of the Association for Computational Linguistics: EACL 2023

In French, the placement of the adjective within a noun phrase is subject to variation: it can appear either before or after the noun. We conduct experiments to assess whether transformer-based language models are able to learn the adjective position in French noun phrases, a position that depends on several linguistic factors. Prior findings have shown that transformer models are insensitive to permuted word order, but in this work, we show that finetuned models are successful at learning and selecting the correct position of the adjective. However, this success can be attributed to the process of finetuning rather than to the linguistic knowledge acquired during pretraining, as evidenced by the low accuracy of classification experiments that make use of pretrained embeddings. Comparing the finetuned models to the choices of native speakers (collected with a questionnaire), we notice that the models favor context and global syntactic roles, and are weaker with complex structures and fixed expressions.

2022

About Time: Do Transformers Learn Temporal Verbal Aspect?
Eleni Metheniti | Tim Van de Cruys | Nabil Hathout
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Aspect is a linguistic concept that describes how an action, event, or state of a verb phrase is situated in time. In this paper, we explore whether different transformer models are capable of identifying aspectual features. We focus on two specific aspectual features: telicity and duration. Telicity marks whether the verb’s action or state has an endpoint or not (telic/atelic), and duration denotes whether a verb expresses an action (dynamic) or a state (stative). These features are integral to the interpretation of natural language, but also hard to annotate and identify with NLP methods. We perform experiments in English and French, and our results show that transformer models adequately capture information on telicity and duration in their vectors, even in their non-finetuned forms, but are somewhat biased with regard to verb tense and word order.

2021

Prédire l’aspect linguistique en anglais au moyen de transformers (Classifying Linguistic Aspect in English with Transformers)
Eleni Metheniti | Tim Van de Cruys | Nabil Hathout
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Verbal aspect describes how an action, event, or state expressed by a verb is related to time; telicity is the property of a verb phrase that presents an action or event as being carried through to completion; duration distinguishes verbs that express an action (dynamic) from those that express a state (stative). These features, essential to the interpretation of natural language, are also difficult to annotate and to identify with NLP methods. In this work, we assess the ability of several pretrained transformer models (BERT, RoBERTa, XLNet, ALBERT) to predict telicity and duration. Our results show that BERT performs best on both tasks, while XLNet and ALBERT are the weakest. Moreover, the performance of most models improves when they are additionally given the position of the verbs. Overall, our study establishes that transformer models largely capture telicity and duration.

2020

How Relevant Are Selectional Preferences for Transformer-based Language Models?
Eleni Metheniti | Tim Van de Cruys | Nabil Hathout
Proceedings of the 28th International Conference on Computational Linguistics

Selectional preference is defined as the tendency of a predicate to favor particular arguments within a certain linguistic context and, likewise, to reject others that result in conflicting or implausible meanings. The stellar success of contextual word embedding models such as BERT in NLP tasks has led many to question whether these models have learned linguistic information, but until now, most research has focused on syntactic information. We investigate whether BERT contains information on the selectional preferences of words by examining the probability it assigns to the dependent word given the presence of a head word in a sentence. We use head-dependent word pairs in five different syntactic relations from the SP-10K corpus of selectional preference (Zhang et al., 2019b), placed in sentences from the ukWaC corpus, and we calculate the correlation between the plausibility score (from SP-10K) and the model probabilities. Our results show that overall, there is no strong positive or negative correlation in any syntactic relation, but we do find that certain head words have a strong correlation and that masking all words but the head word yields the most positive correlations in most scenarios, which indicates that the semantics of the predicate is indeed an integral and influential factor for the selection of the argument.

Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus
Eleni Metheniti | Guenter Neumann
Proceedings of the Twelfth Language Resources and Evaluation Conference

Multilingual inflectional corpora are a scarce resource in the NLP community, especially corpora with annotated morpheme boundaries. We evaluate a multilingual inflectional corpus with morpheme boundaries, generated from the English Wiktionary (Metheniti and Neumann, 2018), against the largest multilingual, high-quality inflectional corpus, that of the UniMorph project (Kirov et al., 2018). We confirm that the generated Wikinflection corpus does not match the quality of UniMorph, but we were able to extract a significant number of words from the intersection of the two corpora. Our Wikinflection corpus benefits from the morpheme segmentations of Wiktionary/Wikinflection and from the manually evaluated morphological feature tags of the UniMorph project, and has 216K lemmas and 5.4M word forms in a total of 68 languages.
