This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Lucie Barque
Many languages lack broad-coverage, semantically annotated lexical resources, which limits empirical research on lexical semantics for these languages. In this paper, we report on how we automatically enriched the French Wiktionary with general semantic classes, known as supersenses, using a limited amount of manually annotated data. We trained a classifier combining sense definition classification and sense exemplar classification. The resulting resource, with an evaluated supersense accuracy of nearly 85% (92% for hypersenses), is used in a case study illustrating how such a semantically enriched resource can be leveraged to empirically test linguistic hypotheses about the lexicon on a large scale.
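The abstract does not specify how the definition and exemplar classifiers are combined. Purely as an illustration, the sketch below assumes a simple late-fusion scheme: each classifier returns a probability distribution over supersenses, the two distributions are averaged with a hypothetical weight `weight`, and the definition classifier is used alone when a sense has no exemplars. The four-supersense inventory and all function names are hypothetical.

```python
# Hypothetical subset of supersenses, for illustration only.
SUPERSENSES = ["animal", "artifact", "food", "person"]


def combine(def_probs, ex_probs=None, weight=0.5):
    """Average two distributions over supersenses (assumed late fusion).

    def_probs: scores from a hypothetical definition classifier.
    ex_probs: scores from a hypothetical exemplar classifier, or None
              when the sense has no usage examples.
    """
    if ex_probs is None:
        # No exemplars: fall back on the definition classifier alone.
        return dict(def_probs)
    return {s: weight * def_probs[s] + (1 - weight) * ex_probs[s]
            for s in SUPERSENSES}


def predict(def_probs, ex_probs=None, weight=0.5):
    """Return the supersense with the highest combined score."""
    combined = combine(def_probs, ex_probs, weight)
    return max(combined, key=combined.get)
```

For example, if the definition classifier prefers `animal` (0.6) but the exemplars point strongly to `food` (0.7), the averaged scores favor `food`; with no exemplars, the prediction stays `animal`. A learned combination (e.g. a meta-classifier) would be an equally plausible design.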
Contextualised embeddings such as BERT have become de facto state-of-the-art references in many NLP applications, thanks to their impressive performance. However, their opacity makes their behaviour hard to interpret. SLICE is a hybrid model that combines supersense labels with contextual embeddings. We introduce a weakly supervised method to learn interpretable embeddings from raw corpora and small lists of seed words. Our model represents both a word and its context as embeddings in the same compact space, whose dimensions correspond to interpretable supersenses. We evaluate the model on supersense tagging for French nouns. The small amount of supervision it requires makes it particularly well suited to low-resource scenarios. Thanks to its interpretability, we can perform linguistic analyses of the predicted supersenses in terms of input word and context representations.
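A minimal sketch of why a shared supersense space makes predictions readable, assuming the word and context vectors are already given: each dimension is one supersense, both vectors are length-normalized and mixed with a hypothetical weight `alpha`, and the predicted tag is the highest-scoring dimension. This does not reproduce SLICE's weakly supervised training from seed words; the vectors and the four-supersense inventory are invented for illustration.

```python
import math

# Hypothetical supersense dimensions; each vector index maps to one label.
SUPERSENSES = ["animal", "artifact", "food", "person"]


def normalize(vec):
    """Scale a vector to unit length (leave zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def tag(word_vec, context_vec, alpha=0.5):
    """Mix word and context vectors and return the top supersense."""
    w, c = normalize(word_vec), normalize(context_vec)
    combined = [alpha * wi + (1 - alpha) * ci for wi, ci in zip(w, c)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return SUPERSENSES[best]
```

With an ambiguous word vector split evenly between `animal` and `food`, the context decides the tag: a food-oriented context yields `food`, an animal-oriented one yields `animal`. Because every coordinate is a named supersense, the contribution of word and context to the decision can be inspected directly.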
Like many languages, French lacks semantically annotated corpus data. Our aim is to provide the linguistic and NLP research communities with a gold-standard sense-annotated corpus of French, using WordNet Unique Beginners as semantic tags, thus allowing for interoperability. In this paper, we report on the first phase of the project, which focused on the annotation of common nouns. The resulting dataset consists of more than 12,000 French noun occurrences, which were double-blind annotated and adjudicated according to a carefully redefined set of supersenses. The resource is released online under a Creative Commons licence.
This article presents the design and development of Demonette2, a large-scale derivational database of French, developed within the ANR Démonext project (ANR-17-CE23-0005). The article describes the project's objectives and the structure of the database, and presents the project's first results, with an emphasis on a crucial issue: the semantic encoding of entries and relations.
VerbeNet is a French lexicon developed by “translating” its English counterpart, VerbNet (Kipper-Schuler, 2005), and by handling the specificities of French syntax (Pradet et al., 2014; Danlos et al., 2016). One difficulty encountered in its development stems from the fact that the list of frames, which may be long, has no internal organization. This paper proposes a type system for frames that shows whether two frames are variants of a given alternation. Frame typing facilitates coherence checking of the resource in a “virtuous circle”. We present the principles underlying a program we developed and used to automatically type frames in VerbeNet. We also show that our system is portable to other languages.
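One way to picture frame typing, under assumptions that are ours rather than VerbeNet's: a frame is a tuple of constituent labels, each alternation is a rewrite function that either produces the variant frame or returns `None`, and a pair of frames is typed with every alternation that maps the first onto the second. The label notation and both rules below are simplified, hypothetical illustrations, not the actual VerbeNet encoding or program.

```python
def instrument_subject(frame):
    """Hypothetical rule: 'Agent V Patient with-Instrument' -> 'Instrument V Patient'."""
    if frame and frame[0] == "NP.Agent" and "PP.Instrument" in frame:
        rest = [c for c in frame[1:] if c != "PP.Instrument"]
        return ("NP.Instrument", *rest)
    return None  # rule does not apply


def unspecified_object(frame):
    """Hypothetical rule: drop the Patient object."""
    if "NP.Patient" in frame:
        return tuple(c for c in frame if c != "NP.Patient")
    return None


# Hypothetical alternation inventory: name -> rewrite rule.
ALTERNATIONS = {
    "instrument-subject": instrument_subject,
    "unspecified-object": unspecified_object,
}


def type_pair(base, variant):
    """List the alternations that rewrite `base` into `variant`."""
    return [name for name, rule in ALTERNATIONS.items() if rule(base) == variant]
```

A pair for which `type_pair` returns an empty list is exactly the kind of frame that a coherence check would flag for manual review: either an alternation is missing from the inventory or the frame is misencoded.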
The Asfalda project aims to develop a French corpus with frame-based semantic annotations, along with automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data where necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us enforce the coherence of the resulting resource; it also has the advantage that, although the number of frames is limited (around a hundred), we obtain full coverage within a given domain.
This paper presents the Nomage project, which aims to describe the aspectual properties of deverbal nouns empirically. It centers on the development of two resources: a semantically annotated corpus of deverbal nouns and an electronic lexicon. We present both in this paper and emphasize how the semantic annotations of the corpus make it possible to validate the lexicographic description of deverbal nouns, in particular their polysemy. Nominalizations have occupied a central place in grammatical analysis, with a focus on morphological and syntactic aspects. More recently, researchers have begun to address a specific issue that was often neglected before, namely the semantics of nominalizations and its implications for Natural Language Processing applications such as electronic ontologies or Information Retrieval. We focus on precisely this issue in the research project NOMAGE, funded by the French National Research Agency (ANR-07-JCJC-0085-01). In this paper, we present the Nomage corpus and the annotations we make on deverbal nouns (section 2). We then show how we build our lexicon from the semantically annotated corpus and illustrate the kinds of generalizations we can draw from such data (section 3).
This study proposes an analysis and a model of polysemy relations in the English electronic lexicon WordNet. To this end, it exploits the concept hierarchy (concepts are represented by synsets) and the definition associated with each concept. The result is a set of rules that allowed us to identify, in a largely automated way and with a precision of about 91%, more than 2,100 pairs of synsets linked by a regular polysemy relation. Our method also yields a partial lexical disambiguation of the words in the definitions associated with these synsets.
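A toy sketch of a hierarchy-based rule, assuming that each rule pairs two ancestor concepts (for instance the familiar animal/food pattern) and that two synsets qualify when they share a word form and their hypernym chains match the two sides of a rule. The synset identifiers, chains, and rules are invented for illustration, and the definition-based part of the method is not modeled.

```python
# Toy synsets: id -> (word forms, hypernym chain). Invented data, not WordNet.
SYNSETS = {
    "lamb#1": ({"lamb"}, ["young_mammal", "animal", "entity"]),
    "lamb#2": ({"lamb"}, ["meat", "food", "entity"]),
    "chicken#1": ({"chicken"}, ["meat", "food", "entity"]),
    "chicken#2": ({"chicken"}, ["bird", "animal", "entity"]),
    "bank#1": ({"bank"}, ["slope", "geological_formation", "entity"]),
    "bank#2": ({"bank"}, ["financial_institution", "institution", "entity"]),
}

# Hypothetical rules: each pairs two ancestors linked by regular polysemy.
RULES = [("animal", "food"), ("institution", "building")]


def regular_polysemy_pairs(synsets, rules):
    """Return (synset_a, synset_b, pattern) for every pair matching a rule."""
    pairs = []
    ids = sorted(synsets)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            forms_a, chain_a = synsets[a]
            forms_b, chain_b = synsets[b]
            if not forms_a & forms_b:
                continue  # polysemy requires a shared word form
            for x, y in rules:
                if (x in chain_a and y in chain_b) or (y in chain_a and x in chain_b):
                    pairs.append((a, b, f"{x}/{y}"))
    return pairs
```

On this toy data, both senses of *lamb* and of *chicken* are linked by the animal/food pattern, while the two senses of *bank* (homonymy rather than regular polysemy) match no rule.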
This article deals with the definitional metalanguage of the BDéf lexical database, and more specifically with the use of this metalanguage in modeling the polysemic structures of French. The BDéf encodes, as lexicographic definitions, the lexical senses of a representative subset of the French lexicon, including about 500 polysemous units belonging to the main parts of speech. The article has two sections. The first presents the BDéf metalanguage and situates it with respect to the different types of lexical definitions, whether formal or not, and whether or not they are designed for computational use. The second section presents an application of the BDéf that ultimately aims to account for regular polysemy in French. Using a specific case, it introduces the notion of polysemy pattern.
This article focuses on the formalized definitions of the BDéf database and shows how the formal structure of these definitions can offer an original representation of lexical polysemy.