2020
FrSemCor: Annotating a French Corpus with Supersenses
Lucie Barque, Pauline Haas, Richard Huyghe, Delphine Tribout, Marie Candito, Benoit Crabbé, Vincent Segonne
Proceedings of the Twelfth Language Resources and Evaluation Conference
Like many languages, French lacks semantically annotated corpus data. Our aim is to provide the linguistic and NLP research communities with a gold-standard sense-annotated corpus of French, using WordNet Unique Beginners as semantic tags, thus allowing for interoperability. In this paper, we report on the first phase of the project, which focused on the annotation of common nouns. The resulting dataset consists of more than 12,000 French noun occurrences, each annotated double-blind and then adjudicated according to a carefully redefined set of supersenses. The resource is released online under a Creative Commons Licence.
SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings
Cindy Aloui, Carlos Ramisch, Alexis Nasr, Lucie Barque
Proceedings of the 28th International Conference on Computational Linguistics
Contextualised embeddings such as BERT have become de facto state-of-the-art references in many NLP applications, thanks to their impressive performance. However, their opacity makes it hard to interpret their behaviour. SLICE is a hybrid model that combines supersense labels with contextual embeddings. We introduce a weakly supervised method to learn interpretable embeddings from raw corpora and small lists of seed words. Our model represents both a word and its context as embeddings in the same compact space, whose dimensions correspond to interpretable supersenses. We assess the model on a supersense tagging task for French nouns. The small amount of supervision required makes it particularly well suited to low-resource scenarios. Thanks to its interpretability, we perform linguistic analyses of the predicted supersenses in terms of input word and context representations.
2019
Demonette2 - Une base de données dérivationnelle du français à grande échelle : premiers résultats (Demonette2 – A large scale derivational database for French: first results)
Fiammetta Namer, Lucie Barque, Olivier Bonami, Pauline Haas, Nabil Hathout, Delphine Tribout
Proceedings of the Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II: Short Papers
This article presents the design and development of Demonette2, a large-scale derivational database of French, developed within the ANR Démonext project (ANR-17-CE23-0005). The article describes the objectives of the project and the structure of the database, and presents the project's first results, focusing on a crucial issue: the semantic coding of entries and relations.
2016
Improvement of VerbNet-like resources by frame typing
Laurence Danlos, Matthieu Constant, Lucie Barque
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)
VerbeNet is a French lexicon developed by “translating” its English counterpart, VerbNet (Kipper-Schuler, 2005), and by handling the specificities of French syntax (Pradet et al., 2014; Danlos et al., 2016). One difficulty encountered in its development stems from the fact that the list of (potentially numerous) frames has no internal organization. This paper proposes a type system for frames that shows whether two frames are variants of a given alternation. Frame typing facilitates coherence checking of the resource in a “virtuous circle”. We present the principles underlying a program we developed and used to automatically type frames in VerbeNet. We also show that our system is portable to other languages.
2014
Developing a French FrameNet: Methodology and First results
Marie Candito, Pascal Amsili, Lucie Barque, Farah Benamara, Gaël de Chalendar, Marianne Djemaa, Pauline Haas, Richard Huyghe, Yvette Yannick Mathieu, Philippe Muller, Benoît Sagot, Laure Vieu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us enforce the coherence of the resulting resource. It also has the advantage that, although the number of frames is limited (around a hundred), we obtain full coverage within a given domain.
2012
Dictionary-ontology cross-enrichment
Emmanuel Eckard, Lucie Barque, Alexis Nasr, Benoît Sagot
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon
Extracting a Semantic Lexicon of French Adjectives from a Large Lexicographic Dictionary
Selja Seppälä, Lucie Barque, Alexis Nasr
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)
2010
Building a Lexicon of French Deverbal Nouns from a Semantically Annotated Corpus
Antonio Balvet, Lucie Barque, Rafael Marín
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper presents the Nomage project, which aims to describe the aspectual properties of deverbal nouns empirically. It is centered on the development of two resources: a semantically annotated corpus of deverbal nouns and an electronic lexicon. Both are presented in this paper, with an emphasis on how the semantic annotations of the corpus make it possible to validate the lexicographic description of deverbal nouns, in particular their polysemy. Nominalizations have occupied a central place in grammatical analysis, with a focus on their morphological and syntactic aspects. More recently, researchers have begun to address an issue that was often neglected before, namely the semantics of nominalizations and its implications for Natural Language Processing applications such as electronic ontologies or Information Retrieval. We focus on precisely this issue in the research project NOMAGE, funded by the French National Research Agency (ANR-07-JCJC-0085-01). In this paper, we present the Nomage corpus and the annotations we apply to deverbal nouns (section 2). We then show how we build our lexicon from the semantically annotated corpus and illustrate the kinds of generalizations we can draw from such data (section 3).
2008
La polysémie régulière dans WordNet (Regular polysemy in WordNet)
Lucie Barque, François-Régis Chaumartin
Proceedings of the 15th Conférence sur le Traitement Automatique des Langues Naturelles. Long Papers
This study proposes an analysis and a model of polysemy relations in the English electronic lexicon WordNet. To do so, it exploits the hierarchy of concepts (represented by synsets) and the definition associated with each of these concepts. The result is a set of rules that allowed us to identify, in a largely automated way and with a precision of about 91%, more than 2,100 pairs of synsets linked by a regular polysemy relation. Our method also performs partial lexical disambiguation of the words in the definitions associated with these synsets.
2005
Application du métalangage de la BDéf au traitement formel de la polysémie (Applying the BDéf metalanguage to the formal treatment of polysemy)
Lucie Barque, Alain Polguère
Proceedings of the 12th Conférence sur le Traitement Automatique des Langues Naturelles. Short Papers
This article deals with the definitional metalanguage of the lexical database BDéf, and more specifically with the use of this metalanguage in modelling the polysemic structures of French. BDéf encodes, in the form of lexicographic definitions, the lexical senses of a representative subset of the French lexicon, including about 500 polysemous units belonging to the main parts of speech. The article comprises two sections. The first presents the BDéf metalanguage and situates it with respect to the different types of lexical definitions, whether or not they are formal and whether or not they are intended for computational use. The second section presents an application of BDéf that ultimately aims to account for regular polysemy in French. Starting from a specific case, it introduces the notion of a polysemy pattern.
2004
De la lexie au vocable : la représentation formelle des liens de polysémie (From lexical unit to vocable: the formal representation of polysemy links)
Lucie Barque
Proceedings of the 11th Conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (Posters)
This article focuses on the formalized definitions of the BDéf database and shows how the formal structure of these definitions can offer an original representation of lexical polysemy.