Franck Sajous


2020

ENGLAWI: From Human- to Machine-Readable Wiktionary
Franck Sajous | Basilio Calderone | Nabil Hathout
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces ENGLAWI, a large, versatile, XML-encoded machine-readable dictionary extracted from Wiktionary. ENGLAWI contains 752,769 articles encoding the full body of information included in Wiktionary: simple words, compounds and multiword expressions, lemmas and inflectional paradigms, etymologies, phonemic transcriptions in IPA, definition glosses and usage examples, translations, semantic and morphological relations, spelling variants, etc. It is fully documented, released under a free license and supplied with G-PeTo, a series of scripts allowing easy information extraction from ENGLAWI. Additional resources extracted from ENGLAWI, such as an inflectional lexicon, a lexicon of diatopic variants and the inclusion dates of headwords in Wiktionary’s nomenclature, are also provided. The paper describes the content of the resource and illustrates how it can be used, and how it has been used in previous studies. Finally, we introduce ongoing work that computes lexicographic word embeddings from ENGLAWI’s definitions.

Glawinette: a Linguistically Motivated Derivational Description of French Acquired from GLAWI
Nabil Hathout | Franck Sajous | Basilio Calderone | Fiammetta Namer
Proceedings of the Twelfth Language Resources and Evaluation Conference

Glawinette is a derivational lexicon of French that will be used to feed the Démonette database. It has been created from the GLAWI machine-readable dictionary. We collected pairs of words from the definitions and the morphological sections of the dictionary and then selected the ones that form regular formal analogies and that instantiate sufficiently frequent formal patterns. The graph structure of the morphological families has then been used to identify, for each pair of lexemes, derivational patterns that are close to the intuitions of morphologists.

2016

Wiktionnaire’s Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary
Nabil Hathout | Franck Sajous
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

GLAWI is a free, large-scale and versatile Machine-Readable Dictionary (MRD) that has been extracted from the French language edition of Wiktionary, called Wiktionnaire. In (Sajous and Hathout, 2015), we introduced GLAWI, gave the rationale behind the creation of this lexicographic resource and described the extraction process, focusing on the conversion and standardization of the heterogeneous data provided by this collaborative dictionary. In the current article, we describe the content of GLAWI and illustrate how it is structured. We also suggest various applications, ranging from linguistic studies and NLP applications to psycholinguistic experimentation. All of them can take advantage of the diversity of the lexical knowledge available in GLAWI. Besides this diversity and extensive lexical coverage, GLAWI is also remarkable because it is the only free lexical resource of contemporary French that contains definitions. This unique material opens the way to the renewal of MRD-based methods, notably the automated extraction and acquisition of semantic relations.

2015

Évaluation sur mesure de modèles distributionnels sur un corpus spécialisé : comparaison des approches par contextes syntaxiques et par fenêtres graphiques [Tailor-made evaluation of distributional models on a specialized corpus: comparison of syntactic context and graphical window approaches]
Ludovic Tanguy | Franck Sajous | Nabil Hathout
Traitement Automatique des Langues, Volume 56, Numéro 2 : Sémantique distributionnelle [Distributional semantics]

2014

Acquisition and enrichment of morphological and morphosemantic knowledge from the French Wiktionary
Nabil Hathout | Franck Sajous | Basilio Calderone
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)
Cécile Fabre | Nabil Hathout | Lydia-Mai Ho-Dac | François Morlane-Hondère | Philippe Muller | Franck Sajous | Ludovic Tanguy | Tim Van de Cruys
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

Presentation of the SemDis 2014 workshop: distributional semantics for two tasks - lexical substitution and exploration of specialized corpora (Présentation de l’atelier SemDis 2014 : sémantique distributionnelle pour la substitution lexicale et l’exploration de corpus spécialisés) [in French]
Cécile Fabre | Nabil Hathout | Lydia-Mai Ho-Dac | François Morlane-Hondère | Philippe Muller | Franck Sajous | Ludovic Tanguy | Tim Van de Cruys
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

Tuning distributional analysis for a small specialized corpus (Ajuster l’analyse distributionnelle à un corpus spécialisé de petite taille) [in French]
Cécile Fabre | Nabil Hathout | Franck Sajous | Ludovic Tanguy
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

GLÀFF, a Large Versatile French Lexicon
Nabil Hathout | Franck Sajous | Basilio Calderone
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces GLAFF, a large-scale versatile French lexicon extracted from Wiktionary, the collaborative online dictionary. GLAFF contains, for each entry, inflectional features and phonemic transcriptions. It distinguishes itself from the other available French lexicons by its size, its potential for constant updating and its copyleft license. We explain how we have built GLAFF and compare it to other known resources in terms of coverage and quality of the phonemic transcriptions. We show that its size and quality are strong assets that could allow GLAFF to become a reference lexicon for French NLP and linguistics. Moreover, other derived lexicons can easily be based on GLAFF to satisfy the specific needs of various fields such as psycholinguistics.

2013

GLÀFF, a Large Versatile French Lexicon (GLÀFF, un Gros Lexique À tout Faire du Français) [in French]
Franck Sajous | Nabil Hathout | Basilio Calderone
Proceedings of TALN 2013 (Volume 1: Long Papers)

2011

Enrichissement de lexiques sémantiques approvisionnés par les foules : le système WISIGOTH appliqué à Wiktionary [Enrichment of crowdsourced semantic networks: The WISIGOTH system applied to Wiktionary]
Franck Sajous | Emmanuel Navarro | Bruno Gaume
Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]

2009

Wiktionary for Natural Language Processing: Methodology and Limitations
Emmanuel Navarro | Franck Sajous | Bruno Gaume | Laurent Prévot | ShuKai Hsieh | Ivy Kuo | Pierre Magistry | Chu-Ren Huang
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)