Julien Velcin

2021

pdf abs
Writing Style Author Embedding Evaluation
Enzo Terreau | Antoine Gourru | Julien Velcin
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

Learning authors representations from their textual productions is now widely used to solve multiple downstream tasks, such as classification, link prediction or user recommendation. Author embedding methods are often built on top of either Doc2Vec (Mikolov et al. 2014) or the Transformer architecture (Devlin et al. 2019). Evaluating the quality of these embeddings and what they capture is a difficult task. Most articles use either classification accuracy or authorship attribution, which does not clearly measure the quality of the representation space, if it really captures what it has been built for. In this paper, we propose a novel evaluation framework of author embedding methods based on the writing style. It allows to quantify if the embedding space effectively captures a set of stylistic features, chosen to be the best proxy of an author writing style. This approach gives less importance to the topics conveyed by the documents. It turns out that recent models are mostly driven by the inner semantic of authors’ production. They are outperformed by simple baselines, based on state-of-the-art pretrained sentence embedding models, on several linguistic axes. These baselines can grasp complex linguistic phenomena and writing style more efficiently, paving the way for designing new style-driven author embedding models.

pdf abs
Monitoring geometrical properties of word embeddings for detecting the emergence of new topics.
Clément Christophe | Julien Velcin | Jairo Cugliari | Manel Boumghar | Philippe Suignard
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Slow emerging topic detection is a task between event detection, where we aggregate behaviors of different words on short period of time, and language evolution, where we monitor their long term evolution. In this work, we tackle the problem of early detection of slowly emerging new topics. To this end, we gather evidence of weak signals at the word level. We propose to monitor the behavior of words representation in an embedding space and use one of its geometrical properties to characterize the emergence of topics. As evaluation is typically hard for this kind of task, we present a framework for quantitative evaluation and show positive results that outperform state-of-the-art methods. Our method is evaluated on two public datasets of press and scientific articles.

2017

pdf abs
Détection automatique de métaphores dans des textes de Géographie : une étude prospective (Automatic detection of metaphors in Geographical research papers : a prospective study)
Max Beligné | Aleksandra Campar | Jean-Hugues Chauchat | Melanie Lefeuvre | Isabelle Lefort | Sabine Loudcher | Julien Velcin
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

Cet article s’intègre dans un projet collaboratif qui vise à réaliser une analyse longitudinale de la production universitaire en Géographie. En particulier, nous présentons les premiers résultats de l’application d’une méthode de détection automatique de métaphores basée sur les modèles de thématiques latentes. Une analyse détaillée permet de mieux comprendre l’impact de certains choix et de réfléchir aux pistes de recherche que nous serons amenés à explorer pour améliorer ces résultats.

2015

pdf abs
Etude de l’image de marque d’entités dans le cadre d’une plateforme de veille sur le Web social
Leila Khouas | Caroline Brun | Anne Peradotto | Jean-Valère Cossu | Julien Boyadjian | Julien Velcin
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

Ce travail concerne l’intégration à une plateforme de veille sur internet d’outils permettant l’analyse des opinions émises par les internautes à propos d’une entité, ainsi que la manière dont elles évoluent dans le temps. Les entités considérées peuvent être des personnes, des entreprises, des marques, etc. Les outils implémentés sont le produit d’une collaboration impliquant plusieurs partenaires industriels et académiques dans le cadre du projet ANR ImagiWeb.

2014

The objective of this paper is to describe the design of a dataset that deals with the image (i.e., representation, web reputation) of various entities populating the Internet: politicians, celebrities, companies, brands etc. Our main contribution is to build and provide an original annotated French dataset. This dataset consists of 11527 manually annotated tweets expressing the opinion on specific facets (e.g., ethic, communication, economic project) describing two French policitians over time. We believe that other researchers might benefit from this experience, since designing and implementing such a dataset has proven quite an interesting challenge. This design comprises different processes such as data selection, formal definition and instantiation of an image. We have set up a full open-source annotation platform. In addition to the dataset design, we present the first results that we obtained by applying clustering methods to the annotated dataset in order to extract the entity images.

2013

pdf
AMI&ERIC: How to Learn with Naive Bayes and Prior Knowledge: an Application to Sentiment Analysis
Mohamed Dermouche | Leila Khouas | Julien Velcin | Sabine Loudcher
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)