Writing Style Author Embedding Evaluation

Enzo Terreau, Antoine Gourru, Julien Velcin


Abstract
Learning representations of authors from their textual productions is now widely used to solve multiple downstream tasks, such as classification, link prediction or user recommendation. Author embedding methods are often built on top of either Doc2Vec (Mikolov et al. 2014) or the Transformer architecture (Devlin et al. 2019). Evaluating the quality of these embeddings and what they capture is a difficult task. Most articles use either classification accuracy or authorship attribution, neither of which clearly measures the quality of the representation space or whether it really captures what it was built for. In this paper, we propose a novel evaluation framework for author embedding methods based on writing style. It makes it possible to quantify whether the embedding space effectively captures a set of stylistic features, chosen to be the best proxy of an author's writing style. This approach gives less importance to the topics conveyed by the documents. It turns out that recent models are mostly driven by the inner semantics of authors’ productions. They are outperformed by simple baselines, based on state-of-the-art pretrained sentence embedding models, on several linguistic axes. These baselines can grasp complex linguistic phenomena and writing style more efficiently, paving the way for designing new style-driven author embedding models.
Anthology ID:
2021.eval4nlp-1.9
Volume:
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
EMNLP | Eval4NLP
Publisher:
Association for Computational Linguistics
Pages:
84–93
URL:
https://aclanthology.org/2021.eval4nlp-1.9
DOI:
10.18653/v1/2021.eval4nlp-1.9
Cite (ACL):
Enzo Terreau, Antoine Gourru, and Julien Velcin. 2021. Writing Style Author Embedding Evaluation. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 84–93, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Writing Style Author Embedding Evaluation (Terreau et al., Eval4NLP 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.eval4nlp-1.9.pdf
Software:
2021.eval4nlp-1.9.Software.zip
Code:
enzofleur/style_embedding_evaluation