Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

Jian Zhu, David Jurgens


Abstract
An individual’s variation in writing style is often a function of both social and personal attributes. While structured social variation, such as gender-based variation, has been extensively studied, far less is known about how to characterize individual styles due to their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts, and an analogy-based probing task shows that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we provide a description of idiolects through measuring inter- and intra-author variation, showing that variation in idiolects is often distinctive yet consistent.
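As a rough illustration of the inter- versus intra-author comparison mentioned in the abstract, the sketch below contrasts the mean cosine similarity of texts by the same author with that of texts by different authors. It is a minimal sketch under stated assumptions, not the paper's actual metric or code: the function names and the two-dimensional toy vectors are hypothetical, and real embeddings would come from a trained authorship model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def intra_author_similarity(embeddings_by_author):
    """Mean similarity over pairs of texts written by the same author."""
    return mean(
        cosine(u, v)
        for vecs in embeddings_by_author.values()
        for u, v in combinations(vecs, 2)
    )


def inter_author_similarity(embeddings_by_author):
    """Mean similarity over pairs of texts written by different authors."""
    return mean(
        cosine(u, v)
        for (_, vecs_a), (_, vecs_b) in combinations(embeddings_by_author.items(), 2)
        for u in vecs_a
        for v in vecs_b
    )


# Hypothetical toy embeddings (assumption): two texts per author.
toy = {
    "author_a": [[1.0, 0.1], [0.9, 0.2]],
    "author_b": [[0.1, 1.0], [0.2, 0.9]],
}

intra = intra_author_similarity(toy)
inter = inter_author_similarity(toy)
print(f"intra-author: {intra:.3f}, inter-author: {inter:.3f}")
```

In this toy setting, a high intra-author score corresponds to a consistent style and a low inter-author score to a distinctive one.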
Anthology ID:
2021.emnlp-main.25
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
279–297
URL:
https://aclanthology.org/2021.emnlp-main.25
DOI:
10.18653/v1/2021.emnlp-main.25
Cite (ACL):
Jian Zhu and David Jurgens. 2021. Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 279–297, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles (Zhu & Jurgens, EMNLP 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.emnlp-main.25.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2021.emnlp-main.25.mp4
Code:
lingjzhu/idiolect