RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits
C. Anton Rytting, Valerie Novak, James R. Hull, Victor M. Frank, Paul Rodrigues, Jarrett G. W. Lee, Laurel Miller-Sims
Abstract
Social media has provided a platform for many individuals to easily express themselves naturally and publicly, and researchers have had the opportunity to utilize large quantities of this data to improve author trait analysis techniques and to improve author trait profiling systems. The majority of the work in this area, however, has been narrowly spent on English and other Western European languages, and generally focuses on a single social network at a time, despite the large quantity of data now available across languages and differences that have been found across platforms. This paper introduces RU-ADEPT, a dataset of Russian authors’ personality trait scores–Big Five and Dark Triad, demographic information (e.g. age, gender), with associated corpus of the authors’ cross-contributions to (up to) four different social media platforms–VKontakte (VK), LiveJournal, Blogger, and Moi Mir. We believe this to be the first publicly-available dataset associating demographic and personality trait data with Russian-language social media content, the first paper to describe the collection of Dark Triad scores with texts across multiple Russian-language social media platforms, and to a limited extent, the first publicly-available dataset of personality traits to author content across several different social media sites.
- Anthology ID:
- 2022.lrec-1.12
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 109–118
- URL:
- https://aclanthology.org/2022.lrec-1.12
- Cite (ACL):
- C. Anton Rytting, Valerie Novak, James R. Hull, Victor M. Frank, Paul Rodrigues, Jarrett G. W. Lee, and Laurel Miller-Sims. 2022. RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 109–118, Marseille, France. European Language Resources Association.
- Cite (Informal):
- RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits (Rytting et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2022.lrec-1.12.pdf