This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
ValerieNovak
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
Social media has provided a platform for many individuals to easily express themselves naturally and publicly, and researchers have had the opportunity to utilize large quantities of this data to improve author trait analysis techniques and to improve author trait profiling systems. The majority of the work in this area, however, has been narrowly spent on English and other Western European languages, and generally focuses on a single social network at a time, despite the large quantity of data now available across languages and differences that have been found across platforms. This paper introduces RU-ADEPT, a dataset of Russian authors’ personality trait scores–Big Five and Dark Triad, demographic information (e.g. age, gender), with associated corpus of the authors’ cross-contributions to (up to) four different social media platforms–VKontakte (VK), LiveJournal, Blogger, and Moi Mir. We believe this to be the first publicly-available dataset associating demographic and personality trait data with Russian-language social media content, the first paper to describe the collection of Dark Triad scores with texts across multiple Russian-language social media platforms, and to a limited extent, the first publicly-available dataset of personality traits to author content across several different social media sites.
Feature engineering is an important step in classical NLP pipelines, but machine learning engineers may not be aware of the signals to look for when processing foreign language text. The Russian Feature Extraction Toolkit (RFET) is a collection of feature extraction libraries bundled for ease of use by engineers who do not speak Russian. RFET’s current feature set includes features applicable to social media genres of text and to computational social science tasks. We demonstrate the effectiveness of the tool by using it in a personality trait identification task. We compare the performance of Support Vector Machines (SVMs) trained with and without the features provided by RFET; we also compare it to a SVM with neural embedding features generated by Sentence-BERT.