How Many Users Are Enough? Exploring Semi-Supervision and Stylometric Features to Uncover a Russian Troll Farm

Nayeema Nasrin, Kim-Kwang Raymond Choo, Myung Ko, Anthony Rios


Abstract
Social media has reportedly been (ab)used by Russian troll farms to promote political agendas. Specifically, state-affiliated actors disguise themselves as native citizens of the United States to sow discord and advance their political motives. Therefore, developing methods to automatically detect Russian trolls can help ensure fair elections and possibly reduce political extremism by stopping trolls that produce discord. While data exists for some troll organizations (e.g., the Internet Research Agency), it is challenging to collect ground-truth accounts for new troll farms in a timely fashion. In this paper, we study the impact that the number of labeled troll accounts has on detection performance. We analyze the use of self-supervision with fewer than 100 troll accounts as training data, which improves classification performance by nearly 4% F1. Furthermore, in combination with self-supervision, we explore novel features for troll detection grounded in stylometry. Intuitively, we assume that writing style is consistent across troll accounts because a single troll organization employee may control multiple user accounts. Overall, we improve on models based on word features by ~9% F1.
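To make the idea of stylometric features concrete, the following is a minimal Python sketch of how per-account writing-style features might be computed. The abstract does not list the features used in the paper, so everything here is an illustrative assumption: the function name `stylometric_features`, the four ratio features, and the short `FUNCTION_WORDS` list are hypothetical and not taken from the authors' method.

```python
import string
from collections import Counter

# Hypothetical function-word list for illustration; not from the paper.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is")


def stylometric_features(posts):
    """Return simple writing-style features for one account's posts.

    `posts` is a list of strings, one per post. All features are
    illustrative assumptions rather than the paper's feature set.
    """
    text = " ".join(posts)
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = [t.strip(string.punctuation) for t in text.lower().split()]
    words = [w for w in words if w]
    # Guard against division by zero for empty accounts.
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    counts = Counter(words)
    features = {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(counts) / n_words,
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
        "upper_ratio": sum(c.isupper() for c in text) / n_chars,
    }
    # Relative frequency of each function word, a common stylometric signal.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / n_words
    return features
```

Feature dictionaries like these could be vectorized and passed to any classifier. Because they describe how an account writes rather than what it writes about, they fit the abstract's intuition that one employee controlling several accounts leaves a consistent style across them.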
Anthology ID: D19-5003
Volume: Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Anna Feldman, Giovanni Da San Martino, Alberto Barrón-Cedeño, Chris Brew, Chris Leberknight, Preslav Nakov
Venue: NLP4IF
Publisher: Association for Computational Linguistics
Pages: 20–30
URL: https://aclanthology.org/D19-5003
DOI: 10.18653/v1/D19-5003
Cite (ACL): Nayeema Nasrin, Kim-Kwang Raymond Choo, Myung Ko, and Anthony Rios. 2019. How Many Users Are Enough? Exploring Semi-Supervision and Stylometric Features to Uncover a Russian Troll Farm. In Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda, pages 20–30, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): How Many Users Are Enough? Exploring Semi-Supervision and Stylometric Features to Uncover a Russian Troll Farm (Nasrin et al., NLP4IF 2019)
PDF: https://preview.aclanthology.org/nschneid-patch-4/D19-5003.pdf