Abstract
The sequence of documents produced by any given author varies in style and content, but some documents are more typical or representative of the source than others. We quantify the extent to which a given short text is characteristic of a specific person, using a dataset of tweets from fifteen celebrities. Such analysis is useful for generating excerpts of high-volume Twitter profiles, and understanding how representativeness relates to tweet popularity. We first consider the related task of binary author detection (is x the author of text T?), and report a test accuracy of 90.37% for the best of five approaches to this problem. We then use these models to compute characterization scores among all of an author’s texts. A user study shows human evaluators agree with our characterization model for all 15 celebrities in our dataset, each with p-value < 0.05. We use these classifiers to show surprisingly strong correlations between characterization scores and the popularity of the associated texts. Indeed, we demonstrate a statistically significant correlation between this score and tweet popularity (likes/replies/retweets) for 13 of the 15 celebrities in our study.- Anthology ID:
- D19-1175
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1653–1663
- Language:
- URL:
- https://aclanthology.org/D19-1175
- DOI:
- 10.18653/v1/D19-1175
- Cite (ACL):
- Charuta Pethe and Steve Skiena. 2019. The Trumpiest Trump? Identifying a Subject’s Most Characteristic Tweets. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1653–1663, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- The Trumpiest Trump? Identifying a Subject’s Most Characteristic Tweets (Pethe & Skiena, EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/D19-1175.pdf