Abstract
Age-related stereotypes are pervasive in our society, and yet have been under-studied in the NLP community. Here, we present a method for extracting age-related stereotypes from Twitter data, generating a corpus of 300,000 over-generalizations about four contemporary generations (baby boomers, generation X, millennials, and generation Z), as well as “old” and “young” people more generally. By employing word-association metrics, semi-supervised topic modelling, and density-based clustering, we uncover many common stereotypes as reported in the media and in the psychological literature, as well as some more novel findings. We also observe trends consistent with the existing literature, namely that definitions of “young” and “old” age appear to be context-dependent, stereotypes for different generations vary across different topics (e.g., work versus family life), and some age-based stereotypes are distinct from generational stereotypes. The method easily extends to other social group labels, and therefore can be used in future work to study stereotypes of different social categories. By better understanding how stereotypes are formed and spread, and by tracking emerging stereotypes, we hope to eventually develop mitigating measures against such biased statements.