Adrian Vergara Heidke
2025
CoWoYTP1Att: A Social Media Comment Dataset on Gender Discourse with Appraisal Theory Annotations
Valentina Tretti Beckles | Adrian Vergara Heidke | Natalia Molina-Valverde
Proceedings of the 5th Conference on Language, Data and Knowledge
This paper presents the Corpus on Women in YouTube on Performance with Attitude Annotations (CoWoYTP1Att), developed based on Appraisal Theory (Martin & White, 2005). Between September 2020 and May 2021, 14,883 comments were extracted from a YouTube video featuring a compilation of the performance “Un violador en tu camino” (A Rapist in Your Path) by the feminist collective LasTesis, published on the channel of the Costa Rican newspaper La Nación. The extracted comments were manually and automatically classified based on several criteria to determine their relevance to the video. As a result, 5,939 comments were identified as related to the video. These comments were annotated with the three attitude subdomains (affect, judgement, and appreciation) proposed in Appraisal Theory (Martin & White, 2005), as well as their polarity, target, fragment, and whether the attitude was implicit or explicit. The statistical analysis of the corpus highlights the predominantly negative evaluation of individuals in the comments on this social media platform.
Thematic Categorization on Pineapple Production in Costa Rica: An Exploratory Analysis through Topic Modeling
Valentina Tretti Beckles | Adrian Vergara Heidke
Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025)
Costa Rica is one of the largest producers and exporters of pineapple in the world. This status has encouraged multinational companies to use plantations in this Central American country for experimentation and the cultivation of new varieties, such as the Pinkglow pineapple. However, pineapple monoculture has significant socio-environmental impacts on the regions where it is cultivated. In this exploratory study, we aimed to analyze how pineapple production is portrayed on the Internet. To achieve this, we collected a corpus of texts in Spanish and English from online sources in two phases: using the BootCat tool and manual search on newspaper websites. The Hierarchical Dirichlet Process (HDP) topic model was then applied to identify dominant topics within the corpus. These topics were subsequently classified into thematic categories, and the texts were categorized accordingly. The findings indicate that environmental issues related to pineapple cultivation are underrepresented on the Internet, particularly in comparison to the extensive focus on topics related to pineapple production and marketing.