Abstract
Contemporary tobacco-related studies mostly focus on a single social media platform and therefore miss a broader audience. Moreover, they rely heavily on labeled datasets, which are expensive to create. In this work, we explore sentiment and product identification on tobacco-related text from two social media platforms. We release the SentiSmoke-Twitter and SentiSmoke-Reddit datasets, along with a comprehensive annotation schema for identifying tobacco products’ sentiment. We then perform benchmark text classification experiments using state-of-the-art models, including BERT, RoBERTa, and DistilBERT. Our experiments show F1 scores as high as 0.72 for sentiment identification on the Twitter dataset and, using semi-supervised learning on the Reddit dataset, 0.46 for sentiment identification and 0.57 for product identification.
- Anthology ID:
- 2021.ranlp-1.173
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Held Online
- Venue:
- RANLP
- SIG:
- Publisher:
- INCOMA Ltd.
- Note:
- Pages:
- 1545–1552
- Language:
- URL:
- https://aclanthology.org/2021.ranlp-1.173
- DOI:
- Cite (ACL):
- Venkata Himakar Yanamandra, Kartikey Pant, and Radhika Mamidi. 2021. Towards Sentiment Analysis of Tobacco Products’ Usage in Social Media. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1545–1552, Held Online. INCOMA Ltd.
- Cite (Informal):
- Towards Sentiment Analysis of Tobacco Products’ Usage in Social Media (Yanamandra et al., RANLP 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.ranlp-1.173.pdf