IITRoorkee@SMM4H 2024 Cross-Platform Age Detection in Twitter and Reddit Using Transformer-Based Model
Thadavarthi Sankar, Dudekula Suraj, Mallamgari Reddy, Durga Toshniwal, Amit Agarwal
Abstract
This paper outlines the methodology for the automatic extraction of self-reported ages from social media posts as part of the Social Media Mining for Health (SMM4H) 2024 Workshop Shared Tasks. The focus was on Task 6: “Self-reported exact age classification with cross-platform evaluation in English.” The goal was to accurately identify age-related information from user-generated content, which is crucial for applications in public health monitoring, targeted advertising, and demographic research. A number of transformer-based models were employed, including RoBERTa-Base, BERT-Base, BiLSTM, and Flan T5 Base, leveraging their advanced capabilities in natural language understanding. The training strategies included fine-tuning foundational pre-trained language models and evaluating model performance using standard metrics: F1-score, Precision, and Recall. The experimental results demonstrated that the RoBERTa-Base model significantly outperformed the other models in this classification task. The best results achieved with the RoBERTa-Base model were an F1-score of 0.878, a Precision of 0.899, and a Recall of 0.858.- Anthology ID:
- 2024.smm4h-1.23
- Volume:
- Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Dongfang Xu, Graciela Gonzalez-Hernandez
- Venues:
- SMM4H | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 101–105
- Language:
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.smm4h-1.23/
- DOI:
- Cite (ACL):
- Thadavarthi Sankar, Dudekula Suraj, Mallamgari Reddy, Durga Toshniwal, and Amit Agarwal. 2024. IITRoorkee@SMM4H 2024 Cross-Platform Age Detection in Twitter and Reddit Using Transformer-Based Model. In Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, pages 101–105, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- IITRoorkee@SMM4H 2024 Cross-Platform Age Detection in Twitter and Reddit Using Transformer-Based Model (Sankar et al., SMM4H 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.smm4h-1.23.pdf