Age and Gender Prediction on Health Forum Data

Prasha Shrestha, Nicolas Rey-Villamizar, Farig Sadeque, Ted Pedersen, Steven Bethard, Thamar Solorio


Abstract
Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data. However, users do not always disclose their age and gender, so it is desirable to extract this information automatically from the content they post. To the best of our knowledge, no resource exists for author profiling of health forum data. Here we present a large corpus of close to 85,000 users for author profiling, and we outline our approach and benchmark results for automatically detecting a user’s age and gender from their forum posts. We use a mix of features from a user’s text and forum-specific features to obtain accuracy well above the baseline, showing that both our dataset and our method are useful and valid.
Anthology ID:
L16-1541
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3394–3401
URL:
https://aclanthology.org/L16-1541
Cite (ACL):
Prasha Shrestha, Nicolas Rey-Villamizar, Farig Sadeque, Ted Pedersen, Steven Bethard, and Thamar Solorio. 2016. Age and Gender Prediction on Health Forum Data. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3394–3401, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Age and Gender Prediction on Health Forum Data (Shrestha et al., LREC 2016)
PDF:
https://preview.aclanthology.org/improve-issue-templates/L16-1541.pdf