Discourse Mode Categorization of Bengali Social Media Health Text

Salim Sazzed


Abstract
The scarcity of annotated data is a major impediment to natural language processing (NLP) research in Bengali, a language that is considered low-resource. In particular, the health and medical domains suffer from a severe paucity of annotated data. Thus, this study aims to introduce BanglaSocialHealth, an annotated social media health corpus that provides sentence-level annotations of four distinct types of expression modes, namely narrative (NAR), informative (INF), suggestive (SUG), and inquiring (INQ) modes in Bengali. We provide details regarding the annotation procedures and report various statistics, such as the median and mean length of words in different sentence modes. Additionally, we apply classical machine learning (CML) classifiers and transformer-based language models to classify sentence modes. We find that most of the statistical properties are similar in different types of sentence modes. To determine the sentence mode, the transformer-based M-BERT model provides slightly better efficacy than the CML classifiers. Our developed corpus and analysis represent a much-needed contribution to Bengali NLP research in medical and health domains and have the potential to facilitate a range of downstream tasks, including question-answering, misinformation detection, and information retrieval.
Anthology ID:
2023.wassa-1.6
Volume:
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Jeremy Barnes, Orphée De Clercq, Roman Klinger
Venue:
WASSA
Publisher:
Association for Computational Linguistics
Pages:
52–57
URL:
https://aclanthology.org/2023.wassa-1.6
DOI:
10.18653/v1/2023.wassa-1.6
Cite (ACL):
Salim Sazzed. 2023. Discourse Mode Categorization of Bengali Social Media Health Text. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, pages 52–57, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Discourse Mode Categorization of Bengali Social Media Health Text (Sazzed, WASSA 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2023.wassa-1.6.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-2/2023.wassa-1.6.mp4