Muskan Garg


CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts
Muskan Garg | Chandni Saxena | Sriparna Saha | Veena Krishnan | Ruchi Joshi | Vijay Mago
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Social NLP researchers and mental health practitioners have witnessed exponential growth in the field of mental health detection and analysis on social media. It has become important to identify the reason behind mental illness. In this context, we introduce a new dataset for Causal Analysis of Mental health in Social media posts (CAMS). We first introduce the annotation schema for this task of causal analysis. The causal analysis comprises two types of annotations, viz., causal interpretation and causal categorization. We show the efficacy of our scheme in two ways: (i) crawling and annotating 3155 Reddit posts and (ii) re-annotating the publicly available SDCNL dataset of 1896 instances for interpretable causal analysis. We further combine them as the CAMS dataset and make it available along with the associated source code. Our experimental results show that the hybrid CNN-LSTM model gives the best performance on the CAMS dataset.

Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers
Muskan Garg | Seema Wazarkar | Muskaan Singh | Ondřej Bojar
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With the development of multimodal systems and natural language generation techniques, the resurgence of multimodal datasets has attracted significant research interest, with the aim of providing new information to enrich the representation of textual data. However, a comprehensive survey for this task is still lacking. To this end, we take the first step and present a thorough review of this research field. This paper provides an overview of publicly available datasets with different modalities, organized by application. Furthermore, we discuss the new frontier and give our thoughts. We hope this survey of multimodal datasets can provide the community with quick access to, and a general picture of, the multimodal datasets for specific Natural Language Processing (NLP) applications, and that it motivates future research. In this context, we release the collection of all multimodal datasets easily accessible here:

EdgeGraph: Revisiting Statistical Measures for Language Independent Keyphrase Extraction Leveraging on Bi-grams
Muskan Garg | Amit Gupta
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

The NLP research community resorts to the conventional Word Co-occurrence Network (WCN) for keyphrase extraction, using a random walk sampling mechanism such as the PageRank algorithm to identify candidate words/phrases. We argue that the WCN is a path-based network by nature and does not follow the core-periphery structure observed in web-page linking networks. Thus, language networks leveraging bi-grams may represent better semantics for keyphrase extraction using random walk. In this work, we use each bi-gram as a node and link adjacent bi-grams together to generate an EdgeGraph. We validate our method on four publicly available datasets to demonstrate the effectiveness of our simple language network, and our extensive experiments show that random walk over the EdgeGraph representation performs better than the conventional WCN. We make our code and supplementary materials available on GitHub.
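
The construction described in this abstract (bi-grams as nodes, adjacent bi-grams linked, then a random walk) can be illustrated with a minimal sketch. This is not the authors' released code: the function names `build_edgegraph` and `pagerank`, the undirected and unweighted edges, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def build_edgegraph(tokens):
    """Build a graph whose nodes are bi-grams and whose edges link
    bi-grams that are adjacent in the token sequence.

    Assumption: edges are undirected and unweighted.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    graph = defaultdict(set)
    for left, right in zip(bigrams, bigrams[1:]):
        if left != right:  # skip self-loops from repeated tokens
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected adjacency dict."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


tokens = "graph based keyphrase extraction uses graph based ranking".split()
scores = pagerank(build_edgegraph(tokens))
top_bigram = max(scores, key=scores.get)
```

In this toy input the bi-gram that occurs twice gains the most neighbours, so it is the one the random walk favours; a real pipeline would also need tokenization, stop-word handling, and candidate filtering, which the abstract does not specify.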


Data Augmentation for Mental Health Classification on Social Media
Gunjan Ansari | Muskan Garg | Chandni Saxena
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

The mental disorders of online users are determined using social media posts. The major challenge in this domain is obtaining ethical clearance for using user-generated text from social media platforms. Academic researchers have identified the problem of insufficient and unlabeled data for mental health classification. To handle this issue, we study the effect of data augmentation techniques on domain-specific user-generated text for mental health classification. Among the existing well-established data augmentation techniques, we identify Easy Data Augmentation (EDA), conditional BERT, and Back-Translation (BT) as potential techniques for generating additional text to improve the performance of classifiers. Further, three different classifiers, Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR), are employed to analyze the impact of data augmentation on two publicly available social media datasets. The experimental results show significant improvements in classifiers’ performance when trained on the augmented data.
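
As a rough illustration of the kind of token-level augmentation that EDA performs, the sketch below implements two of its four operations, random swap and random deletion (synonym replacement and random insertion are omitted because they need a thesaurus). It is not the paper's pipeline: the function names, the seeded random generator, and the parameter values are assumptions chosen for the example.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with `n_swaps` random pairs of positions swapped."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word independently with probability `p`, keeping at least one."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
post = "i have not been sleeping well for weeks".split()
augmented = [
    " ".join(random_swap(post, n_swaps=1, rng=rng)),
    " ".join(random_deletion(post, p=0.1, rng=rng)),
]
```

Each augmented string would be added to the training set with the label of the original post; whether such edits preserve the label for mental health text is exactly the kind of question the study above evaluates empirically.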