Farig Sadeque


TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla
Nazia Tasnim | Md. Istiak Shihab | Asif Shahriyar Sushmit | Steven Bethard | Farig Sadeque
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Many areas, such as the biological and healthcare domain, artistic works, and organization names, have nested, overlapping, discontinuous entity mentions that may even be syntactically or semantically ambiguous in practice. Traditional sequence tagging algorithms are unable to recognize these complex mentions because they may violate the assumptions upon which sequence tagging schemes are founded. In this paper, we describe our contribution to SemEval 2022 Task 11 on identifying such complex Named Entities. We have leveraged the ensemble of multiple ELECTRA-based models that were exclusively pretrained on the Bangla language with the performance of ELECTRA-based models pretrained on English to achieve competitive performance on the Track-11. Besides providing a system description, we will also present the outcomes of our experiments on architectural decisions, dataset augmentations, and post-competition findings.


A BERT-based One-Pass Multi-Task Model for Clinical Temporal Relation Extraction
Chen Lin | Timothy Miller | Dmitriy Dligach | Farig Sadeque | Steven Bethard | Guergana Savova
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Recently BERT has achieved a state-of-the-art performance in temporal relation extraction from clinical Electronic Medical Records text. However, the current approach is inefficient as it requires multiple passes through each input sequence. We extend a recently-proposed one-pass model for relation classification to a one-pass model for relation extraction. We augment this framework by introducing global embeddings to help with long-distance relation inference, and by multi-task learning to increase model performance and generalizability. Our proposed model produces results on par with the state-of-the-art in temporal relation extraction on the THYME corpus and is much “greener” in computational cost.


Incivility Detection in Online Comments
Farig Sadeque | Stephen Rains | Yotam Shmargad | Kate Kenski | Kevin Coe | Steven Bethard
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Incivility in public discourse has been a major concern in recent times as it can affect the quality and tenacity of the discourse negatively. In this paper, we present neural models that can learn to detect name-calling and vulgarity from a newspaper comment section. We show that in contrast to prior work on detecting toxic language, fine-grained incivilities like namecalling cannot be accurately detected by simple models like logistic regression. We apply the models trained on the newspaper comments data to detect uncivil comments in a Russian troll dataset, and find that despite the change of domain, the model makes accurate predictions.


Semi-supervised CLPsych 2016 Shared Task System Submission
Nicolas Rey-Villamizar | Prasha Shrestha | Thamar Solorio | Farig Sadeque | Steven Bethard | Ted Pedersen
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

Analysis of Anxious Word Usage on Online Health Forums
Nicolas Rey-Villamizar | Prasha Shrestha | Farig Sadeque | Steven Bethard | Ted Pedersen | Arjun Mukherjee | Thamar Solorio
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis

Why Do They Leave: Modeling Participation in Online Depression Forums
Farig Sadeque | Ted Pedersen | Thamar Solorio | Prasha Shrestha | Nicolas Rey-Villamizar | Steven Bethard
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media

Age and Gender Prediction on Health Forum Data
Prasha Shrestha | Nicolas Rey-Villamizar | Farig Sadeque | Ted Pedersen | Steven Bethard | Thamar Solorio
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data. But users might not always disclose their age and gender. It is desirable then to be able to automatically extract this information from users’ content. However, to the best of our knowledge there is no such resource for author profiling of health forum data. Here we present a large corpus, with close to 85,000 users, for profiling and also outline our approach and benchmark results to automatically detect a user’s age and gender from their forum posts. We use a mix of features from a user’s text as well as forum specific features to obtain accuracy well above the baseline, thus showing that both our dataset and our method are useful and valid.


pdf bib
Predicting Continued Participation in Online Health Forums
Farig Sadeque | Thamar Solorio | Ted Pedersen | Prasha Shrestha | Steven Bethard
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis