Manish Singh
2026
IITPatna_ADE at #SMM4H-HeaRD 2026: Multilingual Adverse Drug Event Detection with LoRA-XLM-RoBERTa, Cross-Fold Ensembles, and Post-hoc Calibration
Sofia Jamil | Manish Singh | Harshal Dharpure | Sriparna Saha | Rajiv Misra
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
We describe our submission to Task 1 of #SMM4H-HeaRD 2026: multilingual binary classification of adverse drug event (ADE) mentions in social media. Our system fine-tunes xlm-roberta-large with LoRA adapters and learned language embeddings, using two-stage training: domain adaptation on translated CADEC data, followed by five-fold cross-validation on the official training set. We ensemble the five fold checkpoints by averaging their logits, apply temperature scaling on the development set, and tune decision thresholds to maximize the official metric. On the development set, the final ensemble reaches macro-F1 0.788 with a global threshold and 0.796 with per-language thresholds; our best official test submission achieves macro-F1 0.616 (ID 678990).
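The post-processing steps named in this abstract (mean-logit ensembling, temperature scaling, threshold tuning) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the grid-search ranges, and the use of positive-class F1 as a stand-in for the official metric are all assumptions made for illustration.

```python
import numpy as np

def ensemble_mean_logits(fold_logits):
    """Average logits over folds; input shape is (n_folds, n_examples)."""
    return np.mean(np.asarray(fold_logits, dtype=float), axis=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_temperature(logits, labels, grid=None):
    """Grid-search the temperature T that minimizes binary NLL of sigmoid(logits / T).

    The grid range is an assumption; the abstract does not say how T is fitted.
    """
    if grid is None:
        grid = np.linspace(0.5, 5.0, 91)
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    best_t, best_nll = 1.0, np.inf
    for t in grid:
        p = np.clip(sigmoid(logits / t), 1e-7, 1 - 1e-7)
        nll = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t

def f1_positive(labels, preds):
    """Positive-class F1, used here as a stand-in for the official metric."""
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def tune_threshold(probs, labels, grid=None):
    """Return the decision threshold on the grid with the highest F1."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    probs, labels = np.asarray(probs), np.asarray(labels)
    scores = [f1_positive(labels, (probs >= th).astype(int)) for th in grid]
    return float(grid[int(np.argmax(scores))])

def tune_per_language(probs, labels, langs):
    """Tune one threshold per language code (hypothetical helper)."""
    probs, labels, langs = map(np.asarray, (probs, labels, langs))
    return {str(lang): tune_threshold(probs[langs == lang], labels[langs == lang])
            for lang in np.unique(langs)}
```

In such a pipeline, the temperature and thresholds would be fitted on development-set logits and then applied unchanged to test predictions; the gap between the reported development and test scores is the kind of outcome that makes it worth keeping the tuned parameters few.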
2024
TpT-ADE: Transformer Based Two-Phase ADE Extraction
Suryamukhi Kuchibhotla | Manish Singh
Proceedings of the 28th Conference on Computational Natural Language Learning
Extracting adverse reactions to medications or treatments is a crucial activity in the biomedical domain. The task involves identifying mentions of drugs and their adverse effects/events in raw text, which is challenging due to the unstructured nature of clinical narratives. In this paper, we propose TpT-ADE, a novel joint two-phase transformer model combined with natural language processing (NLP) techniques, to identify adverse events (AEs) caused by drugs. In the first phase of TpT-ADE, entities are extracted and grounded to their standard terms using the Unified Medical Language System (UMLS) knowledge base. In the second phase, entity and relation classification is performed to determine whether a relationship exists between each drug and AE pair. TpT-ADE also identifies the intensity of AE entities by constructing a part-of-speech (POS) embedding model. Unlike previous approaches that use complex classifiers, TpT-ADE employs a shallow neural network and yet outperforms the state-of-the-art methods on the standard ADE corpus.
2021
Auditing Keyword Queries Over Text Documents
Bharath Kumar Reddy Apparreddy | Sailaja Rajanala | Manish Singh
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Data security and privacy are issues of growing importance in the healthcare domain. In this paper, we present an auditing system to detect privacy violations for unstructured text documents such as healthcare records. Given a sensitive document, we present an anomaly detection algorithm that can find the top-k suspicious keyword queries that may have accessed the sensitive document. Since unstructured healthcare data, such as medical reports and query logs, are not easily available for public research, we show how one can use the publicly available DBLP data to create equivalent healthcare data and a query log, which can then be used for experimental evaluation.