Md Sultan Al Nahian

2026

LitTx: A New Treatment Relation Extraction Dataset
Yuhang Jiang | Md Sultan Al Nahian | Li Hao Richie Xu | Rani Chikkanna | Ramakanth Kavuluru
Proceedings of the Fifteenth Language Resources and Evaluation Conference

The interest in biomedical relation extraction (RE) continues to persist even in the LLM era owing to RE being a prominent way to build knowledge graphs, which further ground LLM applications, especially in preventing hallucinations. Therapy-disease treatment relations from scientific literature are an important type in RE as they indicate emerging therapeutic hypotheses and off-label usages being explored in the community. An automatically extracted evolving knowledge-base of such relations will be of great utility to researchers because doing it manually is not viable with the exponential growth of biomedical articles. In this paper, toward this end, we introduce a new expert-annotated dataset LitTx for identifying treatment relationships discussed in literature given the lack of such datasets in the recent past. Besides confirmed or implied positive relations, we also introduce a new "conditional treatment" relation type where hedging or a potential relationship is indicated. Our baseline RE models with this new dataset demonstrate promising results, while also revealing clear areas for improvement. To foster innovation and ensure replicability in the biomedical RE community, we release our dataset, code, and annotation guidelines publicly: https://github.com/bionlproc/LitTx_dataset.

2025

Mining Social Media for Barriers to Opioid Recovery with LLMs
Vinu Ekanayake | Md Sultan Al Nahian | Ramakanth Kavuluru
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)

Opioid abuse and addiction remain a major public health challenge in the US. At a broad level, barriers to recovery often take the form of individual, social, and structural issues. However, it is crucial to know the specific barriers patients face to help design better treatment interventions and healthcare policies. Researchers typically discover barriers through focus groups and surveys. While scientists can exercise better control over these strategies, such methods are both expensive and time consuming, needing repeated studies across time as new barriers emerge. We believe this traditional approach can be complemented by automatically mining social media to determine high-level trends in both well-known and emerging barriers. In this paper, we report on such an effort by mining messages from the r/OpiatesRecovery subreddit to extract, classify, and examine barriers to opioid recovery, with special attention to the COVID-19 pandemic’s impact. Our methods involve multi-stage prompting to arrive at barriers from each post and map them to existing barriers or identify new ones. The new barriers are refined into coherent categories using embedding-based similarity measures and hierarchical clustering. Temporal analysis shows that some stigma-related barriers declined (relative to pre-pandemic), whereas systemic obstacles, such as treatment discontinuity and exclusionary practices, rose significantly during the pandemic. Our method is general enough to be applied to barrier extraction for other substance abuse scenarios (e.g., alcohol or stimulants).

RadQA-DPO: A Radiology Question Answering System with Encoder-Decoder Models Enhanced by Direct Preference Optimization
Md Sultan Al Nahian | Ramakanth Kavuluru
Proceedings of the 24th Workshop on Biomedical Language Processing

Extractive question answering over clinical text is a crucial need to help deal with the deluge of clinical text generated in hospitals. While encoder models (e.g., BERT) have been popular for this reading comprehension–style question answering task, recently encoder-decoder models (e.g., T5) are on the rise. There is also the emergence of preference optimization techniques to align decoder-only LLMs with human preferences. In this paper, we combine encoder-decoder models with the direct preference optimization (DPO) method for the RadQA radiology question answering task. Our approach achieves a 12–15 F1 point improvement over previous state-of-the-art models. To the best of our knowledge, this effort is the first to show that the DPO method also works for reading comprehension via novel heuristics to generate preference data without human input.

2024

UKYNLP@SMM4H2024: Language Model Methods for Health Entity Tagging and Classification on Social Media (Tasks 4 & 5)
Motasem Obeidat | Vinu Ekanayake | Md Sultan Al Nahian | Ramakanth Kavuluru
Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

We describe the methods and results of our submission to the 9th Social Media Mining for Health Research and Applications (SMM4H) 2024 shared tasks 4 and 5. Task 4 involved extracting the clinical and social impacts of non-medical substance use and task 5 focused on the binary classification of tweets reporting children’s medical disorders. We employed encoder language models and their ensembles, achieving the top score on task 4 and a high score for task 5.
