Md Mushfiqur Rahman


2025

Evaluating Health Question Answering Under Readability-Controlled Style Perturbations
Md Mushfiqur Rahman | Kevin Lybarger
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

Patients often ask semantically similar medical questions in linguistically diverse ways that vary in readability, tone, and background knowledge. A robust question answering (QA) system should both provide semantically consistent answers across stylistic differences and adapt its response style to match the user's input; however, existing QA evaluations rarely test this capability, creating critical gaps in QA evaluation that undermine accessibility and health literacy. We introduce SPQA, an evaluation framework and benchmark that applies controlled stylistic perturbations to consumer health questions while preserving semantic intent, then measures how model answers change across correctness, completeness, coherence, fluency, and linguistic adaptability using a human-validated LLM-based judge. The style axes include reading level, formality, and patient background knowledge; all perturbations are grounded in human annotations to ensure fidelity and alignment with human judgments. Our contributions include a readability-aware evaluation methodology, a style-diverse benchmark with human-grounded perturbations, and an automated evaluation pipeline validated against expert judgments. Evaluation results across multiple health QA models indicate that stylistic perturbations lead to measurable performance degradation even when semantic intent is preserved during perturbation. The largest performance drops occur in answer correctness and completeness, while models also show limited ability to adapt their style to match the input. These findings underscore the risk of inequitable information delivery and highlight the need for accessibility-aware QA evaluation.

2023

Intent Detection and Slot Filling for Home Assistants: Dataset and Analysis for Bangla and Sylheti
Fardin Ahsan Sakib | A H M Rezaul Karim | Saadat Hasan Khan | Md Mushfiqur Rahman
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

As voice assistants cement their place in our technologically advanced society, there remains a need to cater to the diverse linguistic landscape, including colloquial forms of low-resource languages. Our study introduces the first-ever comprehensive dataset for intent detection and slot filling in formal Bangla, colloquial Bangla, and Sylheti languages, totaling 984 samples across 10 unique intents. Our analysis reveals the robustness of large language models for tackling downstream tasks with inadequate data. The GPT-3.5 model achieves an impressive F1 score of 0.94 in intent detection and 0.51 in slot filling for colloquial Bangla.

To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer
Md Mushfiqur Rahman | Fardin Ahsan Sakib | Fahim Faisal | Antonios Anastasopoulos
Proceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL)