Md Mosaddek Khan
2025
Who Holds the Pen? Caricature and Perspective in LLM Retellings of History
Lubna Zahan Lamia | Mabsur Fatin Bin Hossain | Md Mosaddek Khan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) are no longer just language generators—they are increasingly used to simulate human behavior, perspectives, and demographic variation across social domains, from public opinion surveys to experimental research. Amid this shift, the use of LLMs to simulate historical narratives has emerged as a timely frontier. It is crucial to scrutinize the asymmetries these models embed when framing, interpreting, and retelling the past. Building on prior work that defines caricature as the combination of individuation and exaggeration, we analyze LLM-generated responses across 197 historically significant events—each featuring a directly and an indirectly affected persona. We find that LLMs reliably distinguish persona-based responses from neutral baselines, and that directly affected personas consistently exhibit higher exaggeration—amplifying identity-specific portrayals. Beyond lexical patterns, personas often frame the same event in conflicting ways—especially in military, political, and morally charged contexts. Grammatical analysis further reveals that direct personas adopt more passive constructions in institutional contexts, but shift to active framing when emotional immediacy is foregrounded. Our findings show how subtle asymmetries in tone, stance, and emphasis—not overt toxicity—can quietly, yet systematically, distort how history is told and remembered.
GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer
Md. Mahmudul Hasan | Ahmed Nesar Tahsin Choudhury | Mahmudul Hasan | Md Mosaddek Khan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Despite Bengali being the sixth most spoken language in the world, handwritten text recognition (HTR) systems for Bengali remain severely underdeveloped. The complexity of Bengali script—featuring conjuncts, diacritics, and highly variable handwriting styles—combined with a scarcity of annotated datasets makes this task particularly challenging. We present **GraDeT-HTR**, a resource-efficient Bengali handwritten text recognition system based on a **Gra**pheme-aware **De**coder-only **T**ransformer architecture. To address the unique challenges of Bengali script, we augment the performance of a decoder-only transformer by integrating a grapheme-based tokenizer and demonstrate that it significantly improves recognition accuracy compared to conventional subword tokenizers. Our model is pretrained on large-scale synthetic data and fine-tuned on real human-annotated samples, achieving state-of-the-art performance on multiple benchmark datasets.