Naeemul Hassan


2025

LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content
Mohamed Bayan Kmainasi | Ali Ezzat Shahroor | Maram Hasanain | Sahinur Rahman Laskar | Naeemul Hassan | Firoj Alam
Findings of the Association for Computational Linguistics: NAACL 2025

Large Language Models (LLMs) have demonstrated remarkable success as general-purpose task solvers across various fields. However, their capabilities remain limited when addressing domain-specific problems, particularly in downstream NLP tasks. Research has shown that models fine-tuned on instruction-based downstream NLP datasets outperform those that are not fine-tuned. While most efforts in this area have primarily focused on resource-rich languages like English and broad domains, little attention has been given to multilingual settings and specific domains. To address this gap, this study focuses on developing a specialized LLM, LlamaLens, for analyzing news and social media content in a multilingual context. To the best of our knowledge, this is the first attempt to tackle both domain specificity and multilinguality, with a particular focus on news and social media. Our experimental setup includes 18 tasks, represented by 52 datasets covering Arabic, English, and Hindi. We demonstrate that LlamaLens outperforms the current state-of-the-art (SOTA) on 23 testing sets, and achieves comparable performance on 8 sets. We make the models and resources publicly available for the research community (https://huggingface.co/QCRI).

2023

Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines
Yoo Yeon Sung | Jordan Boyd-Graber | Naeemul Hassan
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Polarization and the marketplace for impressions have conspired to make navigating information online difficult for users, and while there has been a significant effort to detect false or misleading text, multimodal datasets have received considerably less attention. To complement existing resources, we present the multimodal Video Misleading Headline (VMH) dataset, which consists of videos and annotations of whether annotators believe the headline is representative of the video’s contents. After collecting and annotating this dataset, we analyze multimodal baselines for detecting misleading headlines. Our annotation process also focuses on why annotators view a video as misleading, allowing us to better understand the interplay of annotators’ background and the content of the videos.

2022

A Survey of Computational Framing Analysis Approaches
Mohammad Ali | Naeemul Hassan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Framing analysis is predominantly qualitative and quantitative, examining a small dataset with manual coding. Easy access to digital data over the last two decades has prompted scholars in both the computational and social sciences to use various computational methods to explore frames in large-scale datasets. The growing scholarship, however, lacks a comprehensive understanding of, and resources for, computational framing analysis methods. Aiming to address this gap, this article surveys existing computational framing analysis approaches and brings them together. The research is expected to help scholars and journalists gain a deeper understanding of how frames are being explored computationally, better equip them to analyze frames in large-scale datasets, and, finally, work on advancing methodological approaches.