Sean O Brien
Also published as: Sean O'Brien
2025
Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models
Atharva Bhargude | Ishan Gonehal | Dave Yoon | Sean O Brien | Kaustubh Vinnakota | Chandler Haney | Aaron Sandoval | Kevin Zhu
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)
Phishing attacks represent a significant cybersecurity threat, necessitating adaptive detection techniques. This study explores few-shot Adaptive Linguistic Prompting (ALP) in detecting phishing webpages through the multimodal capabilities of state-of-the-art large language models (LLMs) such as GPT-4o and Gemini 1.5 Pro. ALP is a structured semantic reasoning method that guides LLMs to analyze textual deception by breaking down linguistic patterns, detecting urgency cues, and identifying manipulative diction commonly found in phishing content. By integrating textual, visual, and URL-based analysis, we propose a unified model capable of identifying sophisticated phishing attempts. Our experiments demonstrate that ALP significantly enhances phishing detection accuracy by guiding LLMs through structured reasoning and contextual analysis. The findings highlight the potential of ALP-integrated multimodal LLMs to advance phishing detection frameworks, achieving an F1-score of 0.93—surpassing traditional approaches. These results establish a foundation for more robust, interpretable, and adaptive linguistic-based phishing detection systems using LLMs.
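The abstract describes ALP as a structured prompt that walks the model through linguistic-pattern breakdown, urgency-cue detection, manipulative-diction checks, and URL analysis before a verdict. The sketch below shows one plausible way to assemble such a few-shot prompt; the step wording, the example pages, and `build_alp_prompt` are illustrative assumptions, not the authors' released prompts.

```python
# Minimal sketch of a few-shot ALP-style prompt builder (assumed structure,
# not the paper's exact prompts).
FEW_SHOT_EXAMPLES = [
    {
        "text": "URGENT: Your account will be suspended in 24 hours. Verify now!",
        "url": "http://secure-login.paypa1-support.com/verify",
        "analysis": (
            "Urgency cues ('URGENT', '24 hours'), manipulative imperative "
            "('Verify now'), and a typosquatted domain ('paypa1') mimicking PayPal."
        ),
        "label": "phishing",
    },
    {
        "text": "Your March invoice is attached. Contact billing with questions.",
        "url": "https://billing.example.com/invoices/march",
        "analysis": "Neutral tone, no urgency or credential request, consistent domain.",
        "label": "legitimate",
    },
]

def build_alp_prompt(page_text: str, url: str) -> str:
    """Assemble a structured ALP prompt for one candidate webpage."""
    lines = [
        "You are a phishing analyst. For each webpage, reason step by step:",
        "1. Linguistic patterns: identify deceptive or cloned brand language.",
        "2. Urgency cues: flag deadlines, threats, or pressure tactics.",
        "3. Manipulative diction: flag imperatives demanding credentials or payment.",
        "4. URL analysis: check for typosquatting, odd TLDs, or mismatched domains.",
        "Then answer 'phishing' or 'legitimate'.",
        "",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines += [
            f"Text: {ex['text']}",
            f"URL: {ex['url']}",
            f"Analysis: {ex['analysis']}",
            f"Answer: {ex['label']}",
            "",
        ]
    lines += [f"Text: {page_text}", f"URL: {url}", "Analysis:"]
    return "\n".join(lines)

if __name__ == "__main__":
    # The assembled prompt would be sent to a multimodal LLM (e.g., GPT-4o)
    # alongside a page screenshot; the API call itself is omitted here.
    print(build_alp_prompt(
        "Your mailbox is full. Click here within 2 hours to avoid deletion.",
        "http://mail-quota-fix.ru/login",
    ))
```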
2024
DiversityMedQA: A Benchmark for Assessing Demographic Biases in Medical Diagnosis using Large Language Models
Rajat Rawat | Hudson McBride | Rajarshi Ghosh | Dhiyaan Nirmal | Jong Moon | Dhruv Alamuri | Sean O'Brien | Kevin Zhu
Proceedings of the Third Workshop on NLP for Positive Impact
As large language models (LLMs) gain traction in healthcare, concerns about their susceptibility to demographic biases are growing. We introduce DiversityMedQA, a novel benchmark designed to assess LLM responses to medical queries across diverse patient demographics, such as gender and ethnicity. By perturbing questions from the MedQA dataset, which comprises medical board exam questions, we created a benchmark that captures the nuanced differences in medical diagnosis across varying patient profiles. To ensure that our perturbations did not alter the clinical outcomes, we implemented a filtering strategy to validate each perturbation, so that any performance discrepancies would be indicative of bias. Our findings reveal notable discrepancies in model performance when tested against these demographic variations. By releasing DiversityMedQA, we provide a resource for evaluating and mitigating demographic bias in LLM medical diagnoses.
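As a rough illustration of the perturb-then-filter construction the abstract describes, the sketch below swaps a demographic attribute in a MedQA-style question and gates the result through a validation stub. The swap table and `passes_filter` are assumptions for illustration; the paper's actual filtering strategy (which would need a judge model to confirm the clinical answer is unchanged) is not reproduced here.

```python
import re

# Minimal sketch of a gender perturbation for MedQA-style questions
# (assumed swap table, not the paper's exact pipeline).
GENDER_SWAPS = {"man": "woman", "male": "female", "he": "she", "his": "her"}

def perturb_gender(question: str) -> str:
    """Swap gendered terms in a question stem, preserving capitalization."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(GENDER_SWAPS) + r")\b"
    return re.sub(pattern, repl, question, flags=re.IGNORECASE)

def passes_filter(original: str, perturbed: str) -> bool:
    """Filtering stub: keep a perturbation only if (a) it actually changed
    the question and (b) the change should not alter the correct answer.
    In practice (b) would be decided by an LLM judge; here it is a
    placeholder that always accepts."""
    changed = original != perturbed
    clinically_equivalent = True  # placeholder for an LLM-judge call
    return changed and clinically_equivalent

if __name__ == "__main__":
    q = ("A 45-year-old man presents with crushing chest pain radiating to "
         "his left arm. What is the most likely diagnosis?")
    p = perturb_gender(q)
    if passes_filter(q, p):
        print(p)  # perturbed question enters the benchmark
```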