Kadir Bulut Ozler
2026
Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages
Ayesha Khatun | Kadir Bulut Ozler | Steven Bethard | Egoitz Laparra
BioNLP 2026
This paper studies how to improve biomedical named entity recognition (NER) with large language models (LLMs), especially for low-resource languages such as Bangla and Basque. The main goal is to understand how different prompt styles and output formats affect model performance. The study finds that prompt design has a substantial effect on results. Among all methods, question-style prompting works best across all languages: it helps the model understand the biomedical task more clearly and improves accuracy, and the improvements are much greater in Bangla and Basque than in high-resource languages such as English and Spanish. Another key finding concerns the output format. Traditional BIO tagging (labeling each word) performs poorly with LLMs because it is strict and sensitive to small errors. Span-based extraction (directly extracting text phrases) works much better and gives higher F1 scores, because LLMs naturally generate text spans rather than token-level labels. The paper also analyzes errors. Common problems include hallucination, missing entities, and boundary mistakes. Translation-based prompts can reduce hallucination, while question-style prompts reduce empty outputs in biomedical NER. Overall, the study shows that choosing the right prompt and output format matters, especially for low-resource, high-vocabulary languages, and it provides useful guidance for building better multilingual medical information extraction systems.
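The contrast between BIO tagging and span-based extraction described in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation code: the example sentence, the label name `DIS`, and the helper functions `bio_to_spans`, `find_span`, and `span_f1` are all hypothetical, and exact-match span scoring is assumed. The sketch shows how a single wrong token label in a BIO sequence can destroy an otherwise correct entity, while an extracted phrase is matched as a whole.

```python
def bio_to_spans(tags):
    """Decode BIO tags into (start, end_exclusive, label) spans.

    Strict decoding (an assumption): an I- tag that does not continue
    an open span with the same label is ignored.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def find_span(tokens, phrase, label):
    """Locate an extracted phrase in the token list; return a span or None."""
    words = phrase.split()
    if not words:
        return None
    n = len(words)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == words:
            return (i, i + n, label)
    return None


def span_f1(gold, pred):
    """Exact-match F1 over sets of (start, end, label) spans."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence and labels.
tokens = ["Patient", "has", "type", "2", "diabetes", "."]
gold = bio_to_spans(["O", "O", "B-DIS", "I-DIS", "I-DIS", "O"])

# BIO output with one wrong token label in the middle of the entity.
noisy_bio = bio_to_spans(["O", "O", "B-DIS", "O", "I-DIS", "O"])

# Span-style output: the model returns the phrase itself.
span_pred = [find_span(tokens, "type 2 diabetes", "DIS")]

print(span_f1(gold, noisy_bio))  # 0.0: the entity boundary no longer matches
print(span_f1(gold, span_pred))  # 1.0: the phrase maps back to the gold span
```

Under these assumptions, one mislabeled token out of six drops the entity-level score to zero for the BIO output, which is the kind of sensitivity to small errors the abstract attributes to token-level tagging.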
2023
clulab at MEDIQA-Chat 2023: Summarization and classification of medical dialogues
Kadir Bulut Ozler | Steven Bethard
Proceedings of the 5th Clinical Natural Language Processing Workshop
Clinical Natural Language Processing has been an increasingly popular research area in the NLP community. With the rise of large language models (LLMs) and their impressive abilities in NLP tasks, it is crucial to pay attention to their clinical applications. Sequence-to-sequence generative approaches with LLMs have been widely used in recent years. To take part in clinical NLP research that builds on these recent advances, we participated in Task A of MEDIQA-Chat at the ACL-ClinicalNLP Workshop 2023. In this paper, we explain our methods and findings and discuss our results and their limitations.
2020
Fine-tuning for multi-domain and multi-label uncivil language detection
Kadir Bulut Ozler | Kate Kenski | Steve Rains | Yotam Shmargad | Kevin Coe | Steven Bethard
Proceedings of the Fourth Workshop on Online Abuse and Harms
Incivility is a problem on social media, and it comes in many forms (name-calling, vulgarity, threats, etc.) and domains (microblog posts, online news comments, Wikipedia edits, etc.). Training machine learning models to detect such incivility must handle the multi-label and multi-domain nature of the problem. We present a BERT-based model for incivility detection and propose several approaches for training it on multi-label and multi-domain datasets. We find that individual binary classifiers outperform a joint multi-label classifier, and that simply combining multiple domains of training data outperforms other recently proposed fine-tuning strategies. We also establish new state-of-the-art performance on several incivility detection datasets.