Kadir Bulut Ozler

2026

Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages
Ayesha Khatun | Kadir Bulut Ozler | Steven Bethard | Egoitz Laparra
BioNLP 2026

This paper studies how to improve biomedical named entity recognition (NER) with large language models (LLMs), especially for low-resource languages such as Bangla and Basque. The main goal is to understand how different prompt styles and output formats affect model performance. The study finds that prompt design has a large effect on performance. Among all methods, question-style prompting works best across all languages: it helps the model understand the biomedical task more clearly and improves accuracy. These improvements are much larger in Bangla and Basque than in high-resource languages such as English and Spanish. Another key finding concerns the output format. Traditional BIO tagging (labeling each word) performs poorly with LLMs because it is strict and sensitive to small errors. Span-based extraction (directly extracting text phrases) works much better and yields higher F1 scores, because LLMs naturally generate text spans rather than token-level labels. The paper also analyzes errors. Common problems include hallucination, missing entities, and boundary mistakes. Translation-based prompts can reduce hallucination, while question-style prompts reduce empty outputs in biomedical NER. Overall, the study shows that choosing the right prompt and output format matters, especially for low-resource, high-vocabulary languages, and it offers practical guidance for building multilingual medical information extraction systems.
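
The contrast between BIO tagging and span extraction is easiest to see with span-level scoring. The following is a minimal Python sketch, not the authors' evaluation code: the label names PROBLEM and DRUG, the helper names, whitespace tokenization, and first-occurrence matching of generated phrases are all illustrative assumptions. It shows how exact-match span F1 is commonly computed and why one wrong tag in a BIO sequence costs both a false positive and a false negative.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of typed spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # Append a sentinel "O" so that a span ending at the last token is flushed.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
            start, label = None, None
            if tag[:2] in ("B-", "I-"):
                # A stray I- tag (no matching open span) is treated as a new span.
                start, label = i, tag[2:]
    return spans


def text_spans_to_spans(
    tokens: List[str], extracted: List[Tuple[str, str]]
) -> Tuple[Set[Span], int]:
    """Map generated (phrase, type) pairs back to token offsets.

    Hypothetical helper: assumes whitespace tokenization and keeps only the
    first exact match. Returns the located spans and the number of phrases
    that could not be found in the sentence.
    """
    spans: Set[Span] = set()
    unmatched = 0
    for phrase, label in extracted:
        words = phrase.split()
        match = None
        for i in range(len(tokens) - len(words) + 1):
            if words and tokens[i:i + len(words)] == words:
                match = (i, i + len(words), label)
                break
        if match is None:
            unmatched += 1  # e.g. a hallucinated entity that is not in the text
        else:
            spans.add(match)
    return spans, unmatched


def span_f1(gold: Set[Span], pred: Set[Span], extra_fp: int = 0) -> float:
    """Exact-match span F1: start, end, and type must all agree.

    extra_fp counts predictions that could not be located in the text;
    they are scored as false positives.
    """
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / (len(pred) + extra_fp)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = "severe chest pain after aspirin".split()
    gold = bio_to_spans(["O", "B-PROBLEM", "I-PROBLEM", "O", "B-DRUG"])
    # One wrong tag truncates "chest pain" to "chest": one FP and one FN.
    bio_pred = bio_to_spans(["O", "B-PROBLEM", "O", "O", "B-DRUG"])
    print(span_f1(gold, bio_pred))  # 0.5
    generated = [("chest pain", "PROBLEM"), ("aspirin", "DRUG")]
    llm_spans, unmatched = text_spans_to_spans(tokens, generated)
    print(span_f1(gold, llm_spans, unmatched))  # 1.0
```

In this toy example, a single wrong tag halves the F1 score, while correctly generated phrases map back to the gold spans exactly. How cleanly real LLM output aligns depends on tokenization and on the matching rules actually used in the paper, which are not described here.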

2023


clulab at MEDIQA-Chat 2023: Summarization and classification of medical dialogues
Kadir Bulut Ozler | Steven Bethard
Proceedings of the 5th Clinical Natural Language Processing Workshop

Clinical Natural Language Processing has become an increasingly popular research area in the NLP community. With the rise of large language models (LLMs) and their impressive abilities in NLP tasks, it is crucial to pay attention to their clinical applications. Sequence-to-sequence generative approaches with LLMs have been widely used in recent years. To contribute to clinical NLP research using these recent advances, we participated in Task A of MEDIQA-Chat at the ACL-ClinicalNLP Workshop 2023. In this paper, we explain our methods and findings and discuss our results and limitations.

2020


Incivility is a problem on social media, and it comes in many forms (name-calling, vulgarity, threats, etc.) and domains (microblog posts, online news comments, Wikipedia edits, etc.). Training machine learning models to detect such incivility must handle the multi-label and multi-domain nature of the problem. We present a BERT-based model for incivility detection and propose several approaches for training it on multi-label and multi-domain datasets. We find that individual binary classifiers outperform a joint multi-label classifier, and that simply combining multiple domains of training data outperforms other recently proposed fine-tuning strategies. We also establish new state-of-the-art performance on several incivility detection datasets.
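
On the data side, the choice between individual binary classifiers and a joint multi-label classifier comes down to how the training targets are built. The sketch below is illustrative only and does not show the BERT model itself: the label names are taken from the abstract's examples, the texts are placeholders, the helper names are hypothetical, and it assumes all domains share one label set, which may not hold for the actual datasets. Each binary dataset would train its own classifier, whereas the multi-hot targets would train one model with one output per label; "combining domains" is modeled here as plain concatenation of training examples.

```python
from typing import Dict, List, Set, Tuple

# (text, set of incivility labels annotated for it)
Example = Tuple[str, Set[str]]

# Hypothetical label set, using examples of incivility named in the abstract.
LABELS = ["name-calling", "vulgarity", "threat"]


def combine_domains(*domains: List[Example]) -> List[Example]:
    """Pool training data from several domains by simple concatenation."""
    combined: List[Example] = []
    for domain in domains:
        combined.extend(domain)
    return combined


def to_binary_datasets(
    examples: List[Example], labels: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """One binary (present / absent) dataset per label, one classifier each."""
    return {
        label: [(text, int(label in gold)) for text, gold in examples]
        for label in labels
    }


def to_multilabel_targets(
    examples: List[Example], labels: List[str]
) -> List[Tuple[str, List[int]]]:
    """One multi-hot target vector per example, for a single joint classifier."""
    return [(text, [int(label in gold) for label in labels]) for text, gold in examples]


if __name__ == "__main__":
    news_comments = [("comment a", {"vulgarity"}), ("comment b", set())]
    microblog_posts = [("post c", {"name-calling", "threat"})]
    train = combine_domains(news_comments, microblog_posts)
    print(to_binary_datasets(train, LABELS)["threat"])
    # [('comment a', 0), ('comment b', 0), ('post c', 1)]
    print(to_multilabel_targets(train, LABELS)[2])
    # ('post c', [1, 0, 1])
```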

Co-authors

Steve Rains 1

Yotam Shmargad 1
