Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages
Ayesha Khatun, Kadir Bulut Ozler, Steven Bethard, Egoitz Laparra
Abstract
This paper studies how to improve biomedical named entity recognition (NER) with large language models (LLMs), especially for low-resource languages such as Bangla and Basque. The main goal is to understand how different prompt styles and output formats affect model performance. The study finds that prompt design matters considerably. Of the prompting methods compared, question-style prompting works best across all languages: it helps the model understand the biomedical task more clearly and improves accuracy. The improvements are much greater in Bangla and Basque than in high-resource languages such as English and Spanish. Another key finding concerns the output format. Traditional BIO tagging (labeling each word) performs poorly with LLMs because it is strict and sensitive to small errors. Span-based extraction (directly extracting text phrases) works much better and gives higher F1 scores, because LLMs naturally generate text spans rather than token-level labels. The paper also analyzes errors. Common problems include hallucination, missing entities, and boundary mistakes. Translation-based prompts can reduce hallucination, while question-style prompts reduce empty outputs in biomedical NER. Overall, the study shows that choosing the right prompt and output format is especially important for low-resource, high-vocabulary languages, and it provides useful guidance for building better multilingual medical information extraction systems.
- Anthology ID:
- 2026.bionlp-1.4
- Volume:
- BioNLP 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California
- Editors:
- Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
- Venues:
- BioNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 31–44
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.4/
- Cite (ACL):
- Ayesha Khatun, Kadir Bulut Ozler, Steven Bethard, and Egoitz Laparra. 2026. Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages. In BioNLP 2026, pages 31–44, San Diego, California. Association for Computational Linguistics.
- Cite (Informal):
- Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages (Khatun et al., BioNLP 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.4.pdf