Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages

Ayesha Khatun, Kadir Bulut Ozler, Steven Bethard, Egoitz Laparra


Abstract
This paper studies how to improve biomedical named entity recognition (NER) with large language models (LLMs), focusing on low-resource languages such as Bangla and Basque. The main goal is to understand how prompt style and output format affect model performance. The study finds that prompt design has a large effect on performance. Among the prompting methods compared, question-style prompting works best in all languages: it helps the model understand the biomedical task more clearly and improves accuracy. The gains are much larger for Bangla and Basque than for high-resource languages such as English and Spanish. A second key finding concerns the output format. Traditional BIO tagging (labeling each word as the Beginning of, Inside, or Outside an entity) performs poorly with LLMs because it is strict and sensitive to small errors. Span-based extraction (directly extracting entity text phrases) works much better and yields higher F1 scores, because LLMs naturally generate text spans rather than token-level labels. The paper also analyzes errors; common problems include hallucination, missing entities, and boundary mistakes. Translation-based prompts can reduce hallucination, while question-style prompts reduce empty outputs. Overall, the study shows that choosing the right prompt and output format matters, especially for low-resource, high-vocabulary languages, and it offers practical guidance for building multilingual medical information extraction systems.
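The contrast between BIO tagging and span-based extraction can be made concrete with a minimal Python sketch. It is not the paper's evaluation code: the example sentence, the hypothetical `DIS` entity label, the lenient decoding of stray `I-` tags, and the policy of discarding any tag sequence whose length does not match the token count are all assumptions for illustration. Both formats are scored with exact-match F1.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags into token spans (lenient: a stray I- tag opens a new entity)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
            start, label = (None, None) if tag == "O" else (i, tag[2:])
    return spans


def f1(gold: set, pred: set) -> float:
    """Exact-match F1: a prediction counts only if it equals a gold item exactly."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_bio(tokens: List[str], gold_tags: List[str], pred_tags: List[str]) -> float:
    # Assumed policy: a tag sequence whose length differs from the token
    # sequence cannot be aligned, so the whole prediction is discarded.
    pred = bio_to_spans(pred_tags) if len(pred_tags) == len(tokens) else set()
    return f1(bio_to_spans(gold_tags), pred)


def score_spans(tokens: List[str], gold_tags: List[str], pred_strings: List[str]) -> float:
    # Span-style output: compare the surface strings of entities, ignoring type.
    gold = {" ".join(tokens[s:e]) for s, e, _ in bio_to_spans(gold_tags)}
    return f1(gold, set(pred_strings))


# Hypothetical example sentence with two disease mentions.
tokens = ["Patient", "has", "type", "2", "diabetes", "and", "hypertension"]
gold_tags = ["O", "O", "B-DIS", "I-DIS", "I-DIS", "O", "B-DIS"]

# The model drops one tag: 6 tags for 7 tokens, so nothing can be aligned.
print(score_bio(tokens, gold_tags, ["O", "O", "B-DIS", "I-DIS", "I-DIS", "B-DIS"]))  # 0.0
# Span output with correct phrases, then with one boundary error.
print(score_spans(tokens, gold_tags, ["type 2 diabetes", "hypertension"]))  # 1.0
print(score_spans(tokens, gold_tags, ["diabetes", "hypertension"]))  # 0.5
```

Under these assumptions, a single dropped tag zeroes out the BIO prediction for the whole sentence, while span output loses credit only for the entity whose boundary is wrong.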
Anthology ID: 2026.bionlp-1.4
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 31–44
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.4/
Cite (ACL): Ayesha Khatun, Kadir Bulut Ozler, Steven Bethard, and Egoitz Laparra. 2026. Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages. In BioNLP 2026, pages 31–44, San Diego, California. Association for Computational Linguistics.
Cite (Informal): Analyzing Prompt Design Choices in Biomedical Information Extraction for Low-Resource Languages (Khatun et al., BioNLP 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.4.pdf