Kevin Bryson
2025
Can We Edit LLMs for Long-Tail Biomedical Knowledge?
Xinhao Yi | Jake Lever | Kevin Bryson | Zaiqiao Meng
Findings of the Association for Computational Linguistics: EMNLP 2025
Knowledge editing has emerged as an effective approach for updating large language models (LLMs) by modifying their internal knowledge. However, its application to the biomedical domain faces unique challenges due to the long-tailed distribution of biomedical knowledge, in which rare and infrequently referenced information is prevalent. In this paper, we conduct the first comprehensive study of the effectiveness of knowledge editing methods for editing long-tail biomedical knowledge. Our results indicate that, while existing editing methods can enhance LLMs’ performance on long-tail biomedical knowledge, their performance on such knowledge remains inferior to that on high-frequency, popular knowledge, even after editing. Our further analysis reveals that long-tail biomedical knowledge contains a significant amount of one-to-many knowledge, where one subject and relation link to multiple objects. This high prevalence of one-to-many knowledge limits the effectiveness of knowledge editing in improving LLMs’ understanding of long-tail biomedical knowledge, highlighting the need for tailored strategies to bridge this performance gap.
2022
Explaining Why: How Instructions and User Interfaces Impact Annotator Rationales When Labeling Text Data
Jamar Sullivan Jr. | Will Brackenbury | Andrew McNutt | Kevin Bryson | Kwam Byll | Yuxin Chen | Michael Littman | Chenhao Tan | Blase Ur
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
In the context of data labeling, NLP researchers are increasingly interested in having humans select rationales, a subset of input tokens relevant to the chosen label. We conducted a 332-participant online user study to understand how humans select rationales, especially how different instructions and user interface affordances impact the rationales chosen. Participants labeled ten movie reviews as positive or negative, selecting words and phrases supporting their label as rationales. We varied the instructions given, the rationale-selection task, and the user interface. Participants typically selected about 12% of input tokens as rationales, but selected fewer when unable to drag over multiple tokens at once. Whereas participants were nearly unanimous in their data labels, they were far less consistent in their rationales. The user interface affordances and task greatly impacted the types of rationales chosen. We also observed large variance across participants.