Shubo Tian
2026
Learning to Combine AI Annotations for Improved Biomedical Relevance Labeling
Won Gyu Kim | Lana Yeganova | Shubo Tian | Donald Comeau | W John Wilbur | Zhiyong Lu
BioNLP 2026
Accurate labeling of relevance between biomedical abstracts is essential for improving information retrieval, semantic similarity modeling, training of ranking systems, and other Natural Language Processing tasks. However, manual annotation is time-consuming, labor-intensive, and costly. Studies show that large language models (LLMs) can facilitate automated annotation, but their performance still falls short of human expert-level accuracy, especially in domain-specific tasks. It has been shown that combining annotations from multiple non-expert annotators can achieve performance comparable to, or even exceeding, that of trained experts. Based on this evidence, we treat AI-generated annotations as contributions from non-expert annotators and combine them using a Learning to Rank framework. Our results show significant improvement in overall annotation quality. The proposed method shows promise for reducing reliance on human annotation while maintaining reliable performance for large-scale biomedical applications.
BioTopicXplor: A Web Tool for Interactive Exploration of PubMed Literature through Reproducible Topics.
Lana Yeganova | Donald Comeau | Won Kim | Natalie Xie | Shubo Tian | W John Wilbur | Zhiyong Lu
BioNLP 2026
The rapid growth of biomedical literature presents a major challenge for organizing knowledge and identifying emerging research trends. While PubMed provides effective access to relevant articles, it does not support understanding the conceptual structure of document collections. Existing tools rely on predefined features, ontologies, or parameter-sensitive clustering methods, limiting their ability to uncover fine-grained, data-driven topics in a reproducible manner. We present BioTopicXplor, an on-demand web server for interactive exploration of biomedical literature derived from arbitrary PubMed queries. The system integrates ConvexTopics, a convex optimization-based topic modeling framework that guarantees convergence to a global optimum and eliminates the need for predefined parameters. This enables the generation of reproducible and fine-grained topic structures across large document collections. Given a PubMed query, BioTopicXplor retrieves relevant articles, performs topic discovery, and organizes the resulting subtopics into a hierarchical structure of higher-level themes. To enhance interpretability, the system incorporates large language models to generate concise, literature-grounded summaries and descriptive titles for each topic, with links to supporting evidence. We demonstrate the utility of BioTopicXplor through a case study on anti-aging research, where the system reveals meaningful thematic structures and supports knowledge discovery.