Natasa Miskov-Zivanov


2026

The exponential growth of biomedical literature has made manual curation of biological interaction networks increasingly difficult. Existing automated biological interaction extraction systems address the scaling challenge but treat extraction as a final step. They deliver structured output with limited or no integrated support for biologists to interactively verify, correct, and contextually interrogate extracted interactions against their source evidence within the same environment. We present Knowledge-Assisted Literature Mining for Biological Interaction Analysis (KALIMBA), an end-to-end, human-in-the-loop platform that combines in a single workflow three complementary extraction methods (NLP-only, LLM-only, and hybrid), expert annotation, and evidence-grounded conversational querying through a retrieval-augmented generation (RAG) chat module driven by a dual-context prompt. Evaluation on a corpus of 40 signaling-focused papers demonstrates that the LLM-only back-end recovers substantially more interactions than the NLP-only approach. A domain expert's assessment of the RAG chat confirms that the conversational module provides scientifically grounded responses that support curation decisions beyond what the structured interaction table alone conveys.

Large language models (LLMs) demonstrate strong general language capabilities but remain limited in chemical reasoning, particularly for tasks requiring structured, mechanistic understanding of molecular reactions. We present Knowledge Graph Reaction LLM (KGRxn-LLM), a framework that augments LLMs with a hierarchical chemical knowledge graph (KG) to ground reasoning in molecular transformations and reaction patterns. Existing benchmarks primarily emphasize recall of reaction or molecular facts and provide limited assessment of reaction-level mechanistic reasoning. To address this gap, we introduce KGRxn-Bench, a benchmark of 1,200 questions designed to evaluate LLMs on reaction-centric reasoning tasks, including functional group identification, reaction type classification, and product and reagent prediction. Experimental results show that grounding LLMs in a structured KG substantially improves performance across multiple tasks and model backbones, outperforming domain-specific fine-tuned models on KG-covered splits and on most held-out splits.

2021

The amount of biomedical literature has vastly increased over the past few decades. As a result, the sheer quantity of accessible information is overwhelming and complicates manual information retrieval. Automated methods seek to speed up information retrieval from biomedical literature. However, such automated methods are still too time-intensive to survey all existing biomedical literature. We present a methodology for automatically generating literature queries that select relevant papers based on biological data. By using differentially expressed genes to inform our literature searches, we focus information extraction on the mechanistic signaling details that are crucial for the disease or context of interest.