Anna Kiepura
2025
Literature discovery with natural language queries
Anna Kiepura | Jessica Lam | Nianlong Gu | Richard Hahnloser
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Literature discovery is a critical component of scientific research. Modern discovery systems leveraging Large Language Models (LLMs) are increasingly adopted for their ability to process natural language queries (NLQs). To assess the robustness of such systems, we compile two NLQ datasets and submit them to nine widely used discovery platforms. Our findings reveal that LLM-based search engines struggle with precisely formulated queries, often producing numerous false positives. However, precision improves when LLMs are used not for direct retrieval but to convert NLQs into structured keyword-based queries. As a result, hybrid systems that integrate both LLM-driven and keyword-based approaches outperform purely keyword-based or purely LLM-based discovery methods.
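A minimal sketch of the hybrid idea described in the abstract, under stated assumptions: an LLM rewrites the natural-language query into a structured keyword query, and the final ranking fuses keyword-based and LLM-based retrieval scores. `llm_complete`, `keyword_search`, and `llm_search` are hypothetical stand-ins for an LLM client and two retrieval backends; this is not the paper's released code.

```python
from typing import Callable, Dict, List


def nlq_to_keyword_query(nlq: str, llm_complete: Callable[[str], str]) -> str:
    """Ask an LLM to convert a natural-language query into a boolean keyword query."""
    prompt = (
        "Rewrite the following literature-search request as a concise boolean "
        "keyword query (terms joined with AND/OR, no explanations):\n" + nlq
    )
    return llm_complete(prompt).strip()


def hybrid_search(
    nlq: str,
    llm_complete: Callable[[str], str],
    keyword_search: Callable[[str], Dict[str, float]],  # doc_id -> score
    llm_search: Callable[[str], Dict[str, float]],       # doc_id -> score
    alpha: float = 0.5,
) -> List[str]:
    """Merge keyword-based and LLM-based retrieval scores for one query."""
    kw_scores = keyword_search(nlq_to_keyword_query(nlq, llm_complete))
    llm_scores = llm_search(nlq)
    doc_ids = set(kw_scores) | set(llm_scores)
    fused = {
        d: alpha * kw_scores.get(d, 0.0) + (1 - alpha) * llm_scores.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(fused, key=fused.get, reverse=True)
```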
ClimateCheck2025: Multi-Stage Retrieval Meets LLMs for Automated Scientific Fact-Checking
Anna Kiepura | Jessica Lam
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Misinformation on social media poses significant risks, particularly when it concerns critical scientific issues such as climate change. One promising direction for mitigation is the development of automated fact-checking systems that verify claims against authoritative scientific sources. In this work, we present our solution to the ClimateCheck2025 shared task, which involves retrieving and classifying scientific abstracts as evidence for or against given claims. Our system is built around a multi-stage hybrid retrieval pipeline that integrates lexical, sparse neural, and dense neural retrievers, followed by cross-encoder and large language model (LLM)-based reranking stages. For stance classification, we employ prompting strategies with LLMs to determine whether a retrieved abstract supports, refutes, or provides no evidence for a given claim. Our approach achieves the second-highest overall score across both subtasks of the benchmark and significantly surpasses the official baseline by 53.79% on average across Recall@2, Recall@5, Recall@10, and B-Pref. Notably, we achieve state-of-the-art performance in Recall@2. These results highlight the effectiveness of combining structured retrieval architectures with the emergent reasoning capabilities of LLMs for scientific fact verification, especially in domains where reliable human annotation is scarce and timely intervention is essential.
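A hedged sketch of the kind of pipeline the abstract describes, not the authors' implementation: hybrid first-stage retrieval over several retrievers, followed by LLM-based stance classification via a constrained prompt. The `retrievers` and `llm_complete` callables are hypothetical placeholders to be supplied by the user.

```python
from typing import Callable, Dict, List, Sequence


def hybrid_retrieve(
    claim: str,
    retrievers: Sequence[Callable[[str], Dict[str, float]]],  # lexical, sparse, dense
    top_k: int = 50,
) -> List[str]:
    """Fuse max-normalized scores from several retrievers and keep the top candidates."""
    fused: Dict[str, float] = {}
    for retrieve in retrievers:
        scores = retrieve(claim)
        max_s = max(scores.values(), default=1.0) or 1.0
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s / max_s
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


def classify_stance(claim: str, abstract: str,
                    llm_complete: Callable[[str], str]) -> str:
    """Prompt an LLM to label the abstract as SUPPORTS, REFUTES, or NOT_ENOUGH_INFO."""
    prompt = (
        "Claim: " + claim + "\nAbstract: " + abstract + "\n"
        "Does the abstract support or refute the claim? "
        "Answer with exactly one of: SUPPORTS, REFUTES, NOT_ENOUGH_INFO."
    )
    answer = llm_complete(prompt).strip().upper()
    return answer if answer in {"SUPPORTS", "REFUTES", "NOT_ENOUGH_INFO"} else "NOT_ENOUGH_INFO"
```

In a full pipeline the fused candidates would additionally pass through cross-encoder and LLM reranking stages before stance classification, as the abstract notes.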
2024
SciPara: A New Dataset for Investigating Paragraph Discourse Structure in Scientific Papers
Anna Kiepura | Yingqiang Gao | Jessica Lam | Nianlong Gu | Richard H.R. Hahnloser
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)
Good scientific writing makes use of specific sentence and paragraph structures, providing a rich platform for discourse analysis and for developing tools that enhance text readability. In this vein, we introduce SciPara, a novel dataset consisting of 981 scientific paragraphs annotated by experts in terms of sentence discourse types and topic information. On this dataset, we explored two tasks: 1) discourse category classification, which is to predict the discourse category of a sentence using its paragraph and surrounding paragraphs as context, and 2) discourse sentence generation, which is to generate a sentence of a given discourse category using various contexts as input. We found that Pre-trained Language Models (PLMs) can accurately identify Topic Sentences in SciPara, but have difficulty distinguishing Concluding, Transition, and Supporting Sentences. The quality of the sentences generated by all investigated PLMs improved with the amount of context, regardless of discourse category. However, not all contexts were equally influential. Contrary to common assumptions about well-crafted scientific paragraphs, our analysis revealed that, paradoxically, paragraphs with complete discourse structures were less readable.
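An illustrative sketch of the discourse category classification task, under stated assumptions rather than the paper's released code: a pre-trained language model scores a sentence paired with its paragraph context. The label set follows the categories named in the abstract, and the `bert-base-uncased` checkpoint is a placeholder that would need fine-tuning on SciPara before use.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Discourse categories mentioned in the abstract (assumed label set).
LABELS = ["Topic", "Supporting", "Transition", "Concluding"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)  # placeholder checkpoint; fine-tune on SciPara before real use


def predict_discourse_category(sentence: str, paragraph_context: str) -> str:
    """Classify one sentence, conditioning on its surrounding paragraph text."""
    inputs = tokenizer(sentence, paragraph_context,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```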