Pierre Achkar
2025
From Keyterms to Context: Exploring Topic Description Generation in Scientific Corpora
Pierre Achkar | Satiyabooshan Murugaboopathy | Anne Kreuter | Tim Gollub | Martin Potthast | Yuri Campbell
Proceedings of The 5th New Frontiers in Summarization Workshop
Topic models represent topics as ranked term lists, which are often hard to interpret in scientific domains. We explore Topic Description for Scientific Corpora, an approach to generating structured summaries for topic-specific document sets. We propose and investigate two LLM-based pipelines: Selective Context Summarisation (SCS), which uses maximum marginal relevance to select representative documents; and Compressed Context Summarisation (CCS), a hierarchical approach that compresses document sets through iterative summarisation. We evaluate both methods using SUPERT and multi-model LLM-as-a-Judge across three topic modeling backbones and three scientific corpora. Our preliminary results suggest that SCS tends to outperform CCS in quality and robustness, while CCS shows potential advantages on larger topics. Our findings highlight trade-offs between selective and compressed strategies for topic-level summarisation in scientific domains. We release code and data for two of the three datasets.
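The selection step in SCS relies on maximum marginal relevance (MMR), which greedily picks documents that are relevant to the topic but not redundant with documents already chosen. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each document in a topic is already embedded as a dense vector, uses the mean of the topic's document vectors as the relevance target, measures similarity with cosine, and uses a hypothetical trade-off weight `lam=0.7`. The function names `cosine`, `centroid`, and `mmr_select` are illustrative only.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean vector; ASSUMPTION: used here as a stand-in for the topic 'query'."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mmr_select(doc_vecs: Sequence[Sequence[float]], k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: return indices of up to k representative documents.

    Each step maximises  lam * sim(d, topic) - (1 - lam) * max_{s in selected} sim(d, s).
    lam = 1.0 ranks purely by relevance; smaller lam favours diversity.
    """
    if not doc_vecs:
        return []
    topic = centroid(doc_vecs)
    relevance = [cosine(v, topic) for v in doc_vecs]
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))

    def mmr_score(i: int) -> float:
        # Redundancy is the highest similarity to any already-selected document.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings" for four documents in one topic.
    docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]]
    print(mmr_select(docs, k=2))
```

The selected documents would then form the LLM's context for writing the topic description. Lowering `lam` trades topical centrality for coverage of distinct subthemes within the topic.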