@inproceedings{lin-etal-2025-evaluating,
    title = "Evaluating and Enhancing Large Language Models for Novelty Assessment in Scholarly Publications",
    author = "Lin, Ethan  and
      Peng, Zhiyuan  and
      Fang, Yi",
    editor = "Jansen, Peter  and
      Dalvi Mishra, Bhavana  and
      Trivedi, Harsh  and
      Prasad Majumder, Bodhisattwa  and
      Hope, Tom  and
      Khot, Tushar  and
      Downey, Doug  and
      Horvitz, Eric",
    booktitle = "Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities",
    month = may,
    year = "2025",
    address = "Albuquerque, New Mexico, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.aisd-main.5/",
    doi = "10.18653/v1/2025.aisd-main.5",
    pages = "46--57",
    ISBN = "979-8-89176-224-4",
    abstract = "Recent studies have evaluated the creativity of large language models (LLMs), of which novelty is an important aspect, primarily from a semantic perspective, using benchmarks from cognitive science. However, assessing the novelty of scholarly publications, a critical facet of evaluating LLMs as scientific discovery assistants, remains underexplored, despite its potential to accelerate research cycles and prioritize high-impact contributions in scientific workflows. We introduce SchNovel, a benchmark to evaluate LLMs' ability to assess novelty in scholarly papers, a task central to streamlining the discovery pipeline. SchNovel consists of 15,000 pairs of papers across six fields sampled from the arXiv dataset, with publication dates spanning 2 to 10 years apart. In each pair, the more recently published paper is assumed to be more novel. Additionally, we propose RAG-Novelty, a retrieval-augmented method that mirrors human peer review by grounding novelty assessment in retrieved context. Extensive experiments provide insights into the capabilities of different LLMs to assess novelty and demonstrate that RAG-Novelty outperforms recent baseline models, highlighting LLMs' promise as tools for automating novelty detection in scientific workflows."
}