Yaochu Jin
2025
NOVA: An Iterative Planning Framework for Enhancing Scientific Innovation with Large Language Models
Xiang Hu | Hongyu Fu | Jinge Wang | Yifeng Wang | Zhikun Li | Renjun Xu | Yu Lu | Yaochu Jin | Lili Pan | Zhenzhong Lan
Findings of the Association for Computational Linguistics: ACL 2025
Scientific innovation is pivotal for humanity, and harnessing large language models (LLMs) to generate research ideas could transform discovery. However, existing LLMs often produce simplistic and repetitive suggestions due to their limited ability to acquire external knowledge for innovation. To address this problem, we introduce an enhanced planning and search methodology designed to boost the creative potential of LLM-based systems. Our approach iteratively plans the retrieval of external knowledge, progressively enriching idea generation with broader and deeper insights. Validation through automated and human assessments demonstrates that our framework substantially elevates the quality of generated ideas, particularly in novelty and diversity. The number of unique novel ideas produced with our framework is 3.4 times that produced without it. Moreover, our method outperforms the current state of the art, generating at least 2.5 times more top-rated ideas across 170 seed papers in a Swiss-tournament evaluation. Our code is available at https://github.com/hflyzju/Nova.
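As a rough illustration of the iterative plan-retrieve-generate loop the abstract describes, the sketch below shows one way such a pipeline could be wired up. Every name here (`generate_ideas`, `search_engine`, `llm`, `n_rounds`) is a hypothetical placeholder for illustration, not part of the NOVA codebase; see the repository above for the authors' actual implementation.

```python
# Minimal sketch of an iterative planning loop for LLM idea generation,
# assuming `llm` is a text-in/text-out callable and `search_engine` maps a
# query string to a list of retrieved documents.

def generate_ideas(seed_paper, llm, search_engine, n_rounds=3):
    """Iteratively plan retrieval, fetch external knowledge, refine ideas."""
    knowledge, ideas = [seed_paper], []
    for _ in range(n_rounds):
        # 1. Plan: ask the LLM which external knowledge would most help next.
        query = llm(
            "Given the sources and ideas so far, propose a search query for "
            "knowledge that would enable more novel ideas.\n"
            f"Sources: {knowledge}\nIdeas: {ideas}"
        )
        # 2. Retrieve: enrich the knowledge pool with matching literature.
        knowledge.extend(search_engine(query))
        # 3. Generate: produce an idea grounded in the enlarged pool.
        ideas.append(llm(f"Propose a novel research idea based on: {knowledge}"))
    return ideas
```

Each round widens the knowledge pool before generation, which is the mechanism the abstract credits for the gains in novelty and diversity.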
2024
Federated Document-Level Biomedical Relation Extraction with Localized Context Contrast
Yan Xiao | Yaochu Jin | Kuangrong Hao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Existing studies on document-level relation extraction assume a centralized training environment, requiring documents to be collected from various sources. However, this raises privacy concerns, especially in sensitive domains such as finance and healthcare. For the first time, this work extends document-level relation extraction to a federated setting. The proposed federated framework, called FedLCC, is tailored for biomedical relation extraction and enables collaborative training without sharing raw medical texts. To fully exploit the models of all participating clients and improve local training on individual clients, we propose a novel concept of localized context contrast built on contrastive learning. By comparing and rectifying the similarity of localized contexts in documents between clients and the central server, the global model can better represent the documents on individual clients. Because there is no widely accepted measure of non-IID text data, we also introduce a novel non-IID scenario based on graph structural entropy. Experimental results on three document-level biomedical relation extraction datasets demonstrate the effectiveness of our method. Our code is available at https://github.com/xxxxyan/FedLCC.
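To make the localized context contrast idea concrete, here is a minimal sketch of one plausible reading: an InfoNCE-style loss that pulls each client's encoding of a document's localized context toward the global (server) model's encoding of the same context, while pushing it away from other contexts in the batch. The tensor names, shapes, and temperature below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def localized_context_contrast(local_ctx, global_ctx, temperature=0.1):
    """InfoNCE-style contrast between client and server context encodings.

    local_ctx, global_ctx: [batch, dim] encodings of the same localized
    contexts from the client model and the global (server) model.
    Matching rows are positives; all other pairs act as negatives.
    """
    local_ctx = F.normalize(local_ctx, dim=-1)
    global_ctx = F.normalize(global_ctx, dim=-1)
    logits = local_ctx @ global_ctx.t() / temperature  # pairwise cosine sims
    targets = torch.arange(local_ctx.size(0), device=local_ctx.device)
    return F.cross_entropy(logits, targets)  # positives sit on the diagonal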
Co-authors
- Hongyu Fu 1
- Kuangrong Hao 1
- Xiang Hu 1
- Zhenzhong Lan 1
- Zhikun Li 1