2025
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Jinheon Baek | Sujay Kumar Jauhar | Silviu Cucerzan | Sung Ju Hwang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
The pace of scientific research, vital for improving human life, is complex, slow, and needs specialized expertise. Meanwhile, novel, impactful research often stems from both a deep understanding of prior work, and a cross-pollination of ideas across domains and fields. To enhance the productivity of researchers, we propose ResearchAgent, which leverages the encyclopedic knowledge and linguistic reasoning capabilities of Large Language Models (LLMs) to assist them in their work. This system automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them based on the feedback from collaborative LLM-powered reviewing agents. Specifically, starting with a core scientific paper, ResearchAgent is augmented not only with relevant publications by connecting information over an academic graph but also entities retrieved from a knowledge store derived from shared underlying concepts mined across numerous papers. Then, mimicking a scientific approach to improving ideas with peer discussions, we leverage multiple LLM-based ReviewingAgents that provide reviews and feedback via iterative revision processes. These reviewing agents are instantiated with human preference-aligned LLMs whose criteria for evaluation are elicited from actual human judgments via LLM prompting. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines, showing its effectiveness in generating novel, clear, and valid ideas based on both human and model-based evaluation results. Our initial foray into AI-mediated scientific research has important implications for the development of future systems aimed at supporting researchers in their ideation and operationalization of novel work.
2024
Knowledge-Centric Templatic Views of Documents
Isabel Alyssa Cachola | Silviu Cucerzan | Allen Herring | Vuksan Mijovic | Erik Oveson | Sujay Kumar Jauhar
Findings of the Association for Computational Linguistics: EMNLP 2024
Authors seeking to communicate with broader audiences often share their ideas in various document formats, such as slide decks, newsletters, reports, and posters. Prior work on document generation has generally treated the creation of each format as a separate task, leading to fragmented learning processes, redundancy in models and methods, and disjointed evaluation. We consider each of these documents as templatic views of the same underlying knowledge/content, and we aim to unify the generation and evaluation of these templatic views. We begin by showing that current LLMs are capable of generating various document formats with little to no supervision. Further, a simple augmentation involving a structured intermediate representation can improve performance, especially for smaller models. We then introduce a novel unified evaluation framework that can be adapted to measuring the quality of document generators for heterogeneous downstream applications. This evaluation is adaptable to a range of user-defined criteria and application scenarios, obviating the need for task-specific evaluation metrics. Finally, we conduct a human evaluation, which shows that people prefer 82% of the documents generated with our method, and that human judgments correlate more highly with our unified evaluation framework than with prior metrics in the literature.
2018
Multi-lingual Entity Discovery and Linking
Avi Sil | Heng Ji | Dan Roth | Silviu-Petru Cucerzan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
The primary goals of this tutorial are to review the framework of cross-lingual entity linking (EL) and motivate it as a broad paradigm for the Information Extraction task. We will start by discussing the traditional EL techniques and metrics and address questions relevant to their adequacy across domains and languages. We will then present more recent approaches such as Neural EL, discuss the basic building blocks of a state-of-the-art neural EL system, and analyze some of the current results on English EL. We will then proceed to Cross-lingual EL and discuss methods that work across languages. In particular, we will discuss and compare multiple methods that make use of multi-lingual word embeddings. We will also present EL methods that work for both name tagging and linking in very low resource languages. Finally, we will discuss the uses of cross-lingual EL in a variety of applications like search engines and commercial product selling applications. In contrast to the 2014 EL tutorial, we will also focus on Entity Discovery, which is an essential component of EL.
2014
Towards Temporal Scoping of Relational Facts based on Wikipedia Data
Avirup Sil | Silviu-Petru Cucerzan
Proceedings of the Eighteenth Conference on Computational Natural Language Learning
2008
Augmenting Wikipedia with Named Entity Tags
Wisam Dakka | Silviu Cucerzan
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I
2007
Large-Scale Named Entity Disambiguation Based on Wikipedia Data
Silviu Cucerzan
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
2004
Spelling Correction as an Iterative Process that Exploits the Collective Knowledge of Web Users
Silviu Cucerzan | Eric Brill
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing
2003
Minimally Supervised Induction of Grammatical Gender
Silviu Cucerzan | David Yarowsky
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
2002
Augmented Mixture Models for Lexical Disambiguation
Silviu Cucerzan | David Yarowsky
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)
Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day
Silviu Cucerzan | David Yarowsky
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)
Language Independent NER using a Unified Model of Internal and Contextual Evidence
Silviu Cucerzan | David Yarowsky
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)
2001
The Johns Hopkins SENSEVAL-2 System Descriptions
David Yarowsky | Silviu Cucerzan | Radu Florian | Charles Schafer | Richard Wicentowski
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems
2000
Language Independent, Minimally Supervised Induction of Lexical Probabilities
Silviu Cucerzan | David Yarowsky
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics
1999
Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence
Silviu Cucerzan | David Yarowsky
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora