Botao Yu


2025

pdf bib
Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving
Botao Yu | Frazier N. Baker | Ziru Chen | Garrett Herb | Boyu Gou | Daniel Adu-Ampratwum | Xia Ning | Huan Sun
Findings of the Association for Computational Linguistics: NAACL 2025

To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemAgent, an enhanced chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemAgent does not consistently outperform its base LLMs without tools. Our error analysis with a chemistry expert suggests that: For specialized chemistry tasks, such as synthesis prediction, we should augment agents with specialized tools; however, for general chemistry questions like those in exams, agents’ ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.

2021

pdf bib
Knowing False Negatives: An Adversarial Training Method for Distantly Supervised Relation Extraction
Kailong Hao | Botao Yu | Wei Hu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Distantly supervised relation extraction (RE) automatically aligns unstructured text with relation instances in a knowledge base (KB). Due to the incompleteness of current KBs, sentences implying certain relations may be annotated as N/A instances, which causes the so-called false negative (FN) problem. Current RE methods usually overlook this problem, inducing improper biases in both training and testing procedures. To address this issue, we propose a two-stage approach. First, it finds out possible FN samples by heuristically leveraging the memory mechanism of deep neural networks. Then, it aligns those unlabeled data with the training data into a unified feature space by adversarial training to assign pseudo labels and further utilize the information contained in them. Experiments on two wildly-used benchmark datasets demonstrate the effectiveness of our approach.