Lu Dai


2025

pdf bib
MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and Correction
Yuyang Wu | Jinhui Ye | Shuhao Zhang | Lu Dai | Yonatan Bisk | Olexandr Isayev
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) have shown growing potential in molecular sciences, but they often produce chemically inaccurate descriptions and struggle to recognize or justify potential errors. This raises important concerns about their robustness and reliability in scientific applications. To support more rigorous evaluation of LLMs in chemical reasoning, we present the MolErr2Fix benchmark, designed to assess LLMs on error detection and correction in molecular descriptions. Unlike existing benchmarks focused on molecule-to-text generation or property prediction, MolErr2Fix emphasizes fine-grained chemical understanding. It tasks LLMs with identifying, localizing, explaining, and revising potential structural and semantic errors in molecular descriptions. Specifically, MolErr2Fix consists of 1,193 fine-grained annotated error instances. Each instance contains quadruple annotations, i.e., (error type, span location, the explanation, and the correction). These tasks are intended to reflect the types of reasoning and verification required in real-world chemical communication. Evaluations of current state-of-the-art LLMs reveal notable performance gaps, underscoring the need for more robust chemical reasoning capabilities. MolErr2Fix provides a focused benchmark for evaluating such capabilities and aims to support progress toward more reliable and chemically informed language models. All annotations and an accompanying evaluation API will be publicly released to facilitate future research.

2024

pdf bib
Improve Dense Passage Retrieval with Entailment Tuning
Lu Dai | Hao Liu | Hui Xiong
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Retrieval module can be plugged into many downstream NLP tasks to improve their performance, such as open-domain question answering and retrieval-augmented generation. The key to a retrieval system is to calculate relevance scores to query and passage pairs. However, the definition of relevance is often ambiguous. We observed that a major class of relevance aligns with the concept of entailment in NLI tasks. Based on this observation, we designed a method called entailment tuning to improve the embedding of dense retrievers. Specifically, we unify the form of retrieval data and NLI data using existence claim as a bridge. Then, we train retrievers to predict the claims entailed in a passage with a variant task of masked prediction. Our method can be efficiently plugged into current dense retrieval methods, and experiments show the effectiveness of our method.

2022

pdf bib
ConnPrompt: Connective-cloze Prompt Learning for Implicit Discourse Relation Recognition
Wei Xiang | Zhenglin Wang | Lu Dai | Bang Wang
Proceedings of the 29th International Conference on Computational Linguistics

Implicit Discourse Relation Recognition (IDRR) is to detect and classify relation sense between two text segments without an explicit connective. Vanilla pre-train and fine-tuning paradigm builds upon a Pre-trained Language Model (PLM) with a task-specific neural network. However, the task objective functions are often not in accordance with that of the PLM. Furthermore, this paradigm cannot well exploit some linguistic evidence embedded in the pre-training process. The recent pre-train, prompt, and predict paradigm selects appropriate prompts to reformulate downstream tasks, so as to utilizing the PLM itself for prediction. However, for its success applications, prompts, verbalizer as well as model training should still be carefully designed for different tasks. As the first trial of using this new paradigm for IDRR, this paper develops a Connective-cloze Prompt (ConnPrompt) to transform the relation prediction task as a connective-cloze task. Specifically, we design two styles of ConnPrompt template: Insert-cloze Prompt (ICP) and Prefix-cloze Prompt (PCP) and construct an answer space mapping to the relation senses based on the hierarchy sense tags and implicit connectives. Furthermore, we use a multi-prompt ensemble to fuse predictions from different prompting results. Experiments on the PDTB corpus show that our method significantly outperforms the state-of-the-art algorithms, even with fewer training data.

pdf bib
Bi-Directional Iterative Prompt-Tuning for Event Argument Extraction
Lu Dai | Bang Wang | Wei Xiang | Yijun Mo
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recently, prompt-tuning has attracted growing interests in event argument extraction (EAE). However, the existing prompt-tuning methods have not achieved satisfactory performance due to the lack of consideration of entity information. In this paper, we propose a bi-directional iterative prompt-tuning method for EAE, where the EAE task is treated as a cloze-style task to take full advantage of entity information and pre-trained language models (PLMs). Furthermore, our method explores event argument interactions by introducing the argument roles of contextual entities into prompt construction. Since template and verbalizer are two crucial components in a cloze-style prompt, we propose to utilize the role label semantic knowledge to construct a semantic verbalizer and design three kind of templates for the EAE task. Experiments on the ACE 2005 English dataset with standard and low-resource settings show that the proposed method significantly outperforms the peer state-of-the-art methods.

pdf bib
Encoding and Fusing Semantic Connection and Linguistic Evidence for Implicit Discourse Relation Recognition
Wei Xiang | Bang Wang | Lu Dai | Yijun Mo
Findings of the Association for Computational Linguistics: ACL 2022

Prior studies use one attention mechanism to improve contextual semantic representation learning for implicit discourse relation recognition (IDRR). However, diverse relation senses may benefit from different attention mechanisms. We also argue that some linguistic relation in between two words can be further exploited for IDRR. This paper proposes a Multi-Attentive Neural Fusion (MANF) model to encode and fuse both semantic connection and linguistic evidence for IDRR. In MANF, we design a Dual Attention Network (DAN) to learn and fuse two kinds of attentive representation for arguments as its semantic connection. We also propose an Offset Matrix Network (OMN) to encode the linguistic relations of word-pairs as linguistic evidence. Our MANF model achieves the state-of-the-art results on the PDTB 3.0 corpus.