This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
XinChen
Also published as:
鑫 陈
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
Large Language Models (LLMs) have significantly transformed our daily life and established a new paradigm in natural language processing (NLP). However, the predominant pretraining of LLMs on extensive web-based texts remains insufficient for advanced scientific discovery, particularly in chemistry. The scarcity of specialized chemistry data, coupled with the complexity of multi-modal data such as 2D graph, 3D structure and spectrum, present distinct challenges. Although several studies have reviewed Pretrained Language Models (PLMs) in chemistry, there is a conspicuous absence of a systematic survey specifically focused on chemistry-oriented LLMs. In this paper, we outline methodologies for incorporating domain-specific chemistry knowledge and multi-modal information into LLMs, we also conceptualize chemistry LLMs as agents using chemistry tools and investigate their potential to accelerate scientific research. Additionally, we conclude the existing benchmarks to evaluate chemistry ability of LLMs. Finally, we critically examine the current challenges and identify promising directions for future research. Through this comprehensive survey, we aim to assist researchers in staying at the forefront of developments in chemistry LLMs and to inspire innovative applications in the field.
As large language models (LLMs) often generate plausible but incorrect content, error detection has become increasingly critical to ensure truthfulness.However, existing detection methods often overlook a critical problem we term as **self-consistent error**, where LLMs repeatedly generate the same incorrect response across multiple stochastic samples.This work formally defines self-consistent errors and evaluates mainstream detection methods on them.Our investigation reveals two key findings: (1) Unlike inconsistent errors, whose frequency diminishes significantly as the LLM scale increases, the frequency of self-consistent errors remains stable or even increases.(2) All four types of detection methods significantly struggle to detect self-consistent errors.These findings reveal critical limitations in current detection methods and underscore the need for improvement.Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective cross‐model probe method that fuses hidden state evidence from an external verifier LLM.Our method significantly enhances performance on self-consistent errors across three LLM families.
Most recommender systems focus on short-term objectives such as click-through rate, often at the expense of long-term user satisfaction. This can lead to echo chambers, where users are repeatedly exposed to redundant content. While recent efforts integrate Large Language Models (LLMs) into recommendation, they typically inherit this short-sighted focus. In this work, we highlight unpaired feedback—implicit signals such as continued engagement (positive) or silent disengagement (negative) that lack explicit contrastive labels—as a key challenge for long-term recommendation. Effectively learning from such feedback is crucial for improving LLM-based recommenders in dynamic user environments. To this end, we propose ULRec (Unpaired Feedback for Long-Term LLM-based Recommendation Tuning), a simple framework that fine-tunes LLMs using both positive and negative unpaired feedback. ULRec leverages the KTO algorithm to incorporate these signals without requiring paired supervision. Despite its simplicity, ULRec consistently improves long-term recommendation performance, demonstrating the value of modeling unpaired user feedback.
Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model acquires the ability to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from Information Bottleneck perspective and formulate it as maximizing the mutual information of the representation features of the two tasks. We propose a variational approach to solve this optimization problem using a learning-based method. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Codes are available at https://github.com/xinchen9/cot_distillation_ACL2024.
Using the pre-trained language models to understand source codes has attracted increasing attention from financial institutions owing to the great potential to uncover financial risks. However, there are several challenges in applying these language models to solve programming language related problems directly. For instance, the shift of domain knowledge between natural language (NL) and programming language (PL) requires understanding the semantic and syntactic information from the data from different perspectives. To this end, we propose the AstBERT model, a pre-trained PL model aiming to better understand the financial codes using the abstract syntax tree (AST). Specifically, we collect a sheer number of source codes (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model through the help of code parsers, in which AST information of the source codes can be interpreted and integrated. We evaluate the performance of the proposed model on three tasks, including code question answering, code clone detection and code refinement. Experiment results show that our AstBERT achieves promising performance on three different downstream tasks.