Jinkun Chen


2026

Sentiment classification is a crucial task in natural language processing (NLP). To mitigate spurious correlations, causal word identification methods estimate the effect of treatment words on sentence sentiment and remove those with low treatment effects. However, previous works regard the presence or absence of a specific word in a sentence as a binary treatment. This approach limits generalizability to novel words and robustness on low-frequency words. To bridge this gap, we propose a meta-causal approach that achieves causal word identification for arbitrary words with a single training task. Specifically, we begin by clustering contexts based on their embeddings obtained from a pre-trained language model. Subsequently, for each cluster, a representation network and multi-head prediction networks are trained to estimate the treatment effect of each word, distinguishing causally related words from spuriously correlated ones. The trained word classifier is then used to assign weights to different words in order to train a more robust and generalizable sentiment classification model. Extensive experiments on public datasets demonstrate the effectiveness of our method in identifying causal words and improving the performance of sentiment classification.
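To make the "binary treatment" framing concrete, here is a minimal sketch of the kind of baseline the abstract attributes to previous works: the presence or absence of one word is treated as a binary treatment, and its effect on the sentiment label is estimated as a difference in means within each context cluster. This is not the paper's meta-causal method. The function name `stratified_word_effect`, the toy data, and the assumption that cluster IDs are already given (the abstract derives them from pre-trained language model embeddings) are all illustrative.

```python
from collections import defaultdict


def stratified_word_effect(samples, word):
    """Naive treatment-effect estimate for `word` (hypothetical helper).

    samples: iterable of (tokens, cluster_id, label), label in {0, 1}.
    Within each cluster, compute
        mean(label | word present) - mean(label | word absent),
    then average these differences weighted by cluster size.
    Clusters lacking either group are skipped; returns None if all are.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for tokens, cluster_id, label in samples:
        groups[cluster_id][word in tokens].append(label)

    weighted_sum = 0.0
    total = 0
    for group in groups.values():
        treated, control = group[True], group[False]
        if not treated or not control:
            continue  # no comparison possible in this cluster
        n = len(treated) + len(control)
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += n * diff
        total += n
    return weighted_sum / total if total else None


# Toy data (assumed): tokens, context cluster ID, sentiment label.
samples = [
    (["great", "movie"], 0, 1),
    (["great", "film"], 0, 1),
    (["boring", "movie"], 0, 0),
    (["boring", "film"], 0, 0),
    (["great", "plot"], 1, 1),
    (["dull", "plot"], 1, 0),
]

great_effect = stratified_word_effect(samples, "great")
movie_effect = stratified_word_effect(samples, "movie")
unseen_effect = stratified_word_effect(samples, "superb")
```

In this toy data, "great" gets a large estimated effect and "movie" gets none, while the unseen word "superb" yields no estimate at all. That last case illustrates the limitation the abstract raises for novel and low-frequency words.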

2024


Long-form evaluation of model editing
Domenic Rosati | Robie Gonzales | Jinkun Chen | Xuemin Yu | Yahya Kayani | Frank Rudzicz | Hassan Sajjad
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Evaluations of model editing, a technique for changing the factual knowledge held by Large Language Models (LLMs), currently only use the ‘next few token’ completions after a prompt. As a result, the impact of these methods on longer natural language generation is largely unknown. We introduce long-form evaluation of model editing (LEME), a novel evaluation protocol that measures the efficacy and impact of model editing in long-form generative settings. Our protocol consists of a machine-rated survey and a classifier which correlates well with human ratings. Importantly, we find that our protocol has very little relationship with previous short-form metrics (despite being designed to extend efficacy, generalization, locality, and portability into a long-form setting), indicating that our method introduces a novel set of dimensions for understanding model editing methods. Using this protocol, we benchmark a number of model editing techniques and present several findings, including that, while some methods (ROME and MEMIT) perform well in making consistent edits within a limited scope, they suffer much more from factual drift than other methods. Finally, we present a qualitative analysis that illustrates common failure modes in long-form generative settings, including internal consistency, lexical cohesion, and locality issues.

Venues

Findings: 1
NAACL: 1
