Yuen Chen
2026
PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers in Overleaf
Jiarui Liu | Terry Jingchen Zhang | Ryan Faulkner | Xuanqiang Angelo Huang | Vilém Zouhar | Dominik Glandorf | Isabel Dahlgren | Rishit Dagli | Yuen Chen | Felix Leeb | Van Q. Truong | Punya Syon Pandey | Yves Bicker | Suvajit Majumder | Wenyuan Jiang | Zeju Qiu | Sankalan Pal Chowdhury | Mrinmaya Sachan | Bernhard Schölkopf | Mona T. Diab | Zhijing Jin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Expert writing feedback from experienced researchers is critical for early-career scholars to improve their manuscripts, yet high-quality feedback often remains scarce because reviewing research papers is labor-intensive. Emerging AI-powered writing assistants largely focus on grammar fixes or simulating peer review with final scores, yet they fall short of providing concrete, actionable suggestions that help students improve their papers during drafting. We present PaperMentor, a human-centered writing assistant system that delivers actionable suggestions as Overleaf-native inline comments while leaving the actual writing entirely to human authors. PaperMentor integrates an expert skill library carefully curated from established researchers’ writing advice with 12 specialized agents covering different aspects of paper writing, such as formatting compliance, phrasing accuracy, and terminology consistency. In a user study (n=14), 90.6% of the generated comments were rated actionable and 67.5% were rated valid, significantly outperforming a GPT-5.2 baseline without the skill library. We release PaperMentor as open source for public use.
2025
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias
Yuen Chen | Vethavikashini Chithrra Raghuram | Justus Mattern | Rada Mihalcea | Zhijing Jin
Findings of the Association for Computational Linguistics: NAACL 2025
Generated texts from large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics. These findings motivate research efforts aiming to understand and measure such effects. This paper introduces a causal formulation for bias measurement in generative language models. Based on this theoretical foundation, we outline a list of desiderata for designing robust bias benchmarks. We then propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias. We test several state-of-the-art open-source LLMs on OccuGender, including Llama, Mistral, and their instruction-tuned versions. The results show that these models exhibit substantial occupational gender bias. Lastly, we discuss prompting strategies for bias mitigation and an extension of our causal formulation to illustrate the generalizability of our framework.
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
Agam Goyal | Vedant Rathi | William Yeh | Yian Wang | Yuen Chen | Hari Sundaram
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) are now ubiquitous in user-facing applications, yet they still generate undesirable toxic outputs, including profanity, vulgarity, and derogatory remarks. Although numerous detoxification methods exist, most apply broad, surface-level fixes and can therefore easily be circumvented by jailbreak attacks. In this paper we leverage sparse autoencoders (SAEs) to identify toxicity-related directions in the residual stream of models and perform targeted activation steering using the corresponding decoder vectors. We introduce three tiers of steering aggressiveness and evaluate them on GPT-2 Small and Gemma-2-2B, revealing trade-offs between toxicity reduction and language fluency. At stronger steering strengths, these causal interventions surpass competitive baselines in reducing toxicity by up to 20%, though fluency can degrade noticeably on GPT-2 Small depending on the aggressiveness. Crucially, standard NLP benchmark scores upon steering remain stable, indicating that the model’s knowledge and general abilities are preserved. We further show that feature-splitting in wider SAEs hampers safety interventions, underscoring the importance of disentangled feature learning. Our findings highlight both the promise and the current limitations of SAE-based causal interventions for LLM detoxification, further suggesting practical guidelines for safer language-model deployment.
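The steering step described in this abstract can be illustrated with a minimal sketch. It assumes that steering means subtracting a scaled, unit-normalized SAE decoder vector from a residual-stream activation; the function name `steer_away`, the `strength` parameter, and the toy vectors are hypothetical, and the paper's actual three tiers of aggressiveness are not reproduced here.

```python
import math

def steer_away(hidden, decoder_vec, strength):
    """Shift an activation away from a feature direction.

    hidden:      residual-stream activation (list of floats), toy-sized here
    decoder_vec: SAE decoder vector for a toxicity-related feature (assumed given)
    strength:    hypothetical steering coefficient; larger means more aggressive
    """
    norm = math.sqrt(sum(x * x for x in decoder_vec))
    if norm == 0.0:
        raise ValueError("decoder vector must be non-zero")
    unit = [x / norm for x in decoder_vec]
    # Subtract the scaled unit direction from the activation.
    return [h - strength * u for h, u in zip(hidden, unit)]

# Toy example: the feature direction points along the second coordinate.
steered = steer_away([1.0, 2.0], [0.0, 2.0], strength=0.5)
```

In this sketch a larger `strength` removes more of the feature direction, which is consistent with the trade-off between toxicity reduction and fluency that the abstract reports.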
2024
Analyzing the Role of Semantic Representations in the Era of Large Language Models
Zhijing Jin | Yuen Chen | Fernando Gonzalez Adauto | Jiarui Liu | Jiayi Zhang | Julian Michael | Bernhard Schölkopf | Mona Diab
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Traditionally, natural language processing (NLP) models often use a rich set of features created by linguistic expertise, such as semantic representations. However, in the era of large language models (LLMs), more and more tasks are turned into generic, end-to-end sequence generation problems. In this paper, we investigate the question: what is the role of semantic representations in the era of LLMs? Specifically, we investigate the effect of Abstract Meaning Representation (AMR) across five diverse NLP tasks. We propose an AMR-driven chain-of-thought prompting method, which we call AMRCOT, and find that it generally hurts performance more than it helps. To investigate what AMR may have to offer on these tasks, we conduct a series of analysis experiments. We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction. We recommend focusing on these areas for future work in semantic representations for LLMs. Our code: https://github.com/causalNLP/amr_llm
CausalCite: A Causal Formulation of Paper Citations
Ishan Agrawal | Zhijing Jin | Ehsan Mokhtarian | Siyuan Guo | Yuen Chen | Mrinmaya Sachan | Bernhard Schölkopf
Findings of the Association for Computational Linguistics: ACL 2024
Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from large language models (LLMs), extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of the quality of a paper. Our code is available at https://github.com/causalNLP/causal-cite.
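The matching step that the abstract attributes to TextMatch can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: the toy embeddings, the per-paper `outcomes` values, the cutoff `k`, and the function names are all hypothetical, and the similarity weights are assumed to be positive.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def counterfactual_outcome(target_emb, candidate_embs, outcomes, k=2):
    """Similarity-weighted average outcome of the k most similar candidates.

    Assumes the top-k similarities are positive so the weights are valid.
    """
    sims = [cosine(target_emb, e) for e in candidate_embs]
    # Indices of the k candidates most similar to the target paper.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    total = sum(sims[i] for i in top)
    return sum(sims[i] * outcomes[i] for i in top) / total

# Toy data: 2-dimensional "embeddings" and one numeric outcome per candidate.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outcomes = [10.0, 100.0, 20.0]
estimate = counterfactual_outcome([1.0, 0.0], candidates, outcomes, k=2)
```

Here the orthogonal candidate is excluded by the top-k cutoff, and the two remaining candidates contribute in proportion to their cosine similarity to the target.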
Co-authors
- Zhijing Jin 4
- Bernhard Schölkopf 3
- Mona Diab 2
- Jiarui Liu 2
- Mrinmaya Sachan 2
- Ishan Agrawal 1
- Yves Bicker 1
- Rishit Dagli 1
- Isabel Dahlgren 1
- Ryan Faulkner 1
- Dominik Glandorf 1
- Fernando Gonzalez Adauto 1
- Agam Goyal 1
- Siyuan Guo 1
- Xuanqiang Angelo Huang 1
- Wenyuan Jiang 1
- Felix Leeb 1
- Suvajit Majumder 1
- Justus Mattern 1
- Julian Michael 1
- Rada Mihalcea 1
- Ehsan Mokhtarian 1
- Sankalan Pal Chowdhury 1
- Punya Syon Pandey 1
- Zeju Qiu 1
- Vethavikashini Chithrra Raghuram 1
- Vedant Rathi 1
- Hari Sundaram 1
- Van Q. Truong 1
- Yian Wang 1
- William Yeh 1
- Jiayi Zhang 1
- Terry Jingchen Zhang 1
- Vilém Zouhar 1