Ye Yuan
2026
LLM Safety From Within: Detecting Harmful Content with Internal Representations
Difan Jiao | Yilun Liu | Ye Yuan | Zhenwei Tang | Linfeng Du | Haolun Wu | Ashton Anderson
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Guard models are widely used to detect harmful content in user prompts and LLM responses. However, state-of-the-art guard models rely solely on terminal-layer representations and overlook the rich safety-relevant features distributed across internal layers. We present SIREN, a lightweight guard model that harnesses these internal features. By identifying safety neurons via linear probing and combining them through an adaptive layer-weighted strategy, SIREN builds a harmfulness detector from LLM internals without modifying the underlying model. Our comprehensive evaluation shows that SIREN substantially outperforms state-of-the-art open-source guard models across multiple benchmarks while using 250× fewer trainable parameters. Moreover, SIREN exhibits superior generalization to unseen benchmarks, naturally enables real-time streaming detection, and significantly improves inference efficiency compared to generative guard models. Overall, our results highlight LLM internal states as a promising foundation for practical, high-performance harmfulness detection.
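The combination of per-layer linear probes with layer weights described in the abstract can be illustrated with a minimal sketch. It is not SIREN's actual implementation: the function names, the softmax over layer logits, and the sigmoid probe outputs are all assumptions made for illustration, and the feature vectors stand in for whatever neuron activations a real system would select.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def harmfulness_score(layer_features, probe_weights, probe_biases, layer_logits):
    """Hypothetical layer-weighted probe ensemble.

    layer_features: one feature vector per layer (assumed pre-selected neurons).
    probe_weights / probe_biases: one linear probe per layer.
    layer_logits: unnormalized layer importances (assumed learnable).
    Returns a score in [0, 1]; higher means "more likely harmful".
    """
    if not (len(layer_features) == len(probe_weights)
            == len(probe_biases) == len(layer_logits)):
        raise ValueError("all arguments must have one entry per layer")
    layer_weights = softmax(layer_logits)
    score = 0.0
    for feats, w, b, lw in zip(layer_features, probe_weights,
                               probe_biases, layer_weights):
        logit = sum(f * wi for f, wi in zip(feats, w)) + b
        score += lw * sigmoid(logit)  # convex combination of probe outputs
    return score
```

Because only the probes and the layer logits would be trained in such a setup, the underlying model stays frozen, which matches the abstract's claim of not modifying the LLM.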
Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization
Linfeng Du | Ye Yuan | Zichen Zhao | Fuyuan Lyu | Emiliano Penaloza | Xiuying Chen | Zipeng Sun | Jikun Kang | Laurent Charlin | Xue Liu | Haolun Wu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) excel at general-purpose tasks, yet adapting their responses to individual users remains challenging. Retrieval augmentation provides a lightweight alternative to fine-tuning by conditioning LLMs on user history records, and existing approaches typically select these records based on semantic relevance. We argue that relevance serves as an unreliable proxy for utility: a record may be semantically similar to a query yet fail to improve generation quality or even degrade it due to redundancy or conflicting information. To bridge this gap, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for LLM pErsonalization. In contrast to a greedy selection of the most relevant records, PURPLE treats profile construction as an order-sensitive generation process and utilizes a Plackett-Luce ranking model to capture complex inter-record dependencies. By training with semantically rich feedback provided by the likelihood of the reference response, our method aligns retrieval directly with generation quality. Extensive experiments on nine personalization tasks demonstrate that PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines in both effectiveness and efficiency, establishing a principled and scalable solution for optimizing user profiles.
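For readers unfamiliar with the Plackett-Luce model mentioned above, the following is a minimal sketch of how it assigns probabilities to ordered selections. It is not PURPLE's code: the function names and the example scores are hypothetical, and in a real system a learned model would produce the per-record scores.

```python
import math
import random


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of choosing `ranking` (item indices, first pick first).

    At each step the next item is drawn from a softmax over the items
    that have not been placed yet. `ranking` may be a top-k prefix.
    """
    remaining = list(range(len(scores)))
    log_prob = 0.0
    for idx in ranking:
        m = max(scores[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        log_prob += scores[idx] - log_z
        remaining.remove(idx)
    return log_prob


def sample_ranking(scores, k, rng):
    """Sample an ordered top-k selection without replacement."""
    remaining = list(range(len(scores)))
    ranking = []
    for _ in range(min(k, len(remaining))):
        m = max(scores[j] for j in remaining)
        weights = [math.exp(scores[j] - m) for j in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(choice)
        remaining.remove(choice)
    return ranking


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [1.0, 2.0, 0.5, 0.1]  # hypothetical utility scores per record
    picked = sample_ranking(scores, 3, rng)
    print(picked, plackett_luce_log_prob(scores, picked))
```

The log-probability is differentiable in the scores, which is what makes such a model trainable from a scalar reward in a bandit setting.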
Preference Heads in Large Language Models: A Mechanistic Framework for Interpretable Personalization
Weixu Zhang | Ye Yuan | Changjiang Han | Yuxing Tian | Zipeng Sun | Linfeng Du | Jikun Kang | Hong Kang | Xue Liu | Haolun Wu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) exhibit strong implicit personalization ability, yet most existing approaches treat this behavior as a black box, relying on prompt engineering or fine-tuning on user data. In this work, we adopt a mechanistic interpretability perspective and hypothesize the existence of a sparse set of Preference Heads, attention heads that encode user-specific stylistic and topical preferences and exert a causal influence on generation. We introduce Differential Preference Steering (DPS), a training-free framework that (1) identifies Preference Heads through causal masking analysis and (2) leverages them for controllable and interpretable personalization at inference time. DPS computes a Preference Contribution Score (PCS) for each attention head, directly measuring its causal impact on user-aligned outputs. During decoding, we contrast model predictions with and without Preference Heads, amplifying the difference between personalized and generic logits to selectively strengthen preference-aligned continuations. Experiments on widely used personalization benchmarks across multiple LLMs demonstrate consistent gains in personalization fidelity while preserving content coherence and low computational overhead. Beyond empirical improvements, DPS provides a mechanistic explanation of where and how personalization emerges within transformer architectures.
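The decoding-time contrast described above can be sketched in a few lines. This is an illustrative assumption rather than the DPS implementation: the linear extrapolation formula, the `alpha` parameter, and the function names are hypothetical, and the two logit vectors stand in for a forward pass with Preference Heads active and one with them masked.

```python
def steer_logits(personalized, generic, alpha=1.0):
    """Amplify the gap between personalized and generic next-token logits.

    personalized: logits with the (assumed) Preference Heads active.
    generic: logits with those heads masked.
    alpha: hypothetical steering strength; 0 returns the personalized logits.
    """
    if len(personalized) != len(generic):
        raise ValueError("logit vectors must have the same length")
    return [p + alpha * (p - g) for p, g in zip(personalized, generic)]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    personalized = [2.0, 1.9, 0.0]
    generic = [2.0, 1.0, 0.0]
    # Token 1 gains the most from the preference heads, so steering favors it.
    print(argmax(steer_logits(personalized, generic, alpha=0.0)))
    print(argmax(steer_logits(personalized, generic, alpha=1.0)))
```

Tokens whose logits do not change when the heads are masked are left untouched, so steering only shifts probability toward continuations the heads actually promote.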
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection
Zhiwei Liu | Yupeng Cao | Yuechen Jiang | Mohsinul Kabir | Polydoros Giannouris | Chen Xu | Ziyang Xu | Tianlei Zhu | Md. Tariquzzaman | Triantafillos Papadopoulos | Yan Wang | Lingfei Qian | Xueqing Peng | Zhuohan Xie | Ye Yuan | Saeed Almheiri | Abdulrazzaq Alnajjar | Ming-Bin Chen | Harry Stuart | Paul Thompson | Prayag Tiwari | Alejandro Lopez-Lira | Xue Liu | Jimin Huang | Sophia Ananiadou
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when processing financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of the complex real-world financial environments and high-risk, context-sensitive, multilingual financial misinformation detection tasks (MFMD). In this work, we propose MFMDScen, a comprehensive benchmark for evaluating behavioral biases of LLMs in MFMD across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and personality-based, (ii) role- and region-based, and (iii) role-based scenarios incorporating ethnicity and religious beliefs. We further develop a multilingual financial misinformation dataset covering English, Chinese, Greek, and Bengali. By integrating these scenarios with misinformation claims, MFMDScen enables a systematic evaluation of 22 mainstream LLMs. Our findings reveal that pronounced behavioral biases persist across both commercial and open-source models. This project is available at https://github.com/lzw108/FMD.
FACTS: Table Summarization via Offline Template Generation with Agentic Workflows
Ye Yuan | Mohammad Amin Shabani | Siqi Liu
Findings of the Association for Computational Linguistics: ACL 2026
Query-focused table summarization requires generating natural language summaries of tabular data conditioned on a user query, enabling users to access insights beyond fact retrieval. Existing approaches face key limitations: table-to-text models require costly fine-tuning and struggle with complex reasoning, prompt-based LLM methods suffer from token-limit and efficiency issues while exposing sensitive data, and prior agentic pipelines often rely on decomposition, planning, or manual templates that lack robustness and scalability. To mitigate these issues, we introduce an agentic workflow, FACTS, a Fast, Accurate, and Privacy-Compliant Table Summarization approach via Offline Template Generation. FACTS produces offline templates, consisting of SQL queries and Jinja2 templates, which can be rendered into natural language summaries and are reusable across multiple tables sharing the same schema. It enables fast summarization through reusable offline templates, accurate outputs with executable SQL queries, and privacy compliance by sending only table schemas to LLMs. Evaluations on widely-used benchmarks show that FACTS consistently outperforms baseline methods, establishing it as a practical solution for real-world query-focused table summarization. Our code is available at https://github.com/BorealisAI/FACTS.
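The idea of an offline template that pairs a SQL query with a text template can be illustrated with a small sketch. It is not the FACTS code: the table, query, and function names are hypothetical, and the standard library's `string.Template` is swapped in for Jinja2 so the example has no third-party dependencies.

```python
import sqlite3
from string import Template

# Hypothetical offline template: written once from the schema alone,
# then reusable for any table that shares that schema.
OFFLINE_TEMPLATE = {
    "sql": (
        "SELECT region, SUM(amount) AS total FROM sales "
        "GROUP BY region ORDER BY total DESC LIMIT 1"
    ),
    # string.Template stands in for a Jinja2 template here.
    "text": Template("The top region is $region with total sales of $total."),
}


def make_demo_table():
    """Build an in-memory table with made-up rows for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("south", 25), ("north", 5)],
    )
    return conn


def render_summary(conn, template):
    """Run the template's SQL locally and fill the text template."""
    cursor = conn.execute(template["sql"])
    row = cursor.fetchone()
    if row is None:
        return "No rows matched the query."
    names = [col[0] for col in cursor.description]
    return template["text"].substitute(dict(zip(names, row)))


if __name__ == "__main__":
    print(render_summary(make_demo_table(), OFFLINE_TEMPLATE))
```

In this arrangement only the schema would ever need to be shown to an LLM when the template is authored; the rows are read solely by the local SQL engine at render time.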
2024
Learning to Extract Structured Entities Using Language Models
Haolun Wu | Ye Yuan | Liana Mikaelyan | Alexander Meulemans | Xue Liu | James Hensman | Bhaskar Mitra
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights from various perspectives. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance. Later, we introduce a new Multistage Structured Entity Extraction (MuSEE) model that harnesses the power of LMs for enhanced effectiveness and efficiency by decomposing the extraction task into multiple stages. Quantitative and human side-by-side evaluations confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction. Our source code is available at https://github.com/microsoft/Structured-Entity-Extraction.
Co-authors
- Xue Liu 4
- Haolun Wu 4
- Linfeng Du 3
- Jikun Kang 2
- Zipeng Sun 2
- Saeed Almheiri 1
- Abdulrazzaq Alnajjar 1
- Sophia Ananiadou 1
- Ashton Anderson 1
- Yupeng Cao 1
- Laurent Charlin 1
- Xiuying Chen 1
- Ming-Bin Chen 1
- Polydoros Giannouris 1
- Changjiang Han 1
- James Hensman 1
- Jimin Huang 1
- Yuechen Jiang 1
- Difan Jiao 1
- Mohsinul Kabir 1
- Hong Kang 1
- Yilun Liu 1
- Zhiwei Liu 1
- Siqi Liu 1
- Alejandro Lopez-Lira 1
- Fuyuan Lyu 1
- Alexander Meulemans 1
- Liana Mikaelyan 1
- Bhaskar Mitra 1
- Triantafillos Papadopoulos 1
- Emiliano Penaloza 1
- Xueqing Peng 1
- Lingfei Qian 1
- Mohammad Amin Shabani 1
- Harry Stuart 1
- Zhenwei Tang 1
- Md. Tariquzzaman 1
- Paul Thompson 1
- Yuxing Tian 1
- Prayag Tiwari 1
- Yan Wang 1
- Zhuohan Xie 1
- Chen Xu 1
- Ziyang Xu 1
- Weixu Zhang 1
- Zichen Zhao 1
- Tianlei Zhu 1