Jingyu Peng


2026

The necessity of explicit linguistic representations has been increasingly questioned in the era of large language models (LLMs). In this work, we revisit this issue using Universal Dependencies (UD) as a case study, examining whether and in what ways this cross-lingual syntactic framework can still benefit contemporary LLMs. We focus on a cross-lingual adversarial paraphrase identification task that is designed to foreground the role of syntactic structure in semantic interpretation across languages. Within this setting, we systematically evaluate three strategies for integrating UD into LLMs: UD-Prompt, UD-Tuning, and UD-Attention. Our experiments show that, although the magnitude of gains depends on how UD-based structural priors interact with model behavior and cross-lingual variation, UD-augmented models consistently outperform their syntax-agnostic counterparts. Across strategies, we observe average accuracy improvements of 2.67%, 8.24%, and 2.53%, respectively. These findings demonstrate that linguistic knowledge remains informative for LLMs, offering practical value in cross-lingual settings where structural alignment is challenging.
Despite substantial advancements in aligning LLMs with human values, current safety mechanisms remain susceptible to jailbreak attacks. We attribute this vulnerability to the distributional discrepancies between alignment-oriented prompts and malicious prompts. To investigate this, and drawing inspiration from logic-driven NLP tasks, we introduce LogiBreak, a universal black-box jailbreak method that utilizes logical expression translation to bypass LLM safety mechanisms. By converting harmful natural language prompts into formal logical expressions, LogiBreak exploits the distributional gap between alignment data and logic-expressed inputs, preserving the underlying semantic intent and readability while evading safety constraints. Furthermore, because existing benchmarks lack systematic resources targeting logical expression-based attacks on LLM robustness, we construct a novel multilingual logical expression jailbreak dataset for evaluation. Our evaluations of LogiBreak across five languages demonstrate its effectiveness and generalizability in various linguistic contexts. The code is available at https://github.com/Applied-Machine-Learning-Lab/ACL2026_Logibreak.

2025

Large language models (LLMs) have made remarkable strides in complex reasoning tasks, but their safety and robustness in reasoning processes remain underexplored, particularly in third-party platforms that facilitate user interactions via APIs. Existing attacks on LLM reasoning are constrained by specific settings or lack imperceptibility, limiting their feasibility and generalizability. To address these challenges, we propose the Stepwise rEasoning Error Disruption (SEED) attack, which subtly injects errors into prior reasoning steps to mislead the model into producing incorrect subsequent reasoning and final answers. Unlike previous methods, SEED is compatible with zero-shot and few-shot settings, maintains the natural reasoning flow, and ensures covert execution without modifying the instruction. Extensive experiments on four datasets across four different models demonstrate SEED’s effectiveness, revealing the vulnerabilities of LLMs to disruptions in reasoning processes. These findings underscore the need for greater attention to the robustness of LLM reasoning to ensure safety in practical applications. Our code is available at https://github.com/Applied-Machine-Learning-Lab/SEED-Attack