Yi Zeng

Other people with similar names: Yi Zeng

Unverified author pages with similar names: Yi Zeng


2026

Multimodal Large Language Models (MLLMs) hold significant promise for revolutionizing traditional education and reducing teachers’ workload. However, accurately interpreting unconstrained STEM student handwritten solutions with intertwined mathematical formulas, diagrams, and textual reasoning poses a significant challenge due to the lack of authentic and domain-specific benchmarks. Additionally, current evaluation paradigms predominantly rely on the outcomes of downstream tasks (e.g., auto-grading), which often probe only a subset of the recognized content, thereby failing to capture the MLLMs’ understanding of complex handwritten logic as a whole. To bridge this gap, we release EDU-CIRCUIT-HW, a dataset consisting of 1,300+ authentic student handwritten solutions from a university-level STEM course. Utilizing the expert-verified verbatim transcriptions and grading reports of student solutions, we simultaneously evaluate various MLLMs’ upstream recognition fidelity and downstream auto-grading performance. Our evaluation uncovers an astonishing scale of latent failures within MLLM-recognized student handwritten content, highlighting the models’ insufficient reliability for auto-grading and other understanding-oriented applications in high-stakes educational settings. In response, we present a case study demonstrating that leveraging identified error patterns to preemptively detect and rectify recognition errors, with only minimal human intervention (e.g., with 3.3% assignments routed to human graders while the rest to GPT-5.1 grader), can effectively enhance the robustness of the deployed AI-enabled grading system on unseen student solutions.

2024

Safety backdoor attacks in large language models (LLMs) enable harmful behaviors to be stealthily triggered while evading detection during normal interactions. The high dimensionality of the trigger search space and the diverse range of potential malicious behaviors in LLMs make this a critical open problem. This paper presents BEEAR, a novel mitigation method based on a key insight: backdoor triggers induce a uniform drift in the model’s embedding space, irrespective of the trigger’s form or targeted behavior. Leveraging this observation, we introduce a bi-level optimization approach. The inner level identifies universal perturbations to the decoder’s embeddings that steer the model towards defender-defined unwanted behaviors; the outer level fine-tunes the model to reinforce safe behaviors against these perturbations. Our experiments demonstrate the effectiveness of this approach, reducing the success rate of safety backdoor attacks from over 95% to <1% for general harmful behaviors and from 47% to 0% for Sleeper Agents, without compromising the model’s helpfulness. Notably, our method relies only on defender-defined sets of safe and unwanted behaviors without any assumptions about the trigger location or attack mechanism. This work represents the first practical framework to counter safety backdoors in LLMs and provides a foundation for future advancements in AI safety and security.