Yang Yan


2025

Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic
Yang Yan | Yu Lu | Renjun Xu | Zhenzhong Lan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) achieve impressive results on advanced mathematics benchmarks but sometimes fail on basic arithmetic tasks, raising the question of whether they have truly grasped fundamental arithmetic rules or are merely relying on pattern matching. To investigate this issue, we systematically probe LLMs' understanding of two-integer addition (0 to 2^64) by testing three crucial properties: commutativity (A+B=B+A), representation invariance via symbolic remapping (e.g., 7 ↦ Y), and consistent accuracy scaling with operand length. Our evaluation of 12 leading LLMs reveals a stark disconnect: while models achieve high numeric accuracy (73.8–99.8%), they systematically fail these diagnostics. Specifically, accuracy plummets to ≤ 7.5% with symbolic inputs, commutativity is violated in up to 20% of cases, and accuracy scaling is non-monotonic. Interventions further expose this reliance on pattern matching: explicitly providing rules degrades performance by 29.49%, while prompting for explanations before answering merely maintains baseline accuracy. These findings demonstrate that current LLMs address elementary addition via pattern matching, not robust rule induction, motivating new diagnostic benchmarks and innovations in model architecture and training to cultivate genuine mathematical reasoning. Our dataset and generation code are available at https://github.com/kuri-leo/llm-arithmetic-diagnostic.
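
The sketch below illustrates the kind of diagnostic probes the abstract describes (commutativity and symbolic remapping). It is not the authors' released code (see the linked repository for that); the digit-to-symbol mapping and prompt wording are illustrative assumptions.

```python
# Minimal sketch of the diagnostic probes described in the abstract.
# Not the authors' released code; the remapping and prompt wording are assumptions.
import random

# Illustrative digit-to-symbol remapping, analogous to the paper's "7 -> Y" example.
DIGIT_MAP = dict(zip("0123456789", "QWERTYUIOP"))

def remap(n: int) -> str:
    """Rewrite a non-negative base-10 integer using the remapped symbols."""
    return "".join(DIGIT_MAP[c] for c in str(n))

def commutativity_probe(a: int, b: int) -> tuple[str, str]:
    """Two prompts whose answers must agree if the model respects A+B = B+A."""
    return f"{a}+{b}=", f"{b}+{a}="

def symbolic_probe(a: int, b: int) -> tuple[str, str]:
    """Prompt and expected answer for addition under symbolic remapping."""
    prompt = (f"Digits are renamed as {DIGIT_MAP}. "
              f"Compute {remap(a)}+{remap(b)} using the renamed symbols.")
    return prompt, remap(a + b)

if __name__ == "__main__":
    # Operands drawn from [0, 2^64), matching the range probed in the paper.
    a, b = random.randrange(2**64), random.randrange(2**64)
    print(commutativity_probe(a, b))
    print(symbolic_probe(a, b))
```

A model that has induced the addition rule should answer both commutativity prompts identically and transfer the rule to the remapped symbols; the paper reports that current LLMs fail both checks despite high numeric accuracy.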

2022

AISFG: Abundant Information Slot Filling Generator
Yang Yan | Junda Ye | Zhongbao Zhang | Liwen Wang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

As an essential component of task-oriented dialogue systems, slot filling requires enormous amounts of labeled training data in a given domain. However, in most cases, little or no target-domain training data is available at training time, so cross-domain slot filling must cope with data scarcity through zero/few-shot learning. Previous research on zero/few-shot cross-domain slot filling focuses on slot descriptions and examples while ignoring slot-type ambiguity and example ambiguity. To address these problems, we propose the Abundant Information Slot Filling Generator (AISFG), a generative model with a novel query template that incorporates domain descriptions, slot descriptions, and examples with context. Experimental results show that our model outperforms state-of-the-art approaches on the zero/few-shot slot filling task.
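
The snippet below sketches the kind of query template the abstract describes, conditioning a generator on a domain description, a slot description, and an in-context example. It is a paraphrase of the idea, not the authors' exact template format; all field names and the example content are hypothetical.

```python
# Sketch of an AISFG-style query template (hypothetical format, not the paper's exact one).
def build_query(domain_desc: str, slot: str, slot_desc: str,
                example: tuple[str, str], utterance: str) -> str:
    """Compose domain, slot, and example-with-context information into one generation query."""
    ex_utt, ex_value = example
    return (
        f"domain: {domain_desc} ; "
        f"slot: {slot} ({slot_desc}) ; "
        f'example: in "{ex_utt}" the {slot} is "{ex_value}" ; '
        f"utterance: {utterance} ; "
        f"value:"
    )

query = build_query(
    domain_desc="booking restaurants",
    slot="time",
    slot_desc="time of the reservation",
    example=("book a table for two at 7 pm", "7 pm"),
    utterance="reserve a table tomorrow at noon",
)
print(query)
# The query would be fed to a sequence-to-sequence generator, which outputs the slot value ("noon").
```

Packing the domain description, slot description, and a contextualized example into the query is what the abstract credits with reducing slot-type and example ambiguity relative to description-only prompts.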