2025
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie | Yangsibo Huang | Chiyuan Zhang | Da Yu | Xinyun Chen | Bill Yuchen Lin | Bo Li | Badih Ghazi | Ravi Kumar
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Large language models (LLMs) achieve good performance on challenging reasoning benchmarks, yet can also make basic reasoning mistakes. This contrasting behavior is puzzling when it comes to understanding the mechanisms behind LLMs' reasoning capabilities. One hypothesis is that the increasingly high and nearly saturated performance on common reasoning benchmarks could be due to memorization of similar problems. In this paper, we systematically investigate this hypothesis with a quantitative measurement of memorization in reasoning tasks, using two dynamically generated logical reasoning benchmarks based on Knights and Knaves (K&K) puzzles and Zebra puzzles (DynamicZebra). We find that LLMs can interpolate and memorize the training puzzles (achieving near-perfect accuracy) after fine-tuning, yet they struggle with slight variations of these puzzles. On the other hand, we show that while fine-tuning leads to heavy memorization, it also consistently improves generalization performance. Through in-depth analyses with perturbation tests, transferability across difficulty levels, probing of model internals, and fine-tuning with wrong answers, we establish that LLMs develop reasoning skills on logical puzzles alongside memorization. Finally, our analysis based on a per-sample memorization score sheds light on how LLMs switch between reasoning and memorization when solving logical puzzles.
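The perturbation test at the heart of this abstract is easy to illustrate: a K&K puzzle has a unique role assignment computable by brute force, and a one-word change to a single claim can flip that answer, which is exactly what a model that merely memorized the original puzzle will miss. Below is a minimal, self-contained sketch of such a puzzle and a local perturbation; the solver and the example claims are our own illustration, not the paper's benchmark code.

```python
import itertools

def solve_kk(statements):
    """Brute-force a Knights-and-Knaves puzzle.

    statements[i] maps a role assignment (tuple of bools, True = knight)
    to whether person i's claim is true; knights tell the truth, knaves
    lie. Returns the unique consistent assignment, or None.
    """
    consistent = [
        roles
        for roles in itertools.product([True, False], repeat=len(statements))
        if all(stmt(roles) == roles[i] for i, stmt in enumerate(statements))
    ]
    return consistent[0] if len(consistent) == 1 else None

# Original puzzle: A says "B is a knave"; B says "A and I are the same kind."
original = [lambda r: not r[1], lambda r: r[0] == r[1]]
# Local perturbation: flip A's claim to "B is a knight" -- the answer changes.
perturbed = [lambda r: r[1], lambda r: r[0] == r[1]]

print(solve_kk(original))   # (True, False): A is a knight, B is a knave
print(solve_kk(perturbed))  # (True, True): now both are knights
```

A model that reproduces the memorized answer (True, False) on the perturbed variant is flagged by such a test; roughly, a per-sample memorization score in this spirit contrasts accuracy on the trained puzzle with accuracy under local perturbations of it.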
2024
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
Bowen Jin | Chulin Xie | Jiawei Zhang | Kashob Kumar Roy | Yu Zhang | Zheng Li | Ruirui Li | Xianfeng Tang | Suhang Wang | Yu Meng | Jiawei Han
Findings of the Association for Computational Linguistics: ACL 2024
Large language models (LLMs), while exhibiting exceptional performance, suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora to alleviate the issue. However, in many domains, texts are interconnected (e.g., academic papers in a bibliographic graph are linked by citations and co-authorships) and thus form a (text-attributed) graph. The knowledge in such graphs is encoded not only in individual texts/nodes but also in the connections between them. To facilitate research on augmenting LLMs with graphs, we manually construct a Graph Reasoning Benchmark dataset called GRBench, containing 1,740 questions that can be answered with the knowledge from 10 domain graphs. We then propose a simple and effective framework called Graph Chain-of-Thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively. Each Graph-CoT iteration consists of three sub-steps: LLM reasoning, LLM-graph interaction, and graph execution. We conduct systematic experiments with three LLM backbones on GRBench, where Graph-CoT consistently outperforms the baselines. The code is available at https://github.com/PeterGriffinJin/Graph-CoT/.
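Since the abstract spells out the three sub-steps of each iteration, a short sketch makes the control flow concrete. Everything below is a hedged approximation, assuming a networkx graph with node attributes and a generic llm callable; the prompt wording and the Neighbors/NodeFeature interface are illustrative stand-ins, not the released Graph-CoT API at https://github.com/PeterGriffinJin/Graph-CoT/.

```python
import networkx as nx

def graph_cot(question: str, graph: nx.Graph, llm, max_iters: int = 5) -> str:
    """One Graph-CoT-style loop: reason, pick a graph call, execute, repeat."""
    context = []  # accumulated (graph call, result) pairs fed back to the LLM
    for _ in range(max_iters):
        # Sub-step 1 -- LLM reasoning: answer if possible, else say what is missing.
        thought = llm(f"Question: {question}\nContext: {context}\n"
                      "Reply 'ANSWER: ...' if you can answer; otherwise "
                      "describe what graph information you still need.")
        if thought.startswith("ANSWER:"):
            return thought[len("ANSWER:"):].strip()
        # Sub-step 2 -- LLM-graph interaction: turn the thought into a call,
        # assumed well-formed, e.g. "Neighbors(n42)" or "NodeFeature(n42, title)".
        call = llm(f"Rewrite as one graph call: {thought}")
        fn, raw_args = call.split("(", 1)
        args = [a.strip() for a in raw_args.rstrip(")").split(",")]
        # Sub-step 3 -- graph execution: run the call and feed the result back.
        if fn.strip() == "Neighbors":
            result = list(graph.neighbors(args[0]))
        else:  # NodeFeature(node, field)
            result = graph.nodes[args[0]].get(args[1])
        context.append((call, result))
    return llm(f"Question: {question}\nContext: {context}\nFinal answer:")
```

With a real backbone, the loop ends either when the model emits an ANSWER: line or when the iteration budget is exhausted; the traversal results accumulated in context are what let the model follow multi-hop links such as citations rather than relying on single retrieved passages.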
2022
Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022)
Bill Yuchen Lin | Chaoyang He | Chulin Xie | Fatemehsadat Mireshghallah | Ninareh Mehrabi | Tian Li | Mahdi Soltanolkotabi | Xiang Ren