Shangyi Geng


2025

Great Memory, Shallow Reasoning: Limits of kNN-LMs
Shangyi Geng | Wenting Zhao | Alexander M Rush
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

K-nearest neighbor language models (kNN-LMs), which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling as well as on some downstream NLP benchmarks. These results have led researchers to argue that models trained on poor-quality or outdated data could perform well by employing a kNN extension that has access to a higher-quality datastore. In this work, we ask whether this improved ability to recall information really translates into downstream abilities. We extensively evaluate kNN-LMs on a diverse set of tasks, ranging from sentiment classification and commonsense reasoning to multi-hop reasoning. Results show that kNN-LMs excel at memory-intensive tasks, where utilizing the patterns in the input is sufficient for determining the output, but struggle with reasoning tasks that require integrating multiple pieces of information to derive new knowledge. We further demonstrate through oracle experiments and qualitative analysis that even with perfect retrieval, kNN-LMs still fail to determine the correct answers, placing an upper bound on their reasoning performance.
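For context, a minimal sketch of the standard kNN-LM interpolation (Khandelwal et al., 2020) that this line of work builds on: the base LM distribution is mixed with a distribution formed by a softmax over (negative) distances to retrieved datastore neighbors. The function name, the unit softmax temperature, and the interpolation weight lam below are illustrative assumptions, not details taken from this paper.

import numpy as np

def knn_lm_next_word_probs(lm_probs, neighbor_dists, neighbor_next_ids, vocab_size, lam=0.25):
    """Interpolate base LM probabilities with a kNN distribution built from
    retrieved (context, next-word) pairs, as in the standard kNN-LM setup.
    lm_probs: shape (vocab_size,) distribution from the base language model.
    neighbor_dists: distances from the query context to each retrieved neighbor.
    neighbor_next_ids: the next-word id stored with each neighbor in the datastore.
    lam: interpolation weight on the kNN distribution (a tuned hyperparameter).
    """
    # Softmax over negative distances gives one weight per retrieved neighbor.
    weights = np.exp(-np.asarray(neighbor_dists, dtype=float))
    weights /= weights.sum()

    # Aggregate neighbor weights onto their stored next-word ids.
    knn_probs = np.zeros(vocab_size)
    for w, idx in zip(weights, neighbor_next_ids):
        knn_probs[idx] += w

    # Final distribution: lam * p_kNN + (1 - lam) * p_LM.
    return lam * knn_probs + (1.0 - lam) * lm_probs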

2022

Graph Hawkes Transformer for Extrapolated Reasoning on Temporal Knowledge Graphs
Haohai Sun | Shangyi Geng | Jialun Zhong | Han Hu | Kun He
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Temporal Knowledge Graph (TKG) reasoning has attracted increasing attention due to its enormous potential value, and the critical issue is how to model complex temporal structural information effectively. Recent studies encode graph snapshots into a hidden vector space and then perform heuristic deductions, which works well for entity prediction. However, these approaches cannot predict when an event will occur and have the following limitations: 1) many facts unrelated to the query can confuse the model; 2) information is forgotten over long-term evolutionary processes. To address these issues, we propose the Graph Hawkes Transformer (GHT) for both future-time TKG entity prediction and time prediction. GHT contains two Transformer variants, which capture instantaneous structural information and temporal evolution information, respectively, as well as a new relational continuous-time encoding function that facilitates feature evolution via the Hawkes process. Extensive experiments on four public datasets demonstrate its superior performance, especially on long-term evolutionary tasks.
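As background on the Hawkes-process component, here is a minimal sketch of the classical univariate Hawkes conditional intensity with an exponential kernel; GHT's relational continuous-time encoding is a learned generalization of this idea, and the parameter names mu, alpha, and beta below are illustrative assumptions rather than values from the paper.

import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Classical univariate Hawkes conditional intensity with an exponential kernel:
    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).
    Each past event temporarily excites the intensity, and the excitation decays
    over time, which is the property TKG models borrow to weight recent facts
    more heavily than old ones when predicting future events and their timing.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)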