Nianqi Li
Historical analogies, which compare known past events with contemporary but unfamiliar events, are an important tool that helps people make decisions and understand the world. However, research in applied history suggests that people have difficulty finding appropriate analogies, and previous studies in the AI community have also overlooked historical analogies. To fill this gap, this paper focuses on the historical analogy acquisition task, which aims to acquire analogous historical events for a given event. We explore retrieval and generation methods for acquiring historical analogies based on different large language models (LLMs). Furthermore, we propose a self-reflection method to mitigate hallucinations and stereotypes when LLMs generate historical analogies. Through human evaluations and our specially designed automatic multi-dimensional assessment, we find that LLMs generally have good potential for historical analogies, and that model performance can be further improved with our self-reflection method. Resources for this paper can be found at https://anonymous.4open.science/r/Historical-Analogy-of-LLMs-FC17
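A minimal sketch of the generate-critique-revise loop that a self-reflection method like the one described above could use; here `llm` stands for any chat-completion call, and the prompts and loop structure are illustrative assumptions, not the paper's exact procedure:

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion API call to the LLM under study."""
    raise NotImplementedError

def generate_historical_analogy(event: str, max_rounds: int = 2) -> str:
    # Draft an initial analogy for the given contemporary event.
    analogy = llm(f"Name a historical event analogous to: {event}. "
                  "Briefly explain the correspondence.")
    for _ in range(max_rounds):
        # Ask the model to critique its own draft for factual errors
        # (hallucinations) and cliched pairings (stereotypes).
        critique = llm(f"Event: {event}\nProposed analogy: {analogy}\n"
                       "List any factual mistakes or stereotyped comparisons; "
                       "reply 'OK' if there are none.")
        if critique.strip().upper() == "OK":
            break
        # Revise the analogy in light of the critique.
        analogy = llm(f"Event: {event}\nPrevious analogy: {analogy}\n"
                      f"Critique: {critique}\nWrite an improved analogy.")
    return analogy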
Program-of-Thought (PoT), which uses programs instead of natural language for reasoning, is an important way for LLMs to solve mathematical problems. Since different programming languages excel in different areas, it is natural to use the most suitable language for a given problem. However, current research focuses only on single-language PoT, ignoring the differences between programming languages. This paper therefore proposes a multilingual program reasoning method, MultiLingPoT, and explores in depth the impact of multilingual integration during training and inference. The method allows the model to answer questions in multiple languages by fine-tuning on multilingual data, improving each individual language's reasoning accuracy by 2.5%. Additionally, prior and posterior selection methods help the model choose the most suitable language during inference, achieving performance gains of 8%. Finally, our code-metric analysis shows that language differences manifest in encapsulation levels and implementation granularity, and that strategic deviation from language conventions can enhance code performance.
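The abstract's prior and posterior selection can be pictured as follows; this is a hedged sketch in which `llm` is any chat-completion call and `execute` is a hypothetical sandboxed runner returning a program's printed answer, neither taken from the MultiLingPoT code:

from collections import Counter

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion API call."""
    raise NotImplementedError

def execute(program: str, lang: str) -> str | None:
    """Hypothetical sandboxed runner; returns the program's printed answer."""
    raise NotImplementedError

LANGS = ("python", "cpp", "java")

def solve_prior(question: str) -> str:
    # Prior selection: pick the language before generating any code.
    choice = llm(f"Question: {question}\nWhich of {LANGS} is best suited "
                 "to solve it? Answer with the language name only.").strip().lower()
    lang = choice if choice in LANGS else LANGS[0]
    return llm(f"Solve the question with a {lang} program:\n{question}")

def solve_posterior(question: str) -> str:
    # Posterior selection: generate one program per language, run each,
    # and keep the answer the executions agree on most often.
    results = [execute(llm(f"Solve the question with a {lang} program:\n{question}"), lang)
               for lang in LANGS]
    answers = [r for r in results if r is not None]
    return Counter(answers).most_common(1)[0][0] if answers else ""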
Concept reasoning is an important capability for models to understand the world. However, existing datasets, such as those for concept extraction and concept generation, suffer from modeledge leakage and context leakage. To address these limitations, we construct a concept reasoning dataset for large language models (CR-LLM) with modeledge leakage prevention and context leakage prevention; it consists of 2,167 samples and covers different concept types. In addition, we propose a hybrid reasoning method consisting of inductive reasoning, deductive reasoning, and a controller, which allows large language models to adaptively select the optimal reasoning method for each input sample. Finally, we conduct extensive experiments on CR-LLM with different models and methods. The results show that existing large language models and reasoning methods perform sub-optimally on the concept reasoning task. In contrast, our proposed method significantly improves these capabilities, achieving a 7% increase in accuracy compared to CoT and demonstrating better granularity. We release CR-LLM and code at https://github.com/Nianqi-Li/Concept-Reasoning-for-LLMs.
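One way to picture the hybrid method's controller; a minimal sketch assuming a generic chat-completion call `llm`, with prompts that are illustrative rather than taken from the CR-LLM repository:

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion API call."""
    raise NotImplementedError

def hybrid_concept_reasoning(sample: str) -> str:
    # Controller: decide per input whether inductive or deductive
    # reasoning is more likely to succeed.
    mode = llm(f"Input: {sample}\nShould this be answered by generalizing "
               "from examples (inductive) or by applying a definition "
               "(deductive)? Answer with one word.").strip().lower()
    if mode == "inductive":
        # Inductive path: elicit concrete instances, then generalize.
        return llm(f"List several concrete instances relevant to: {sample}\n"
                   "Then induce the concept that covers them and answer.")
    # Deductive path (default): start from a definition and derive the answer.
    return llm(f"State the relevant concept definition for: {sample}\n"
               "Then deduce the answer from it.")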