Cheng-Yu Lin


2025

Neuron-Level Differentiation of Memorization and Generalization in Large Language Models
Ko-Wei Huang | Yi-Fu Fu | Ching-Yu Tsai | Yu-Chieh Tu | Tzu-ling Cheng | Cheng-Yu Lin | Yi-Ting Yang | Heng-Yi Liu | Keng-Te Liao | Da-Cheng Juan | Shou-De Lin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We investigate how Large Language Models (LLMs) distinguish between memorization and generalization at the neuron level. Through carefully designed tasks, we identify distinct neuron subsets responsible for each behavior. Experiments on both a GPT-2 model trained from scratch and a pretrained LLaMA-3.2 model fine-tuned with LoRA show consistent neuron-level specialization. We further demonstrate that inference-time interventions on these neurons can steer the model’s behavior toward memorization or generalization. To assess robustness, we evaluate intra-task and inter-task consistency, confirming that these neuron-behavior associations reflect generalizable patterns rather than dataset-specific artifacts. Our findings reveal modular structure in LLMs and enable controlling memorization and generalization behaviors at inference time.

Concept-Based RAG Models: A High-Accuracy Fact Retrieval Approach
Cheng-Yu Lin | Jyh-Shing Jang
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

This study introduces a concept-based methodology to optimize Retrieval-Augmented Generation (RAG) tasks by assessing dataset certainty using entropy-based metrics and concept extraction techniques. Unlike traditional methods focused on reducing LLM hallucinations or modifying data structures, this approach evaluates inherent knowledge uncertainty from an LLM perspective. By pre-processing documents with LLMs, the concept-based method significantly enhances precision in tasks demanding high accuracy, such as legal, finance, or formal document responses.