Sile Hu

Also published as: 斯勒


2025

Tracing and Dissecting How LLMs Recall Factual Knowledge for Real World Questions
Yiqun Wang | Chaoqun Wan | Sile Hu | Yonggang Zhang | Xiang Tian | Yaowu Chen | Xu Shen | Jieping Ye
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in large language models (LLMs) have shown a promising ability to perform commonsense reasoning, bringing machines closer to human-like understanding. However, deciphering the internal reasoning processes of LLMs remains challenging because of the complex interdependencies among generated tokens, especially in practical question answering. In this study, we introduce a two-dimensional analysis framework, comprising token back-tracing and individual token decoding, to uncover how LLMs recall factual knowledge. Through explanatory analysis of three typical reasoning datasets, we identify a consistent three-phase pattern: Subject Augmentation and Broadcasting, Object Retrieval and Reranking, and Conclusion Fusion and Generation. Our findings reveal that LLMs do not lack relevant knowledge but struggle to select the most accurate information based on context during the retrieval and reranking phase. Leveraging these findings, we apply representation engineering and selective fine-tuning to target the specific modules responsible for retrieval and reranking errors. Experimental results show large improvements in response accuracy in both in-domain and out-of-domain settings, supporting the validity of the interpretation results.

2020

蒙古文拼写形式多样化现象研究(A Study of Spelling Variety of Mongolian)
Shuangcheng Bai (白双成) | Sile Hu (呼斯勒)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

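Mongolian text exhibits a peculiar phenomenon that sets it apart from most other scripts: a word's displayed glyphs are correct but its underlying code sequence is not. In other words, the word's sequence of "presentation form glyphs" is correct while its sequence of "nominal characters" is incorrect. We call this the spelling variety phenomenon of Mongolian. This paper first defines the phenomenon and related concepts, then demonstrates its reality and severity from several angles: simple illustrations, exhaustive enumeration of the spelling forms of example words, statistical analysis of a news corpus, and statistics based on annotating a complete article. It analyzes the underlying causes of the phenomenon and points out its serious impact on Mongolian information processing and applications. Finally, it proposes a multi-pronged solution: promoting input norms and standards to raise user awareness, using intelligent input methods to prevent incorrect entry, using proofreading and error-correction tools to fix errors after the fact, and supplementing these with statistical learning methods based on raw corpora. The paper is a useful reference for promoting the adoption of the standard Mongolian encoding.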