Hongfei Yan
2026
LinkQA: Synthesizing Diverse QA from Multiple Seeds Strongly Linked by Knowledge Points
Xuemiao Zhang | Can Ren | Chengying Tu | Rongxiang Weng | Hongfei Yan | Jingang Wang | Xunliang Cai
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The advancement of large language models (LLMs) is constrained by the scarcity of high-quality, diverse training data. To address this limitation, we propose LinkSyn, a synthesis framework based on a knowledge point (KP) graph that, for the first time, enables flexible control over discipline and difficulty distributions while balancing KP coverage and popularity. LinkSyn extracts KPs from question-answering (QA) seed data and constructs a KP graph, then synthesizes diverse QA data from multiple seeds that are strongly linked by KPs and sampled from graph walks. Specifically, LinkSyn incorporates (1) a knowledge value function that guides the adjustment of path sampling probability and balances KP coverage and popularity during graph walks; (2) diffusion-based synthesis via a strong reasoning model, leveraging multiple seeds with dense logical associations along each path; and (3) high-difficulty QA enhancement within given disciplines through flexible difficulty adjustments. By executing LinkSyn, we synthesize LinkQA, a diverse multi-disciplinary QA dataset with 50B tokens. Extensive experiments on Llama-3 8B demonstrate that continual pre-training with LinkQA yields an average improvement of 11.51% on MMLU and CMMLU, establishing new SOTA results. LinkQA consistently enhances performance across model sizes and initial FLOPs scales.
2015
User Based Aggregation for Biterm Topic Model
Weizheng Chen | Jinpeng Wang | Yan Zhang | Hongfei Yan | Xiaoming Li
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
2014
Group based Self Training for E-Commerce Product Record Linkage
Xin Zhao | Yuexin Wu | Hongfei Yan | Xiaoming Li
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2013
Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs
Jinpeng Wang | Wayne Xin Zhao | Haitian Wei | Hongfei Yan | Xiaoming Li
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
2012
A Novel Burst-based Text Representation Model for Scalable Event Detection
Xin Zhao | Rishan Chen | Kai Fan | Hongfei Yan | Xiaoming Li
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Identifying Event-related Bursts via Social Media Activities
Xin Zhao | Baihan Shu | Jing Jiang | Yang Song | Hongfei Yan | Xiaoming Li
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
SSHLDA: A Semi-Supervised Hierarchical Topic Model
Xian-Ling Mao | Zhao-Yan Ming | Tat-Seng Chua | Si Li | Hongfei Yan | Xiaoming Li
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning