Xiaocong Du
2026
Analyze Like a Venture Capitalist: Information-Gain and Knowledge Enhanced Graph Reasoning for Startup Success Prediction
Haoyu Pei | Zhongyang Liu | Xiangyi Xiao | Xiaocong Du | Suting Hong | Kunpeng Zhang | Haipeng Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Most venture capital (VC) investments fail, while a few deliver outsized returns. Predicting startup success requires synthesizing relational evidence across company fundamentals, investor track records, and investment networks through explicit reasoning, which traditional machine learning and graph neural networks lack. Large language models excel at reasoning, but applying them to VC prediction must address: selecting compact evidence subgraphs from large investment networks, one-sided label noise where failures may be latent successes, and grounding decisions in structured VC domain knowledge. We present MIRAGE-VC, an evidence-grounded reasoning framework with three innovations. First, an information-gain-driven retriever distills networks into compact evidence subgraphs. Second, a dual-layer knowledge base grounds reasoning in VC principles. Third, a noise-aware mechanism down-weights mislabeled negatives via improved Positive-Unlabeled (PU) estimation. MIRAGE-VC achieves +5.9% F1 and +22.1% Precision@5 over state-of-the-art baselines. Expert evaluation confirms professional-quality rationales. We further validate our approach on public data with consistent improvements. Code and reasoning results are available at: https://github.com/ZhangDataLab/MIRAGE-VC.git
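The abstract says MIRAGE-VC down-weights mislabeled negatives (failed startups that may be latent successes) via an improved Positive-Unlabeled (PU) estimate, but it does not give that estimator. As a rough illustration only, the sketch below swaps in the classic Elkan and Noto (2008) PU correction. It assumes a hypothetical score g(x) approximating P(labeled positive | x), and it estimates the label frequency c as the mean score on held-out known positives. Each "failure" is then weighted by 1 - P(y=1 | x, unlabeled). The function names are hypothetical, and this is not the paper's method.

```python
import math
from typing import List


def estimate_label_frequency(scores_on_positives: List[float]) -> float:
    """Elkan-Noto estimate of c = P(labeled | positive): mean g(x) over known positives."""
    if not scores_on_positives:
        raise ValueError("need at least one labeled positive")
    return sum(scores_on_positives) / len(scores_on_positives)


def negative_weight(g: float, c: float, eps: float = 1e-6) -> float:
    """Weight for an unlabeled ('failed') example when it is used as a negative.

    p_pos = P(y=1 | x, unlabeled) = (1 - c) / c * g / (1 - g)  (Elkan & Noto, 2008).
    The negative is down-weighted to 1 - p_pos, clipped to [0, 1].
    """
    g = min(max(g, eps), 1.0 - eps)  # avoid division by zero at g = 1
    p_pos = (1.0 - c) / c * g / (1.0 - g)
    p_pos = min(max(p_pos, 0.0), 1.0)
    return 1.0 - p_pos


def weighted_bce(pred: float, label: int, weight: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy scaled by a per-example weight."""
    pred = min(max(pred, eps), 1.0 - eps)
    loss = -(label * math.log(pred) + (1 - label) * math.log(1.0 - pred))
    return weight * loss
```

For example, with held-out positive scores [0.8, 0.6, 0.7] the estimate is c ≈ 0.7. A failure scored g = 0.1 then keeps a weight of about 0.95. A failure scored g = 0.9 looks like a hidden success and gets weight 0, so it no longer pushes the model toward predicting failure.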
2022
Harmless Transfer Learning for Item Embeddings
Chengyue Gong | Xiaocong Du | Dhruv Choudhary | Bhargav Bhushanam | Qiang Liu | Arun Kejariwal
Findings of the Association for Computational Linguistics: NAACL 2022
Learning embedding layers (for classes, words, items, etc.) is a key component of many applications, ranging from natural language processing and recommendation systems to electronic health records. However, the frequency of real-world items follows a long-tail distribution in these applications, causing naive training methods to perform poorly on rare items. A line of previous work addresses this problem by transferring knowledge from frequent items to rare items through an auxiliary transfer loss. However, when defined improperly, the transfer loss may introduce harmful biases and deteriorate performance. In this work, we propose a harmless transfer learning framework that limits the impact of these potential biases in both the definition and the optimization of the transfer loss. On the definition side, we reduce the bias in the transfer loss by focusing on the items to which information from high-frequency items can be efficiently transferred. On the optimization side, we leverage a lexicographic optimization framework to efficiently incorporate the information of the transfer loss without hurting the minimization of the main prediction loss function. Our method serves as a plug-in module and significantly boosts performance on a variety of NLP and recommendation system tasks.
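The abstract does not state the exact lexicographic update rule. The sketch below shows one generic way to give the main prediction loss priority over the transfer loss at the gradient level. The step direction is d = g_main + lam * g_aux, where lam in [0, lam_max] is as large as possible while still satisfying <d, g_main> >= alpha * ||g_main||^2. Under that constraint, a small step along -d still decreases the main loss to first order. The function name and the parameters alpha and lam_max are hypothetical, and this is an assumed illustration, not the paper's algorithm.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def harmless_direction(
    g_main: Sequence[float],
    g_aux: Sequence[float],
    lam_max: float = 1.0,
    alpha: float = 0.5,
) -> Tuple[List[float], float]:
    """Combine main-loss and transfer-loss gradients without hurting the main loss.

    Returns (d, lam), where d = g_main + lam * g_aux and lam is the largest value
    in [0, lam_max] such that <d, g_main> >= alpha * ||g_main||^2.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    main_sq = dot(g_main, g_main)
    cross = dot(g_aux, g_main)
    if cross >= 0.0:
        # Transfer gradient agrees with (or is orthogonal to) the main gradient.
        lam = lam_max
    else:
        # Conflict: cap lam so main_sq + lam * cross >= alpha * main_sq.
        lam = min(lam_max, (1.0 - alpha) * main_sq / (-cross))
    d = [m + lam * a for m, a in zip(g_main, g_aux)]
    return d, lam
```

With g_main = [1.0, 0.0] and a conflicting g_aux = [-2.0, 1.0], the sketch shrinks lam to 0.25 and returns d = [0.5, 0.25]. When the two gradients agree, the full lam_max is used. Parameters would then be updated as theta -= lr * d.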