Penglei Sun
2026
FinKario: Event-Enhanced Automated Construction of Financial Knowledge Graph
Xiang Li | Penglei Sun | Wanyun Zhou | Zikai Wei | Yongqi Zhang | Xiaowen Chu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Individual investors are significantly outnumbered and disadvantaged in financial markets, overwhelmed by abundant information and lacking professional analysis. Equity research reports stand out as crucial resources, offering valuable insights. By leveraging these reports, large language models (LLMs) can enhance investors’ decision-making capabilities and strengthen financial analysis. However, two key challenges limit their effectiveness: (1) the rapid evolution of market events often outpaces the slow update cycles of existing knowledge bases, and (2) the long-form, unstructured nature of financial reports hinders timely and context-aware integration by LLMs. To address these challenges, we tackle both data and methodological aspects. First, we introduce the Event-Enhanced Automated Construction of Financial Knowledge Graph (FinKario), a dataset comprising over 305,360 entities, 210,328 relational triples, and 19 distinct relation types. FinKario automatically integrates real-time company fundamentals and market events through prompt-driven extraction guided by professional institutional templates, providing structured and accessible financial insights for LLMs. Additionally, we propose a Two-Stage, Graph-Based retrieval strategy (FinKario-RAG), optimizing the retrieval of evolving, large-scale financial knowledge to ensure efficient and precise data access. Extensive experiments show that FinKario with FinKario-RAG achieves superior stock trend prediction accuracy, outperforming financial LLMs by 18.81% and institutional strategies by 17.85% on average in backtesting. Our code is available at https://github.com/Jackson906E/FinKario.
2025
Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research
Xiang Liu | Penglei Sun | Shuyan Chen | Longhan Zhang | Peijie Dong | Huajie You | Yongqi Zhang | Chang Yan | Xiaowen Chu | Tong-yi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
The rapid advancement of perovskite solar cells (PSCs) has led to an exponential growth in research publications, creating an urgent need for efficient knowledge management and reasoning systems in this domain. We present a comprehensive knowledge-enhanced system for PSCs that integrates three key components. First, we develop Perovskite-KG, a domain-specific knowledge graph constructed from 1,517 research papers, containing 23,789 entities and 22,272 relationships. Second, we create two complementary datasets: Perovskite-Chat, comprising 55,101 high-quality question-answer pairs generated through a novel multi-agent framework, and Perovskite-Reasoning, containing 2,217 carefully curated materials science problems. Third, we introduce two specialized large language models: Perovskite-Chat-LLM for domain-specific knowledge assistance and Perovskite-Reasoning-LLM for scientific reasoning tasks. Experimental results demonstrate that our system significantly outperforms existing models in both domain-specific knowledge retrieval and scientific reasoning tasks, providing researchers with effective tools for literature review, experimental design, and complex problem-solving in PSC research.
2022
Human-in-the-loop Robotic Grasping Using BERT Scene Representation
Yaoxian Song | Penglei Sun | Pengfei Fang | Linyi Yang | Yanghua Xiao | Yue Zhang
Proceedings of the 29th International Conference on Computational Linguistics
NLP techniques have been widely applied across different domains. In this paper, we propose a human-in-the-loop framework for robotic grasping in cluttered scenes, investigating a language interface to the grasping process, which allows the user to intervene through natural language commands. This framework is built on a state-of-the-art grasping baseline, where we substitute a scene-graph representation with a text representation of the scene using BERT. Experiments both in simulation and on a physical robot show that the proposed method outperforms conventional object-agnostic and scene-graph-based methods in the literature. In addition, we find that with human intervention, performance can be significantly improved. Our dataset and code are available on our project website https://sites.google.com/view/hitl-grasping-bert.