Baoyun Peng

2025

While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propose a novel collaborative framework named EffiQA that can strike a balance between performance and efficiency via an iterative paradigm. EffiQA consists of three stages: global planning, efficient KG exploration, and self-reflection. Specifically, EffiQA leverages the commonsense capability of LLMs to explore potential reasoning pathways through global planning. Then, it offloads semantic pruning to a small plug-in model for efficient KG exploration. Finally, the exploration results are fed to LLMs for self-reflection to further improve global planning and efficient KG exploration. Empirical evidence on multiple KBQA benchmarks shows EffiQA’s effectiveness, achieving an optimal balance between reasoning accuracy and computational costs. We hope the proposed new framework will pave the way for efficient, knowledge-intensive querying by redefining the integration of LLMs and KGs, fostering future research on knowledge-based question answering.

pdf bib abs
JI²S: Joint Influence‐Aware Instruction Data Selection for Efficient Fine‐Tuning
Jingyu Wei | Bo Liu | Tianjiao Wan | Baoyun Peng | Xingkong Ma | Mengmeng Guo
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Instruction tuning (IT) improves large language models (LLMs) by aligning their outputs with human instructions, but its success depends critically on training data quality, and datasets such as Alpaca often contain noisy or suboptimal examples that undermine fine‐tuning. Prior selection strategies score samples using general‐purpose LLMs (e.g., GPT), leveraging their strong language understanding yet introducing inherent biases that misalign with the target model’s behavior and yield unstable downstream performance. Influence‐based methods address this by estimating each example’s marginal contribution to overall performance, but they typically assume additive contributions and therefore overlook higher‐order interactions among samples. To overcome these limitations, we propose JI²S, a novel framework that jointly models both marginal and combinatorial influences within sample groups. Applying JI²S to select the top 1,000 most influential examples from Alpaca, we fine‐tune LLaMA2‐7B, Mistral‐7B, and LLaMA2‐13B and evaluate them on Open LLM Benchmarks, MT‐Bench, and GPT‐4–judged pairwise comparisons. Our experiments show that JI²S consistently outperforms full‐dataset training and strong baselines, highlighting the value of capturing joint influence for high‐quality instruction fine‐tuning. We provide our code in this GitHub repository.

Co-authors

Venues

coling1
emnlp1

Fix author