Yanming Li

2026

Tool graphs (TGs) model dependencies among tools and resources, enabling more structured organization and management of large toolsets. However, existing methods and benchmarks often formulate tool learning (TL) as a single-solution setting, overlooking the fact that many tasks admit multiple valid tool combinations and therefore require selecting an optimal solution. Moreover, exploring large-scale TGs is computationally expensive, especially under constrained context budgets. To address these challenges, we propose TOPT, an efficient framework for learning optimal TL policies over large TGs, and construct MultiSoTLBench, a large-scale Multi-Solution TL Benchmark in which each task admits multiple valid solutions. Specifically, to improve search efficiency in large action spaces, TOPT adopts a progressive graph expansion strategy: we train a reinforcement learning (RL) agent to acquire transferable expansion skills and construct, on demand, a compact solvable subgraph that preserves only task-relevant links. This reduces the size of the candidate space and the context usage from the outset. To enable optimal selection, we further propose a progressive graph reasoning framework, which performs RL-driven optimality analysis and scheduling on the expanded subgraph to generate an optimal tool chain that balances path length and tool cost. Comprehensive experiments on MultiSoTLBench demonstrate that TOPT generalizes effectively, improving task success and solution optimality by 46.21% and 66.34%, respectively.

2025


Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning
Jiaqi Li | Yanming Li | Xiaoli Shen | Chuanyi Zhang | Guilin Qi | Sheng Bi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In e-commerce, effective product Attribute Mining (AM) is essential for improving product features and aiding consumer decisions. However, current AM methods often focus on extracting attributes from unimodal text, underutilizing multimodal data. In this paper, we propose a novel framework called Multimodal Self-Correction Instruction Tuning (MSIT) to mine new potential attributes from both images and text with Multimodal Large Language Models (MLLMs). The tuning process involves two datasets: Attribute Generation Tuning Data (AGTD) and Chain-of-Thought Tuning Data (CTTD). AGTD is constructed using in-context learning with a small set of seed attributes, helping the MLLM accurately extract attribute-value pairs from multimodal information. To introduce explicit reasoning and improve extraction accuracy, we construct CTTD, which incorporates a structured 5-step reasoning process for self-correction. Finally, we employ a 3-stage inference process to filter out redundant attributes and sequentially validate each generated attribute. Comprehensive experimental results on two datasets show that MSIT outperforms state-of-the-art methods. We will release our code and data in the near future.

Co-authors

Yi Lin 1

Venues

ACL 1
Findings 1
