Yang Du

2025

pdf bib abs
MiMoTable: A Multi-scale Spreadsheet Benchmark with Meta Operations for Table Reasoning
Zheng Li | Yang Du | Mao Zheng | Mingyang Song
Proceedings of the 31st International Conference on Computational Linguistics

Extensive research has been conducted to explore the capability of Large Language Models (LLMs) for table reasoning and has significantly improved the performance on existing benchmarks. However, tables and user questions in real-world applications are more complex and diverse, presenting an unignorable gap compared to the existing benchmarks. To fill the gap, we propose a Multi-scale spreadsheet benchmark with Meta operations for Table reasoning, named as MiMoTable. Specifically, MiMoTable incorporates two key features. First, the tables in MiMoTable are all spreadsheets used in real-world scenarios, which cover seven domains and contain different types. Second, we define a new criterion with six categories of meta operations for measuring the difficulty of each question in MiMoTable, simultaneously as a new perspective for measuring the difficulty of the existing benchmarks. Experimental results show that Claude-3.5-Sonnet achieves the best performance with 77.4% accuracy, indicating that there is still significant room to improve for LLMs on MiMoTable. Furthermore, we grade the difficulty of existing benchmarks according to our new criteria. Experiments have shown that the performance of LLMs decreases as the difficulty of benchmarks increases, thereby proving the effectiveness of our proposed new criterion.

Recent advances in text-to-video (T2V) generation highlight the critical role of high-quality video-text pairs in training models capable of producing coherent and instruction-aligned videos. However, strategies for optimizing video captions specifically for T2V training remain underexplored. In this paper, we introduce VC4VG (Video Captioning for Video Generation), a comprehensive caption optimization framework tailored to the needs of T2V models. We begin by analyzing caption content from a T2V perspective, decomposing the essential elements required for video reconstruction into multiple dimensions, and proposing a principled caption design methodology. To support evaluation, we construct VC4VG-Bench, a new benchmark featuring fine-grained, multi-dimensional, and necessity-graded metrics aligned with T2V-specific requirements. Extensive T2V fine-tuning experiments demonstrate a strong correlation between improved caption quality and video generation performance, validating the effectiveness of our approach. We release all benchmark tools and code (https://github.com/qyr0403/VC4VG) to support further research.

pdf bib abs
Shy-hunyuan-MT at WMT25 General Machine Translation Shared Task
Mao Zheng | Zheng Li | Yang Du | Bingxin Qu | Mingyang Song
Proceedings of the Tenth Conference on Machine Translation

In this paper, we present our submission to the WMT25 shared task on machine translation, for which we propose Synergy-enhanced policy optimization framework, named Shy. This novel two-phase training framework synergistically combines knowledge distillation and fusion via reinforcement learning.In the first phase, we introduce a multi-stage training framework that harnesses the complementary strengths of multiple state-of-the-art large language models to generate diverse, high-quality translation candidates. These candidates serve as pseudo-references to guide the supervised fine-tuning of our model, Hunyuan-7B, effectively distilling the collective knowledge of multiple expert systems into a single efficient model.In the second phase, we further refine the distilled model through Group Relative Policy Optimization, a reinforcement learning technique that employs a composite reward function. By calculating reward from multiple perspectives, our model ensures better alignment with human preferences and evaluation metrics.Extensive experiments across multiple language pairs demonstrate that our model Shy-hunyuan-MT yields substantial improvements in translation quality compared to baseline approaches. Notably, our framework achieves competitive performance comparable to that of state-of-the-art systems while maintaining computational efficiency through knowledge distillation and strategic ensemble.

Co-authors

Venues

Fix author