Liangyu Chen
Other people with similar names: Liang-Yu Chen
2026
A Survey of Large Models in Sports
Yichen Xu | Jianzhe Ma | Chuhan Wang | Zhonghao Cao | Liangyu Chen | Wenxuan Wang | Qin Jin
Findings of the Association for Computational Linguistics: ACL 2026
Sports have witnessed growing global enthusiasm in recent years, serving as a vital force for physical health, cultural exchange, social connection, and economic growth. The rapid advancement of large models, particularly (multimodal) large language models ((M)LLMs), has demonstrated transformative potential to reshape sports understanding, analysis, and interaction across diverse domains. This paper presents a comprehensive survey of large models in sports, including (i) an overview of tasks and applications across different participant groups; (ii) a detailed analysis of sports-related datasets and benchmarks; and (iii) a critical discussion of current challenges and future directions. Our goal is to establish a foundation for advancing research and practical development of large-model-driven sports intelligence. An open-source GitHub repository is maintained at: https://github.com/Road2Redemption/Awesome_Large_Models_In_Sports1.
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
Quyu Kong | Xu Zhang | Zhenyu Yang | Nolan Gao | Chen Liu | Panrong Tong | Chenglin Cai | Hanzhang Zhou | Jianan Zhang | Liangyu Chen | Zhidan Liu | Steven Hoi | Yue Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While AndroidWorld has become the dominant mobile-use benchmark due to its reproducible environment and deterministic evaluation, recent agents achieving over 90% success rates indicate saturation and motivate the need for greater challenge. In addition, its environment lacks key application categories, such as e-commerce and enterprise communication, and does not reflect realistic mobile-use scenarios characterized by vague user instructions and hybrid tool usage. We introduce MobileWorld, a substantially more challenging benchmark with 201 tasks across 20 applications that reflects real-world usage through long-horizon, cross-application workflows requiring nearly twice as many steps (27.8 vs. 14.3) and featuring significantly more multi-app tasks (62.2% vs. 9.5%) than AndroidWorld. MobileWorld balances production-grade utility and reproducible evaluation using open-source alternatives to industry standards (e.g., Mattermost for Slack), enabling full observability through source code modification and direct database access. Beyond standard GUI manipulation, MobileWorld introduces novel task categories including agent-user interaction and Model Context Protocol (MCP)-augmented tasks for evaluating agents in user-aware, hybrid-tool scenarios. We develop a planner-executor framework with extended action spaces supporting user interactions and MCP calls. Results show a sharp performance drop from AndroidWorld, with the best agentic framework and end-to-end model achieving 51.7% and 20.9% success rates, respectively, highlighting substantial room for future research.
POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering
Yichen Xu | Liangyu Chen | Liang Zhang | Zihao Yue | Jianzhe Ma | Wenxuan Wang | Qin Jin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Charts are a universally adopted medium for data communication, yet existing chart understanding benchmarks are overwhelmingly English-centric, limiting their accessibility and relevance to global audiences. To address this limitation, we introduce PolyChartQA, the first large-scale multilingual benchmark for chart question answering, comprising 22,606 charts and 26,151 QA pairs across 10 diverse languages. PolyChartQA is constructed through a scalable pipeline that enables efficient multilingual chart generation via data translation and code reuse, supported by LLM-based translation and rigorous quality control. We systematically evaluate multilingual chart understanding with PolyChartQA on state-of-the-art LVLMs and reveal a significant performance gap between English and other languages, particularly low-resource ones. Additionally, we introduce a companion multilingual chart question answering training set, PolyChartQA-Train, on which fine-tuning LVLMs yields substantial gains in multilingual chart understanding across diverse model sizes and architectures. Together, our benchmark and training set provide a foundation for developing globally inclusive vision-language models capable of understanding charts across diverse linguistic contexts. Code and datasets are available at https://github.com/Road2Redemption/PolyChartQA.