Long Li
Papers on this page may belong to the following people: Long Li, Long Li
2026
MMAC: A Multilingual, Multimodal Alignment Framework for Cultural Grounding Evaluation
Weihua Zheng | Zhengyuan Liu | Tanmoy Chakraborty | Weiwen Xu | Xiaoxue Gao | Bryan Chen Zhengyu Tan | Bowei Zou | Chang Liu | Yujia Hu | Xing Xie | Xiaoyuan Yi | Jing Yao | Chaojun Wang | Long Li | Rui Liu | Huiyao Liu | Koji Inoue | Ryuichi Sumida | Tatsuya Kawahara | Fan Xu | Lingyu Ye | Wei Tian | Dongjun Kim | Jimin Jung | Jaehyung Seo | Nadya Yuki Wangsajaya | Pham Minh Duc | Ojasva Saxena | Palash Nandi | Xiyan Tao | Wiwik Karlina | Tuan Luong | Keertana Arun Vasan | Roy Ka-Wei Lee | Nancy F. Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The global deployment of Large Language Models (LLMs) underscores the urgent need to evaluate their cultural alignment. However, assessing genuine "cultural awareness" across modalities (text, vision, speech) and languages remains a significant challenge. To comprehensively investigate this domain, we propose MMAC, a systematic framework that encompasses a tri-modally aligned cultural benchmark creation pipeline and a five-dimensional evaluation protocol to assess cross-country awareness disparities, evaluate cross-lingual and cross-modal consistency, and verify cultural knowledge generalization and grounding validity. Given the prevailing Western cultural bias in current models, we focus on 8 Asian countries as our dataset foundation to more acutely reveal potential cultural deficiencies in LLMs. Our dataset, MMAC-bench, features 27,000 human-curated questions across 10 languages. Crucially, it is the first dataset aligned at the input level across text, image, and speech, enabling direct cross-modal transfer tests. Each question consists of multiple-choice options accompanied by open-ended generated explanations, where 79% require multi-step reasoning grounded in cultural context, moving beyond simple memorization. We probe the causes of modal divergence, offering insights into fostering culturally robust MLLMs.
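One way to picture the cross-modal consistency check mentioned in the abstract above is as an agreement rate over input-aligned questions. The sketch below is only an illustration under stated assumptions: the function name `cross_modal_consistency`, the `(question_id, modality, answer)` record layout, and the "identical answer in every modality" criterion are hypothetical and are not taken from the MMAC paper or its evaluation protocol.

```python
from collections import defaultdict


def cross_modal_consistency(predictions):
    """Fraction of questions answered identically in every modality seen.

    `predictions` is an iterable of (question_id, modality, answer) tuples.
    This record layout and the agreement criterion are illustrative
    assumptions, not the protocol defined by the MMAC authors.
    """
    by_question = defaultdict(dict)
    for question_id, modality, answer in predictions:
        by_question[question_id][modality] = answer
    if not by_question:
        return 0.0
    # A question counts as consistent when all its modalities share one answer.
    consistent = sum(
        1 for answers in by_question.values() if len(set(answers.values())) == 1
    )
    return consistent / len(by_question)


# Hypothetical multiple-choice answers for two questions in three modalities.
example = [
    ("q1", "text", "B"), ("q1", "image", "B"), ("q1", "speech", "B"),
    ("q2", "text", "A"), ("q2", "image", "C"), ("q2", "speech", "A"),
]
score = cross_modal_consistency(example)  # q1 agrees, q2 does not
```

Because the benchmark is described as aligned at the input level, the same question identifier can be compared directly across text, image, and speech, which is what makes an agreement rate of this kind meaningful.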
I²B-LPO: Latent Policy Optimization via Iterative Information Bottleneck
Huilin Deng | Hongchen Luo | Yue Zhu | Long Li | Zhuoyue Chen | Xinghao Zhao | Ming LI | Chuyang Zhao | Jihai Zhang | MengChang Wang | Yang Cao | Yu Kang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite recent advances in reinforcement learning with verifiable rewards (RLVR) for large language model (LLM) reasoning, most methods suffer from exploration collapse, as the semantic homogeneity of random rollouts traps models in narrow, over-optimized behaviors. Existing methods leverage policy entropy to encourage exploration, but face inherent limitations: global entropy regularization is susceptible to reward hacking, inducing meaningless verbosity, whereas local token-selective updates struggle with the strong inductive bias of pre-trained models. To this end, we propose Latent Policy Optimization via Iterative Information Bottleneck (I²B-LPO), which shifts from statistical perturbation of token distributions to topological branching of reasoning trajectories. I²B-LPO triggers latent branching at high-entropy states to diversify reasoning trajectories and applies the Information Bottleneck as a trajectory filter and self-reward to ensure concise and informative exploration. Empirical results on four mathematical benchmarks demonstrate that I²B-LPO achieves state-of-the-art performance, with margins of up to 5.3% in accuracy and 7.4% in diversity metrics. Code is available at https://github.com/denghuilin-cyber/IIB-LPO.
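The phrase "high-entropy states" in the abstract above can be made concrete with the Shannon entropy of a next-token distribution. The following is a minimal sketch under stated assumptions: the helper names `token_entropy` and `high_entropy_steps`, the fixed threshold, and the toy distributions are all hypothetical. It only shows how candidate branching positions might be located; it is not the authors' implementation, which is in the linked repository.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def high_entropy_steps(step_distributions, threshold):
    """Indices of decoding steps whose entropy exceeds `threshold`.

    The fixed threshold is an illustrative assumption; the paper may select
    branching states differently.
    """
    return [
        i for i, probs in enumerate(step_distributions)
        if token_entropy(probs) > threshold
    ]


# Toy next-token distributions over a 4-token vocabulary (hypothetical data).
steps = [
    [0.97, 0.01, 0.01, 0.01],  # confident step, low entropy
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step, entropy = ln 4
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain step
]
branch_points = high_entropy_steps(steps, threshold=1.0)
```

With these toy inputs only the uniform step exceeds the threshold, so it is the single position where a rollout would be a candidate for branching.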
2025
To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
Haozhe Wang | Long Li | Chao Qu | Weidi Xu | Fengming Zhu | Wei Chu | Fangzhen Lin
Findings of the Association for Computational Linguistics: ACL 2025
Recent advances in mathematical problem-solving with language models (LMs) integrate chain-of-thought (CoT) reasoning and code execution to harness their complementary strengths. However, existing hybrid frameworks exhibit a critical limitation: they depend on externally dictated instructions or rigid code-integration templates and lack metacognitive awareness, that is, the capacity to dynamically evaluate intrinsic capabilities and autonomously determine when and how to integrate tools. This rigidity motivates our study of autonomous code integration, enabling models to adapt tool-usage strategies as their reasoning abilities evolve during training. While reinforcement learning (RL) shows promise for boosting LLM reasoning at scale (e.g., DeepSeek-R1), we demonstrate its inefficiency in learning autonomous code integration due to inadequate exploration of the vast combinatorial space of CoT-code interleaving patterns. To address this challenge, we propose a novel Expectation-Maximization (EM) framework that synergizes structured exploration (E-step) with off-policy RL optimization (M-step), creating a self-reinforcing cycle between metacognitive tool-use decisions and evolving capabilities. Experiments reveal that our method achieves superior results through improved exploration. Notably, our 7B model improves by over 11% on MATH500 and by 9.4% on AIME without o1-like CoT.
2024
How Do Humans Write Code? Large Models Do It the Same Way Too
Long Li | Xuzheng He | Haozhe Wang | Linlin Wang | Liang He
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Program-of-Thought (PoT) has replaced natural-language Chain-of-Thought (CoT) as the most popular method for mathematical reasoning tasks with Large Language Models (LLMs), using external tool calls to circumvent computational errors. However, our evaluation of the GPT-4 and Llama series reveals that PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, than CoT. To address this issue, we propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT: (1) a new generation paradigm that uses full CoT reasoning to control code generation; (2) Focus Attention, which directs model attention to the CoT reasoning during PoT to generate more logical code; and (3) reinforcement learning that uses the accuracy of both CoT and PoT responses as rewards to prevent repetitive reasoning steps in LLMs when solving difficult math problems. Our method achieves an average improvement of 6.5% on the Llama-Base model and 4.3% on the Mistral-Base model across 8 mathematical calculation datasets. It also shows significant effectiveness on five out-of-domain datasets by controlling the model’s information flow, exhibiting strong transferability. Additionally, HTL shows the most significant improvement on a non-mathematical natural language inference task, contributing to a unified reasoning task framework.
Co-authors
- Haozhe Wang 2
- Yang Cao 1
- Tanmoy Chakraborty 1
- Nancy Chen 1
- Zhuoyue Chen 1
- Wei Chu 1
- Huilin Deng 1
- Pham Minh Duc 1
- Xiaoxue Gao 1
- Xuzheng He 1
- Liang He 1
- Yujia Hu 1
- Koji Inoue 1
- Jimin Jung 1
- Yu Kang 1
- Wiwik Karlina 1
- Tatsuya Kawahara 1
- Dongjun Kim 1
- Ming LI 1
- Roy Ka-Wei Lee 1
- Fangzhen Lin 1
- Zhengyuan Liu 1
- Chang Liu 1
- Rui Liu 1
- Huiyao Liu 1
- Hongchen Luo 1
- Tuan Luong 1
- Palash Nandi 1
- Chao Qu 1
- Ojasva Saxena 1
- Jaehyung Seo 1
- Ryuichi Sumida 1
- Bryan Chen Zhengyu Tan 1
- Xiyan Tao 1
- Wei Tian 1
- Keertana Arun Vasan 1
- Chaojun Wang 1
- Linlin Wang 1
- MengChang Wang 1
- Nadya Yuki Wangsajaya 1
- Xing Xie 1
- Weiwen Xu 1
- Fan Xu (徐凡) 1
- Weidi Xu 1
- Jing Yao 1
- Lingyu Ye 1
- Xiaoyuan Yi 1
- Jihai Zhang 1
- Xinghao Zhao 1
- Chuyang Zhao 1
- Weihua Zheng 1
- Fengming Zhu 1
- Yue Zhu 1
- Bowei Zou (邹博伟) 1