Jianglin Lu
2026
Revealing the Seen, Imagining the Beyond: A Survey of Image-Grounded Chain-of-Thought Reasoning in Multimodal LLMs
Qihua Dong | Yitian Zhang | Huimin Zeng | Yizhou Wang | Jianglin Lu | Kuo Yang | Yun Fu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal large language models (MLLMs) are making rapid strides in complex visual reasoning. This survey synthesizes the emerging paradigm of Image-Grounded Chain-of-Thought (IG-CoT), where models ground intermediate inferences by interleaving textual rationales with visual state updates. We formalize IG-CoT, present a method-centric taxonomy covering prompting, supervised fine-tuning, and reinforcement learning, and map these techniques to representative benchmarks. Our analysis identifies two domains where IG-CoT offers significant advantages: detail-oriented reasoning requiring meticulous perception, and imagined-world reasoning for simulating unseen states in games, geometry, and planning. We discuss the practical trade-offs of current methods regarding controllability, data, and compute. We conclude by highlighting key challenges (efficiency, data quality, and generative capabilities) and outlining promising future directions, including lightweight architectures, richer intermediate supervision, and method-aware evaluations that better assess faithfulness and long-horizon reasoning. We maintain a continuously updated paper list at https://github.com/dddraxxx/Awesome-Image-Grounded-CoT.
ACBQ: Adaptive Cross-Block Quantization of Large Language Models
Hailing Wang | Jianglin Lu | Yitian Zhang | Huimin Zeng | Yun Fu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Post-training quantization (PTQ) has emerged as a promising approach for reducing the memory footprint and computational cost of large language models (LLMs), enabling efficient deployment without full model retraining. However, existing PTQ methods struggle to simultaneously support weight–activation joint quantization and extreme low-bit weight quantization. This limitation primarily arises from the depth of LLMs and their strong cross-layer dependencies, which cause quantization errors to propagate and accumulate across layers, ultimately leading to significant performance degradation. In this paper, we present ACBQ, a simple yet effective framework that simultaneously addresses weight–activation joint quantization and extreme weight quantization. We first propose a granular quantization strategy that treats self-attention and FFN as separate quantization units with module-specific optimization objectives. To mitigate the propagation and accumulation of quantization errors across layers, we introduce an adaptive cross-block quantization strategy that explicitly accounts for cross-layer dependencies by encouraging consistency across blocks. Extensive experiments across diverse LLMs, including OPT and the LLaMA family, demonstrate that ACBQ achieves superior performance under both W4A4 and highly aggressive W2 settings, while incurring negligible additional computational overhead.
2025
Representation Potentials of Foundation Models for Multimodal Alignment: A Survey
Jianglin Lu | Hailing Wang | Yi Xu | Yizhou Wang | Kuo Yang | Yun Fu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Foundation models learn highly transferable representations through large-scale pretraining on diverse data. An increasing body of research indicates that these representations exhibit a remarkable degree of similarity across architectures and modalities. In this survey, we investigate the representation potentials of foundation models, defined as the latent capacity of their learned representations to capture task-specific information within a single modality while also providing a transferable basis for alignment and unification across modalities. We begin by reviewing representative foundation models and the key metrics that make alignment measurable. We then synthesize empirical evidence of representation potentials from studies in vision, language, speech, multimodality, and neuroscience. The evidence suggests that foundation models often exhibit structural regularities and semantic consistencies in their representation spaces, positioning them as strong candidates for cross-modal transfer and alignment. We further analyze the key factors that foster representation potentials, discuss open questions, and highlight potential challenges.
Unequal Scientific Recognition in the Age of LLMs
Yixuan Liu | Abel Elekes | Jianglin Lu | Rodrigo Dorantes-Gilardi | Albert-Laszlo Barabasi
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) are reshaping how scientific knowledge is accessed and represented. This study evaluates the extent to which popular and frontier LLMs, including GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, recognize scientists, benchmarking their outputs against OpenAlex and Wikipedia. Using a dataset focusing on 100,000 physicists from OpenAlex, we uncover substantial disparities: LLMs exhibit selective and inconsistent recognition patterns. Recognition correlates strongly with measures of scholarly impact such as citations, and remains uneven across gender and geography. Women researchers and researchers from Africa, Asia, and Latin America are significantly underrecognized. We further examine the role of training data provenance, identifying Wikipedia as a potential source that contributes to recognition gaps. Our findings highlight how LLMs can reflect, and potentially amplify, existing disparities in science, underscoring the need for more transparent and inclusive knowledge systems.