Chao Huang
2025
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
You Li | Heyu Huang | Chi Chen | Kaiyu Huang | Chao Huang | Zonghao Guo | Zhiyuan Liu | Jinan Xu | Yuhua Li | Ruixuan Li | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2025
The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images. However, existing MLLMs still face challenges in achieving precise grounding in complex multi-image scenarios. To address this, we first explore a Chain-of-Thought (CoT) framework that integrates single-image grounding with multi-image comprehension. While partially effective, it remains unstable and struggles to capture abstract visual information due to its non-end-to-end nature. Therefore, we introduce Migician, the first multi-image grounding model capable of performing free-form and accurate grounding across multiple images. To support this, we present the MGrounding-630k dataset, which comprises data for several multi-image grounding tasks derived from existing datasets, along with newly generated free-form grounding instruction-following data. Furthermore, we propose MIG-Bench, a comprehensive benchmark specifically designed for evaluating multi-image grounding capabilities. Experimental results demonstrate that our model achieves significantly superior multi-image grounding capabilities, outperforming the best existing MLLMs by 24.94% and even surpassing much larger 70B models. Our code, model, dataset, and benchmark are fully open-sourced at https://migician-vg.github.io/.
Boosting Data Utilization for Multilingual Dense Retrieval
Chao Huang | Fengran Mo | Yufeng Chen | Changhao Guan | Zhenrui Yue | Xinyu Wang | Jinan Xu | Kaiyu Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Multilingual dense retrieval aims to retrieve relevant documents across different languages using a unified retriever model. The challenge lies in aligning the representations of different languages in a shared vector space. The common practice is to fine-tune the dense retriever via contrastive learning, whose effectiveness relies heavily on the quality of the negative samples and the effectiveness of the mini-batch data. Unlike existing studies that focus on developing sophisticated model architectures, we propose a method that boosts data utilization for multilingual dense retrieval by obtaining high-quality hard negative samples and effective mini-batch data. Extensive experiments on MIRACL, a multilingual retrieval benchmark covering 16 languages, demonstrate the effectiveness of our method, which outperforms several strong existing baselines.
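To make concrete where hard negatives and mini-batch composition enter the training signal, here is a minimal sketch of a standard InfoNCE-style contrastive loss for a dense retriever. This is not the paper's method: the abstract does not describe how negatives are mined or how batches are built. The function names `info_nce` and `batch_loss`, the dot-product similarity, and the default temperature of 0.05 are all illustrative assumptions. Each query is scored against its positive passage, the other queries' positives in the same mini-batch (in-batch negatives), and its own hard negatives.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Similarity between a query and a passage embedding (assumed: dot product)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.05) -> float:
    """InfoNCE loss for one query: -log softmax score of the positive.

    `negatives` may mix in-batch negatives and mined hard negatives;
    the temperature value is an illustrative assumption.
    """
    candidates = [positive] + list(negatives)
    logits = [dot(query, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def batch_loss(queries: List[Vector], positives: List[Vector],
               hard_negatives: List[List[Vector]],
               temperature: float = 0.05) -> float:
    """Average loss over a mini-batch (hypothetical helper).

    For query i, the positives of all other queries act as in-batch
    negatives, so the composition of the mini-batch directly shapes
    which negatives each query is contrasted against.
    """
    total = 0.0
    for i, q in enumerate(queries):
        in_batch = [p for j, p in enumerate(positives) if j != i]
        total += info_nce(q, positives[i], in_batch + hard_negatives[i],
                          temperature)
    return total / len(queries)
```

For example, with `query = [1.0, 0.0]` and `positive = [1.0, 0.0]`, a negative close to the query such as `[0.9, 0.1]` yields a larger loss than a distant one such as `[-1.0, 0.0]`. This is why hard negatives give a stronger training signal than randomly sampled ones.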
Multi-Stage LLM Fine-Tuning with a Continual Learning Setting
Changhao Guan | Chao Huang | Hongliang Li | You Li | Ning Cheng | Zihe Liu | Yufeng Chen | Jinan Xu | Jian Liu
Findings of the Association for Computational Linguistics: NAACL 2025
In recent years, large language models (LLMs) have made significant progress in knowledge-intensive applications. However, when adapting them to specific domains, we may encounter a multi-stage continuous learning scenario, especially when domain knowledge evolves rapidly. This issue severely limits traditional fine-tuning approaches for LLMs. To overcome this limitation, we propose a new learning paradigm designed specifically for multi-stage continuous learning. This paradigm includes a preference-based learning bias to identify potential knowledge conflicts, as well as a self-distillation-based data augmentation strategy to expand and enrich the training corpus, thereby improving the integration of knowledge-compatible information. In our experiments, we show that the proposed method achieves a significant improvement in accuracy after 7 stages of fine-tuning compared to previous methods, while also performing well at preserving general knowledge. We have released our code and dataset at Multi-Stage-Learning.