Henry Hengyuan Zhao


2025

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback
Henry Hengyuan Zhao | Wenqi Pei | Yifei Tao | Haiyang Mei | Mike Zheng Shou
Findings of the Association for Computational Linguistics: EMNLP 2025

Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework that can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench, which evaluates interactive intelligence using two representative datasets, MMMU-Pro and MathVerse, to test 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-3.5-Sonnet. Our evaluation results show that even state-of-the-art LMMs (e.g., OpenAI-o1) correct their results through human feedback in fewer than 50% of cases. Our findings point to the need for methods that can enhance LMMs' capabilities to interpret and benefit from feedback.
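
The feedback loop underlying this evaluation is easy to state in code. Below is a minimal sketch, in Python, of how one interactive evaluation episode might look; all names here (give_feedback, interactive_eval, query_lmm) are hypothetical illustrations, not the paper's actual interface.

def give_feedback(answer, ground_truth):
    # Placeholder feedback provider (a human or a simulator model):
    # it signals correctness without revealing the answer itself.
    return "Correct." if answer == ground_truth else "Incorrect, please try again."

def interactive_eval(query_lmm, question, ground_truth, max_rounds=3):
    # Query the model repeatedly, feeding the verdict back each round;
    # returns whether the model recovered and how many rounds it took.
    # query_lmm is an assumed callable: (question, history) -> answer.
    history = []
    for round_idx in range(1, max_rounds + 1):
        answer = query_lmm(question, history)
        if answer == ground_truth:
            return True, round_idx
        history.append((answer, give_feedback(answer, ground_truth)))
    return False, max_rounds

A headline finding of the abstract corresponds to the first return value: for even the strongest models, interactive_eval recovers a correct answer in fewer than half of the initially failed cases.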

Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models
Wenqi Pei | Hailing Xu | Henry Hengyuan Zhao | Shizheng Hou | Chen Han | Zining Zhang | Luo Pingyi | Bingsheng He
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Natural Language to SQL (NL2SQL) has seen significant advancements with large language models (LLMs). However, these models often depend on closed-source methods and high computational resources, posing challenges in data privacy and deployment. In contrast, small language models (SLMs) struggle with NL2SQL tasks, exhibiting poor performance and incompatibility with existing frameworks. To address these issues, we introduce Feather-SQL, a new lightweight framework tailored for SLMs. Feather-SQL improves SQL executability and accuracy through (i) schema pruning and linking and (ii) multi-path and multi-candidate generation. Additionally, we introduce the 1+1 Model Collaboration Paradigm, which pairs a strong general-purpose chat model with a fine-tuned SQL model, combining analytical reasoning with high-precision SQL generation. Experimental results on BIRD demonstrate that Feather-SQL improves NL2SQL performance on SLMs, with an approximately 10% boost for models without fine-tuning. The proposed paradigm raises the accuracy ceiling of SLMs to 54.76%, highlighting its effectiveness.
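
To make the pipeline concrete, here is a minimal sketch, in Python, of how schema pruning, multi-candidate generation, and the 1+1 pairing could fit together; every name here (prune_schema, executes_ok, the analyze and generate_candidates methods) is a hypothetical placeholder under assumed interfaces, not Feather-SQL's actual API.

def prune_schema(question, full_schema):
    # Placeholder for step (i): keep only tables/columns the question
    # plausibly touches, shrinking the SLM's context window.
    return full_schema  # trivially returns everything in this sketch

def executes_ok(sql, db):
    # Placeholder: attempt to run the query and report whether it executes.
    try:
        db.execute(sql)
        return True
    except Exception:
        return False

def feather_sql(question, full_schema, db, chat_model, sql_model):
    schema = prune_schema(question, full_schema)
    # 1+1 collaboration: the general-purpose chat model supplies the
    # analytical reasoning (hints); the fine-tuned SQL model writes SQL.
    hints = chat_model.analyze(question, schema)
    # Step (ii), multi-path and multi-candidate generation: sample several
    # candidate queries and return the first one that executes.
    for sql in sql_model.generate_candidates(question, schema, hints, n=4):
        if executes_ok(sql, db):
            return sql
    return None  # no executable candidate found

The design choice the sketch highlights is the division of labor: the chat model never emits SQL and the SQL model never reasons about intent, so each small model does only the task it is suited for.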