Ming Hu
2026
SPIDE: Serial and Parallel Intertwined Speculative Decoding
Wenru Xu | Peixuan Xu | Ziqi Yang | Ming Hu | Zihui Wang | Jianzhong Qi | Rongshan Yu | Xiaoliang Fan | Cheng Wang
Findings of the Association for Computational Linguistics: ACL 2026
Speculative Decoding (SD) reduces inference latency for Large Language Models (LLMs) by leveraging an efficient draft model to generate candidate tokens, which are subsequently verified by the target model. To enhance acceleration while reducing LLM usage costs, we propose Serial and Parallel Intertwined Speculative DEcoding (SPIDE), a novel training-free SD framework that dynamically alternates between serial dynamic drafting and parallel draft verification. We maintain a confidence-acceptance mapping table during the decoding process. In the serial dynamic drafting module, we leverage this table to evaluate the reliability of the draft sequence and adjust draft lengths adaptively. In the parallel draft verification module, we alleviate drafting-termination conflicts that compromise efficiency, and we update the mapping table synchronously. We conduct experimental evaluations on diverse model pairs and text generation tasks to assess the effectiveness of SPIDE. Compared with autoregressive decoding, SPIDE achieves a 3.25× speedup on average and up to 4.56×. Compared with vanilla SD, SPIDE increases the LLM usage cost by only 8.2% on average while delivering an additional 67.7% speedup on average.
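The idea of a confidence-acceptance mapping table can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class name `ConfidenceAcceptanceTable`, the equal-width confidence bins, the 1-of-2 prior per bin, and the rule of stopping once the product of estimated acceptance rates falls below a threshold.

```python
class ConfidenceAcceptanceTable:
    """Running estimate of P(accepted | draft confidence bin). Hypothetical design."""

    def __init__(self, num_bins=10):
        self.num_bins = num_bins
        # Weak prior of 1 accepted out of 2 observations per bin (assumption).
        self.accepted = [1.0] * num_bins
        self.total = [2.0] * num_bins

    def _bin(self, confidence):
        # Map a confidence in [0, 1] to a bin index; 1.0 falls in the last bin.
        return min(int(confidence * self.num_bins), self.num_bins - 1)

    def acceptance_rate(self, confidence):
        b = self._bin(confidence)
        return self.accepted[b] / self.total[b]

    def update(self, confidence, was_accepted):
        # Called after verification with the outcome for one drafted token.
        b = self._bin(confidence)
        self.total[b] += 1.0
        if was_accepted:
            self.accepted[b] += 1.0


def choose_draft_length(table, confidences, threshold=0.3):
    """Return how many drafted tokens to keep before handing off to verification.

    The draft is cut just before the token at which the estimated probability
    that the whole prefix is accepted would drop below `threshold`.
    """
    reliability = 1.0
    for i, confidence in enumerate(confidences):
        reliability *= table.acceptance_rate(confidence)
        if reliability < threshold:
            return i
    return len(confidences)
```

In this sketch a fresh table is pessimistic and cuts drafts short; as verification outcomes for high-confidence tokens accumulate, the same confidences justify longer drafts.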
TAGS: A Test-Time Generalist–Specialist Framework with Retrieval-Augmented Reasoning and Verification
Jianghao Wu | Feilong Tang | Yulong Li | Ming Hu | Haochen Xue | Shoaib Jameel | Zongyuan Ge | Yutong Xie | Imran Razzak
Findings of the Association for Computational Linguistics: ACL 2026
Recent advances such as Chain-of-Thought prompting have significantly improved large language models (LLMs) in zero-shot medical reasoning. However, prompting-based methods often remain shallow and unstable, while fine-tuned medical LLMs suffer from poor generalization under distribution shifts and limited adaptability to unseen clinical scenarios. To address these limitations, we present TAGS, a test-time framework that combines a broadly capable generalist with a domain-specific specialist to offer complementary perspectives without any model fine-tuning or parameter updates. To support this generalist–specialist reasoning process, we introduce two auxiliary modules: a hierarchical retrieval mechanism that provides multi-scale exemplars by selecting examples based on both semantic and rationale-level similarity, and a reliability scorer that evaluates reasoning consistency to guide final answer aggregation. TAGS achieves strong performance across nine MedQA benchmarks, boosting GPT-4o accuracy by 13.8%, DeepSeek-R1 by 16.8%, and improving a vanilla 7B model from 14.1% to 23.9%. These results surpass several fine-tuned medical LLMs, without any parameter updates.
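As a rough illustration of reliability-guided answer aggregation, the sketch below sums a reliability score per candidate answer and returns the answer with the highest total. The function name `aggregate_answers`, the pair format, and the summation rule are assumptions for illustration; the abstract does not specify how the scorer's output is combined.

```python
from collections import defaultdict


def aggregate_answers(candidates):
    """Pick the answer with the highest summed reliability.

    `candidates` is a list of (answer, reliability) pairs, for example one
    pair per reasoning path from the generalist and the specialist.
    """
    if not candidates:
        raise ValueError("at least one candidate answer is required")
    totals = defaultdict(float)
    for answer, reliability in candidates:
        totals[answer] += reliability
    # Ties are broken by first insertion order, since max returns the first maximum.
    return max(totals, key=totals.get)
```

Two moderately reliable paths that agree can therefore outweigh one highly reliable path that disagrees.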
2025
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Peng Xia | Xingtong Yu | Ming Hu | Lie Ju | Zhiyong Wang | Peibo Duan | Zongyuan Ge
Proceedings of the 31st International Conference on Computational Linguistics
Object categories are typically organized into a multi-granularity taxonomic hierarchy. When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios. Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships. These efforts are constrained by their inability to perform effectively across categories of varied granularity. To tackle this issue, we propose a novel framework (**HGCLIP**) that effectively combines **CLIP** with a deeper exploitation of the **H**ierarchical class structure via **G**raph representation learning. We explore constructing the class hierarchy into a graph, with its nodes representing the textual or image features of each category. After passing through a graph encoder, the textual features incorporate hierarchical structure information, while the image features emphasize class-aware features derived from prototypes through the attention mechanism. Our approach demonstrates significant improvements on 11 diverse visual recognition benchmarks. Our code is fully available at https://github.com/richard-peng-xia/HGCLIP.
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
Haochen Xue | Feilong Tang | Ming Hu | Yexin Liu | Qidong Huang | Yulong Li | Chengzhi Liu | Zhongxing Xu | Chong Zhang | Chun-Mei Feng | Yutong Xie | Imran Razzak | Zongyuan Ge | Jionglong Su | Junjun He | Yu Qiao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations on 20 MLLMs in MMRC indicate an accuracy drop during open-ended interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, accumulated assumption of error propagation, and reluctance to “say no.” To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which can record key information from the conversation and remind the model during its responses, enhancing conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
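A note-taking strategy of this kind can be sketched as a small store of key facts that is prepended to each new question. The class name `NoteTaker`, the key-value note format, and the prompt layout are assumptions for illustration; how key information is extracted from the conversation is not covered here.

```python
class NoteTaker:
    """Keeps short key-value notes and reminds the model of them on each turn."""

    def __init__(self):
        self.notes = {}

    def record(self, key, value):
        # A later value for the same key overwrites the earlier one,
        # a simple stand-in for handling updated information.
        self.notes[key] = value

    def build_prompt(self, question):
        # With no notes yet, pass the question through unchanged.
        if not self.notes:
            return question
        reminder = "\n".join(f"- {k}: {v}" for k, v in self.notes.items())
        return f"Notes so far:\n{reminder}\n\nQuestion: {question}"
```

Overwriting on a repeated key keeps only the latest fact, so the reminder never carries stale information forward.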
2024
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-Tailed Multi-Label Visual Recognition
Peng Xia | Di Xu | Ming Hu | Lie Ju | Zongyuan Ge
Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR)
Long-tailed multi-label visual recognition (LTML) is a highly challenging task due to label co-occurrence and imbalanced data distribution. In this work, we propose a unified framework for LTML, namely prompt tuning with class-specific embedding loss (LMPT), which captures the semantic feature interactions between categories by combining text and image modality data and improves performance on both head and tail classes simultaneously. Specifically, LMPT introduces an embedding loss function with class-aware soft margin and re-weighting to learn class-specific contexts with the benefit of textual descriptions (captions), which can help establish semantic relationships between classes, especially between the head and tail classes. Furthermore, taking the class imbalance into account, the distribution-balanced loss is adopted as the classification loss function to further improve performance on the tail classes without compromising head classes. Extensive experiments conducted on the VOC-LT and COCO-LT datasets demonstrate that our method significantly surpasses the previous state-of-the-art methods and zero-shot CLIP in LTML. Our code is publicly available at https://github.com/richard-peng-xia/LMPT.
2018
A Bilingual Interactive Human Avatar Dialogue System
Dana Abu Ali | Muaz Ahmad | Hayat Al Hassan | Paula Dozsa | Ming Hu | Jose Varias | Nizar Habash
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
This demonstration paper presents a bilingual (Arabic-English) interactive human avatar dialogue system. The system is named TOIA (time-offset interaction application), as it simulates face-to-face conversations between humans using digital human avatars recorded in the past. TOIA is a conversational agent, similar to a chat bot, except that it is based on an actual human being and can be used to preserve and tell stories. The system is designed to allow anybody, simply using a laptop, to create an avatar of themselves, thus facilitating cross-cultural and cross-generational sharing of narratives to wider audiences. The system currently supports monolingual and cross-lingual dialogues in Arabic and English, but can be extended to other languages.
Co-authors
- Zongyuan Ge 4
- Lie Ju 2
- Yulong Li 2
- Imran Razzak 2
- Feilong Tang 2
- Peng Xia 2
- Yutong Xie 2
- Haochen Xue 2
- Dana Abu Ali 1
- Muaz Ahmad 1
- Hayat Al Hassan 1
- Paula Dozsa 1
- Peibo Duan 1
- Xiaoliang Fan 1
- Chun-Mei Feng 1
- Nizar Habash 1
- Junjun He 1
- Qidong Huang 1
- Shoaib Jameel 1
- Yexin Liu 1
- Chengzhi Liu 1
- Jianzhong Qi 1
- Yu Qiao 1
- Jionglong Su 1
- Jose Varias 1
- Zhiyong Wang 1
- Zihui Wang 1
- Cheng Wang 1
- Jianghao Wu 1
- Di Xu 1
- Zhongxing Xu 1
- Wenru Xu 1
- Peixuan Xu 1
- Ziqi Yang 1
- Xingtong Yu 1
- Rongshan Yu 1
- Chong Zhang 1