Qing Li
2026
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
Jiahui Geng | Fengyu Cai | Shaobo Cui | Qing Li | Liangwei Chen | Chenyang Lyu | Haonan Li | Derui Zhu | Alexander Pretschner | Heinz Koeppl | Fakhri Karray
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Code retrieval is vital to modern software engineering as it boosts reuse and speeds up debugging. However, current benchmarks primarily emphasize functional relevance while neglecting code quality. To address this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across four critical dimensions: correctness, efficiency, security, and maintainability. CoQuIR includes fine-grained quality annotations over 42,725 queries and 134,907 code snippets in 11 programming languages. Evaluating 23 retrievers (both open-source and proprietary) shows that even state-of-the-art models often fail to separate buggy or insecure code from robust counterparts. We further investigate methods for explicitly training retrievers to recognize code quality, demonstrating that quality-aware metrics can be improved without loss of semantic relevance; downstream code generation benefits from these gains. CoQuIR underscores the importance of embedding quality signals into retrieval systems as a crucial component for more trustworthy developer tools.
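The failure the abstract describes, a retriever not separating buggy or insecure code from its robust counterpart, can be illustrated with a pairwise check. The sketch below is not CoQuIR's metric definition; the function name `pairwise_preference_accuracy`, the snippet ids, and the scores are hypothetical, and it assumes each query comes with a (higher-quality, lower-quality) pair of snippets that the retriever has already scored.

```python
from typing import Mapping, Sequence, Tuple


def pairwise_preference_accuracy(
    scores: Mapping[str, float],
    pairs: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of (good_id, bad_id) pairs in which the higher-quality
    snippet receives a strictly higher retrieval score.

    Hypothetical illustration only, not the benchmark's official metric.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    wins = sum(1 for good, bad in pairs if scores[good] > scores[bad])
    return wins / len(pairs)


# Made-up retrieval scores for two contrastive pairs.
example_scores = {
    "sort_fixed": 0.9, "sort_buggy": 0.7,          # robust snippet ranked higher
    "query_safe": 0.4, "query_injectable": 0.6,    # insecure snippet ranked higher
}
example_pairs = [("sort_fixed", "sort_buggy"), ("query_safe", "query_injectable")]
accuracy = pairwise_preference_accuracy(example_scores, example_pairs)
```

A value near 0.5 would mean the retriever is close to indifferent between the two versions, which is the behaviour the abstract reports for many models.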
2025
Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models
Bo Zeng | Chenyang Lyu | Sinuo Liu | Mingyan Zeng | Minghao Wu | Xuanfan Ni | Tianqi Shi | Yu Zhao | Yefeng Liu | Chenyu Zhu | Ruizhe Li | Jiahui Geng | Qing Li | Yu Tong | Longyue Wang | Weihua Luo | Kaifu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Instruction following has become a core capability on which Large Language Models (LLMs) are evaluated. However, existing datasets, such as IFEval, are either predominantly monolingual and centered on English or simply machine translated into other languages, limiting their applicability in multilingual contexts. In this paper, we present a carefully curated extension of IFEval to a localized multilingual version named Marco-Bench-MIF, covering 30 languages with varying levels of localization. Our benchmark addresses linguistic constraints (e.g., modifying capitalization requirements for Chinese) and cultural references (e.g., substituting region-specific company names in prompts) via a hybrid pipeline combining translation with verification. Through a comprehensive evaluation of 20+ LLMs on Marco-Bench-MIF, we found that: (1) there is a 25-35% accuracy gap between high- and low-resource languages, (2) model scale strongly affects performance, by 45-60%, yet script-specific challenges persist, and (3) machine-translated data underestimates accuracy by 7-22% compared with localized data. Our analysis identifies challenges in multilingual instruction following, including keyword consistency preservation and compositional constraint adherence across languages. Marco-Bench-MIF will be made publicly available to the community.
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
Jiahui Geng | Qing Li | Zongxiong Chen | Yuxia Wang | Derui Zhu | Zhuohan Xie | Chenyang Lyu | Xiuying Chen | Preslav Nakov | Fakhri Karray
Findings of the Association for Computational Linguistics: ACL 2025
The rapid advancement of vision-language models (VLMs) has drawn considerable attention to their safety alignment. However, existing methods have primarily focused on model undersafety, where the model responds to hazardous queries, while neglecting oversafety, where the model refuses to answer safe queries. In this paper, we introduce the concept of safety calibration, which systematically addresses both undersafety and oversafety. Specifically, we present VSCBench, a novel dataset of 3,600 image-text pairs that are visually or textually similar but differ in terms of safety, designed to evaluate safety calibration across image-centric and text-centric scenarios. Based on our benchmark, we evaluate safety calibration across eleven widely used VLMs. Our extensive experiments reveal major issues with both undersafety and oversafety. We further investigate four approaches to improve the models' safety calibration. We find that although some methods effectively mitigate the models' safety problems, they also degrade the models' utility. This trade-off underscores the urgent need for advanced calibration methods, and our benchmark provides a valuable tool for evaluating future approaches.
Explicit and Implicit Data Augmentation for Social Event Detection
Congbo Ma | Yuxia Wang | Jia Wu | Jian Yang | Jing Du | Zitai Qiu | Qing Li | Hu Wang | Preslav Nakov
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Social event detection involves identifying and categorizing important events from social media. It relies on labeled data, but annotation is costly and labor-intensive. To address this problem, we propose the Augmentation framework for Social Event Detection (SED-Aug), a plug-and-play dual augmentation framework that combines explicit text-based and implicit feature-space augmentation to enhance data diversity and model robustness. The explicit augmentation utilizes LLMs to enhance textual information through five diverse generation strategies. For implicit augmentation, we design five novel perturbation techniques that operate in the feature space on structural fused embeddings. These perturbations are crafted to preserve the semantic and relational properties of the embeddings while making them more diverse. SED-Aug outperforms the best baseline model by approximately 17.67% on the Twitter2012 dataset and by about 15.57% on the Twitter2018 dataset in terms of the average F1 score.
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
Qing Li | Jiahui Geng | Zongxiong Chen | Derui Zhu | Yuxia Wang | Congbo Ma | Chenyang Lyu | Fakhri Karray
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In recent years, large language models (LLMs) have made remarkable advancements, yet hallucination, where models produce inaccurate or non-factual statements, remains a significant challenge for real-world deployment. Although current classification-based methods, such as SAPLMA, are highly efficient in mitigating hallucinations, they struggle when non-factual information arises in the early or middle part of the output sequence, which reduces their reliability. To address these issues, we propose Hallucination Detection-Neural Differential Equations (HD-NDEs), a novel method that systematically assesses the truthfulness of statements by capturing the full dynamics of LLMs within their latent space. Our approach applies neural differential equations (Neural DEs) to model the dynamic system in the latent space of LLMs. The sequence in the latent space is then mapped to the classification space for truth assessment. Extensive experiments across five datasets and six widely used LLMs demonstrate the effectiveness of HD-NDEs, which notably achieves an improvement of over 14% in AUC-ROC on the True-False dataset compared to state-of-the-art techniques.
2024
Reference-free Hallucination Detection for Large Vision-Language Models
Qing Li | Jiahui Geng | Chenyang Lyu | Derui Zhu | Maxim Panov | Fakhri Karray
Findings of the Association for Computational Linguistics: EMNLP 2024
Large vision-language models (LVLMs) have made significant progress in recent years. While LVLMs exhibit excellent ability in language understanding, question answering, and conversations about visual inputs, they are prone to producing hallucinations. Although several methods have been proposed to evaluate hallucinations in LVLMs, most are reference-based and depend on external tools, which complicates their practical application. To assess the viability of alternative methods, it is critical to understand whether reference-free approaches, which do not rely on any external tools, can efficiently detect hallucinations. Therefore, we initiate an exploratory study to demonstrate the effectiveness of different reference-free solutions in detecting hallucinations in LVLMs. In particular, we conduct an extensive study of three kinds of techniques (uncertainty-based, consistency-based, and supervised uncertainty quantification methods) on four representative LVLMs across two different tasks. The empirical results show that reference-free approaches are capable of effectively detecting non-factual responses in LVLMs, with the supervised uncertainty quantification method outperforming the others and achieving the best performance across different settings.
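As a rough illustration of the uncertainty-based family named in the abstract, the sketch below scores a response by the mean negative log-likelihood of its generated tokens and flags it when the score exceeds a threshold. This is a generic baseline, not the paper's implementation; the function names, the threshold value of 1.0, and the example probabilities are all assumptions made for the example.

```python
import math
from typing import Sequence


def mean_negative_log_likelihood(token_probs: Sequence[float]) -> float:
    """Average of -log(p) over the generated tokens.

    Each probability must lie in (0, 1]. Higher values mean the model
    was less confident in the tokens it produced.
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def flag_as_hallucination(token_probs: Sequence[float], threshold: float = 1.0) -> bool:
    """Flag a response whose uncertainty score exceeds the threshold.

    The default threshold is an arbitrary placeholder; in practice it
    would be tuned on held-out labeled responses.
    """
    return mean_negative_log_likelihood(token_probs) > threshold


# Made-up per-token probabilities for two hypothetical responses.
confident_response = [0.9, 0.95, 0.85]
uncertain_response = [0.1, 0.2, 0.15]
```

Because the score needs only the model's own token probabilities, it requires no reference answer or external tool, which is the property the abstract means by reference-free.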
Co-authors
- Jiahui Geng 5
- Chenyang Lyu 5
- Fakhri Karray 4
- Derui Zhu 4
- Yuxia Wang 3
- Zongxiong Chen 2
- Congbo Ma 2
- Preslav Nakov 2
- Fengyu Cai 1
- Xiuying Chen 1
- Liangwei Chen 1
- Shaobo Cui 1
- Jing Du 1
- Heinz Koeppl 1
- Ruizhe Li 1
- Haonan Li 1
- Sinuo Liu 1
- Yefeng Liu 1
- Weihua Luo 1
- Xuanfan Ni (倪宣凡) 1
- Maxim Panov 1
- Alexander Pretschner 1
- Zitai Qiu 1
- Tianqi Shi 1
- Yu Tong 1
- Longyue Wang 1
- Hu Wang 1
- Minghao Wu 1
- Jia Wu 1
- Zhuohan Xie 1
- Jian Yang 1
- Bo Zeng 1
- Mingyan Zeng 1
- Kaifu Zhang 1
- Yu Zhao 1
- Chenyu Zhu 1