Yingming Gao

Also published as: 迎明


2025

pdf bib
Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis
Huijun Lian | Zekai Sun | Keqi Chen | Yingming Gao | Ya Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Commonsense question answering (QA) are widely used to evaluate the commonsense abilities of large language models. However, answering commonsense questions correctly requires not only knowledge but also reasoning—even for seemingly simple questions. We demonstrate that such hidden reasoning attributes in commonsense questions can lead evaluation accuracy differences of up to 24.8% across different difficulty levels in the same benchmark. Current benchmarks overlook these hidden reasoning attributes, making it difficult to assess a model’s specific levels of commonsense knowledge and reasoning ability. To address this issue, we introduce ReComSBench, a novel framework that reveals hidden reasoning attributes behind commonsense questions by leveraging the knowledge generated during the reasoning process. Additionally, ReComSBench proposes three new metrics for decoupled evaluation: Knowledge Balanced Accuracy, Marginal Sampling Gain, and Knowledge Coverage Ratio. Experiments show that ReComSBench provides insights into model performance that traditional benchmarks cannot offer. The difficulty stratification based on revealed hidden reasoning attributes performs as effectively as the model-probability-based approach but is more generalizable and better suited for improving a model’s commonsense reasoning abilities. By uncovering and analyzing the hidden reasoning attributes in commonsense data, ReComSBench offers a new approach to enhancing existing commonsense benchmarks.

2022

pdf bib
基于熵的二语语音习得评价研究—以日本学习者习得汉语声母为例(An Entropy-based Evaluation of L2 Speech Acquisition: The Preliminary Report on Chinese Initials Produced by Japanese Learners)
Xiaoli Feng (冯晓莉) | Yingming Gao (高迎明) | Binghuai Lin (林炳怀) | Jinson Zhang (张劲松)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“本文引入“熵”对学习者二语音素发音错误的分布情况进行了量化研究。通过对不同音素及不同二语水平学习者音素错误率和错误分散度的分析发现:1.错误率与错误分散度有较高的相关性,二者的差异反映出错误分布的差异性;2.错误率类似的音素中,与母语音素相似度越高的音素错误分散度越小;3.较初级水平,中级水平学习者音素错误率下降而错误分散度上升。由此可见,熵可以在错误率基础上可以进一步揭示学习者母语音系及二语水平对音素发音错误分散度的影响。”