Yiguan Lin
2026
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios
Bin Xu | Yu Bai | Huashan Sun | Yiguan Lin | Siming Liu | Xinyue Liang | Yaolin Li | Zhuangzhi Dong | Jingren Zhang | Yufan Deng | Xinyu Zou | Yang Gao | Heyan Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Bin Xu | Yu Bai | Huashan Sun | Yiguan Lin | Siming Liu | Xinyue Liang | Yaolin Li | Zhuangzhi Dong | Jingren Zhang | Yufan Deng | Xinyu Zou | Yang Gao | Heyan Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As large language models continue to advance, their application in educational contexts remains underexplored and under-optimized. In this paper, we address this gap by introducing the first diverse benchmark tailored for educational scenarios, incorporating synthetic data containing 9 major scenarios and over 4,000 distinct educational contexts. To enable comprehensive assessment, we propose a set of multi-dimensional evaluation metrics that cover 12 critical aspects relevant to both teachers and students. We further apply human annotation to ensure the effectiveness of the model-generated evaluation responses. Additionally, we succeed to train a relatively small-scale model on our constructed dataset and demonstrate that it can achieve performance comparable to state-of-the-art large models (e.g., Deepseek V3, Qwen Max) on the test set. Overall, this work provides a practical foundation for the development and evaluation of education-oriented language models.
2024
Fundamental Capabilities of Large Language Models and their Applications in Domain Scenarios: A Survey
Jiawei Li | Yizhe Yang | Yu Bai | Xiaofeng Zhou | Yinghao Li | Huashan Sun | Yuhang Liu | Xingpeng Si | Yuhao Ye | Yixiao Wu | Yiguan Lin | Bin Xu | Bowen Ren | Chong Feng | Yang Gao | Heyan Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jiawei Li | Yizhe Yang | Yu Bai | Xiaofeng Zhou | Yinghao Li | Huashan Sun | Yuhang Liu | Xingpeng Si | Yuhao Ye | Yixiao Wu | Yiguan Lin | Bin Xu | Bowen Ren | Chong Feng | Yang Gao | Heyan Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) demonstrate significant value in domain-specific applications, benefiting from their fundamental capabilities. Nevertheless, it is still unclear which fundamental capabilities contribute to success in specific domains. Moreover, the existing benchmark-based evaluation cannot effectively reflect the performance of real-world applications. In this survey, we review recent advances of LLMs in domain applications, aiming to summarize the fundamental capabilities and their collaboration. Furthermore, we establish connections between fundamental capabilities and specific domains, evaluating the varying importance of different capabilities. Based on our findings, we propose a reliable strategy for domains to choose more robust backbone LLMs for real-world applications.