Abstract
“Recent efforts have evaluated large language models (LLMs) in areas such as commonsense reasoning, mathematical reasoning, and code generation. However, to the best of our knowledge, no work has specifically investigated the performance of LLMs in natural language generation (NLG) tasks, a pivotal criterion for determining model excellence. Thus, this paper conducts a comprehensive evaluation of well-known and high-performing LLMs, namely ChatGPT, ChatGLM, T5-based models, LLaMA-based models, and Pythia-based models, in the context of NLG tasks. We select English and Chinese datasets encompassing Dialogue Generation and Text Summarization. Moreover, we propose a common evaluation setting that incorporates input templates and post-processing strategies. Our study reports automatic results, accompanied by a detailed analysis.”
- Anthology ID:
- 2023.ccl-2.4
- Original:
- 2023.ccl-2.4v1
- Version 2:
- 2023.ccl-2.4v2
- Volume:
- Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)
- Month:
- August
- Year:
- 2023
- Address:
- Harbin, China
- Editor:
- Jiajun Zhang
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 40–56
- Language:
- English
- URL:
- https://aclanthology.org/2023.ccl-2.4
- Cite (ACL):
- Ni Xuanfan and Li Piji. 2023. A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum), pages 40–56, Harbin, China. Chinese Information Processing Society of China.
- Cite (Informal):
- A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks (Xuanfan & Piji, CCL 2023)
- PDF:
- https://preview.aclanthology.org/corrections-2024-05/2023.ccl-2.4.pdf