Long-Form Analogy Evaluation Challenge
Bhavya Bhavya, Chris Palaguachi, Yang Zhou, Suma Bhat, ChengXiang Zhai
Abstract
Given the practical applications of analogies, recent work has studied analogy generation to explain concepts. However, not all generated analogies are of high quality, and it is unclear how to measure the quality of this new kind of generated text. To address this challenge, we propose a shared task on automatically evaluating the quality of generated analogies based on seven comprehensive criteria. For this, we will set up a leaderboard based on our dataset, annotated with manual ratings along the seven criteria, and provide a baseline solution leveraging GPT-4. We hope that this task will advance the development of new evaluation metrics and methods for analogy generation in natural language, particularly for education.
- Anthology ID:
- 2024.inlg-genchal.1
- Volume:
- Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
- Month:
- September
- Year:
- 2024
- Address:
- Tokyo, Japan
- Editors:
- Simon Mille, Miruna-Adriana Clinciu
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1–16
- URL:
- https://preview.aclanthology.org/ingest_wac_2008/2024.inlg-genchal.1/
- Cite (ACL):
- Bhavya Bhavya, Chris Palaguachi, Yang Zhou, Suma Bhat, and ChengXiang Zhai. 2024. Long-Form Analogy Evaluation Challenge. In Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges, pages 1–16, Tokyo, Japan. Association for Computational Linguistics.
- Cite (Informal):
- Long-Form Analogy Evaluation Challenge (Bhavya et al., INLG 2024)
- PDF:
- https://preview.aclanthology.org/ingest_wac_2008/2024.inlg-genchal.1.pdf
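The abstract describes an LLM-based baseline that rates generated analogies along seven criteria. A minimal sketch of such a rater is below; the criterion names, prompt wording, and helper functions are illustrative assumptions, not the paper's actual baseline (the paper defines its own seven criteria and prompting setup).

```python
import re
from typing import Optional

# Hypothetical criterion names for illustration only; the paper
# specifies its own seven evaluation criteria.
CRITERIA = [
    "correctness", "clarity", "coherence", "relevance",
    "novelty", "engagement", "overall quality",
]

def build_rating_prompt(concept: str, analogy: str, criterion: str) -> str:
    """Compose an instruction asking an LLM to rate one analogy on one criterion."""
    return (
        f"Concept: {concept}\n"
        f"Analogy: {analogy}\n"
        f"Rate the analogy's {criterion} on a 1-5 scale. "
        "Answer with 'Rating: <number>' only."
    )

def parse_rating(response: str) -> Optional[int]:
    """Extract an integer rating in [1, 5] from a model reply, or None."""
    match = re.search(r"Rating:\s*([1-5])", response)
    return int(match.group(1)) if match else None
```

In use, one would send each criterion's prompt to a model such as GPT-4 and parse the reply, e.g. `parse_rating("Rating: 4")` yields `4`; comparing these automatic ratings against the dataset's manual annotations is what the leaderboard would measure.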