Abstract
In this report, we share our contribution to the Eval4NLP Shared Task, "Prompting Large Language Models as Explainable Metrics." We build our prompts with a primary focus on effective prompting strategies, score aggregation, and explainability for LLM-based metrics. We participated in the track for smaller models, submitting scores along with their explanations. According to the Kendall correlation scores on the leaderboard, our MT evaluation submission ranks second, while our summarization evaluation submission ranks fourth, trailing the leading submission by only 0.06.
- Anthology ID:
- 2023.eval4nlp-1.13
- Volume:
- Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
- Month:
- November
- Year:
- 2023
- Address:
- Bali, Indonesia
- Editors:
- Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao, Christoph Leiter, Juri Opitz, Andreas Rücklé
- Venues:
- Eval4NLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 156–163
- URL:
- https://aclanthology.org/2023.eval4nlp-1.13
- DOI:
- 10.18653/v1/2023.eval4nlp-1.13
- Cite (ACL):
- Pavan Baswani, Ananya Mukherjee, and Manish Shrivastava. 2023. LTRC_IIITH’s 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. In Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, pages 156–163, Bali, Indonesia. Association for Computational Linguistics.
- Cite (Informal):
- LTRC_IIITH’s 2023 Submission for Prompting Large Language Models as Explainable Metrics Task (Baswani et al., Eval4NLP-WS 2023)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2023.eval4nlp-1.13.pdf
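The leaderboard ranking described in the abstract is based on Kendall correlation between metric scores and human judgments. As a rough illustration of that ranking criterion, here is a minimal pure-Python sketch of Kendall tau-a; the `kendall_tau` helper and the score lists are illustrative assumptions, not data or code from the paper.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    A pair (i, j) is concordant when both score lists order the two
    items the same way, and discordant when they disagree.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical segment-level scores (not from the shared task data):
metric_scores = [0.81, 0.42, 0.67, 0.90, 0.35]
human_scores = [0.78, 0.61, 0.50, 0.95, 0.30]
print(f"Kendall tau: {kendall_tau(metric_scores, human_scores):.2f}")
```

With these made-up scores, nine of the ten item pairs are ordered the same way by both lists and one pair is swapped, so the correlation is (9 − 1) / 10 = 0.8; identical rankings give 1.0 and fully reversed rankings give −1.0.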