Infrrd.ai at SemEval-2024 Task 7: RAG-based end-to-end training to generate headlines and numbers
Jianglong He, Saiteja Tallam, Srirama Nakshathri, Navaneeth Amarnath, Pratiba Kr, Deepak Kumar
Abstract
We propose a training algorithm based on retrieval-augmented generation (RAG) that retrieves the most similar training samples. The retrieved samples are used as references for contextual learning-based fine-tuning of large language models (LLMs). We use the proposed method to generate headlines and extract numerical values from unstructured text. Models are made aware of the numbers in the unstructured text through extensible markup language (XML) tags designed specifically to capture them. The headlines are preprocessed to wrap each number in these tags before they are presented to the model. Several mathematical operations are also passed as references to support the chain-of-thought (CoT) approach, so that the model can calculate the final value produced by a mathematical operation. As a post-processing step, we validate the numbers to verify whether the numerical value calculated by the model is correct. This automatic validation of numbers in the generated headline helped the model achieve the best results in human evaluation among the methods involved.
- Anthology ID:
- 2024.semeval-1.136
- Volume:
- Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 940–951
- URL:
- https://aclanthology.org/2024.semeval-1.136
- DOI:
- 10.18653/v1/2024.semeval-1.136
- Cite (ACL):
- Jianglong He, Saiteja Tallam, Srirama Nakshathri, Navaneeth Amarnath, Pratiba Kr, and Deepak Kumar. 2024. Infrrd.ai at SemEval-2024 Task 7: RAG-based end-to-end training to generate headlines and numbers. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 940–951, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Infrrd.ai at SemEval-2024 Task 7: RAG-based end-to-end training to generate headlines and numbers (He et al., SemEval 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.136.pdf
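
The abstract mentions two steps that are easy to picture in code: wrapping numbers in XML tags before the text is shown to the model, and validating the numbers in a generated headline afterwards. The following is a minimal sketch of both ideas, not the authors' implementation. The tag name `num`, the regular expression, and all function names are assumptions made for illustration, and the validation only checks that headline numbers appear in the article; it does not cover values the model derives through arithmetic.

```python
import re

# Matches integers and decimals, with optional thousands separators (e.g. 1,200 or 3.5).
# The pattern is an illustrative assumption, not taken from the paper.
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def wrap_numbers(text: str, tag: str = "num") -> str:
    """Wrap every number in the text in an XML-style tag (the tag name is hypothetical)."""
    return NUMBER_RE.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


def extract_numbers(text: str) -> list[float]:
    """Return the numbers found in the text as floats, ignoring thousands separators."""
    return [float(m.group(0).replace(",", "")) for m in NUMBER_RE.finditer(text)]


def headline_numbers_supported(headline: str, article: str) -> bool:
    """Check that every number in the headline also appears in the article.

    Simplification: numbers computed by the model (sums, differences, rounding)
    would fail this check even when they are correct.
    """
    article_numbers = set(extract_numbers(article))
    return all(n in article_numbers for n in extract_numbers(headline))


if __name__ == "__main__":
    article = "Sales rose 3.5% to 1,200 units"
    print(wrap_numbers(article))
    print(headline_numbers_supported("Sales hit 1,200", article))
```

A fuller validator in the spirit of the abstract would also re-evaluate the mathematical operation the model reports and compare its result with the number in the headline.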