Infrrd.ai at SemEval-2024 Task 7: RAG-based end-to-end training to generate headlines and numbers

Jianglong He, Saiteja Tallam, Srirama Nakshathri, Navaneeth Amarnath, Pratiba Kr, Deepak Kumar


Abstract
We propose a training algorithm that uses retrieval-augmented generation (RAG) to retrieve the training samples most similar to a given input. The retrieved samples serve as references for in-context, learning-based fine-tuning of large language models (LLMs). We apply the proposed method to generate headlines and extract numerical values from unstructured text. The models are made aware of numbers in the unstructured text through Extensible Markup Language (XML) tags designed specifically to capture them. The headlines are preprocessed to wrap each number in these tags before being presented to the model. Several mathematical operations are also passed as references to support a chain-of-thought (CoT) approach, so that the model can calculate the final value produced by a mathematical operation. As a post-processing step, we validate the numbers to verify whether the numerical value calculated by the model is correct. This automatic validation of numbers in the generated headline helped the model achieve the best results in human evaluation among the methods involved.
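The number tagging and post-processing validation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tag name `num`, the regular expression, the operation names, and the helper functions `wrap_numbers`, `extract_numbers`, and `validate_value` are all assumptions made for illustration, since the abstract does not specify them.

```python
import operator
import re

# Hypothetical tag name; the abstract does not state which XML tags are used.
NUM_TAG = "num"

# Matches numbers with thousands separators (1,200) and plain integers or decimals (12, 3.5).
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")

# Illustrative set of operations; the paper's actual operation list is not given here.
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def wrap_numbers(text):
    """Wrap every number in the text with an XML-style tag."""
    return NUMBER_RE.sub(lambda m: f"<{NUM_TAG}>{m.group(0)}</{NUM_TAG}>", text)


def extract_numbers(text):
    """Return all numbers found in the text as floats."""
    return [float(m.replace(",", "")) for m in NUMBER_RE.findall(text)]


def validate_value(op, a, b, claimed, tol=1e-6):
    """Recompute op(a, b) and check it against the value the model produced."""
    return abs(OPS[op](a, b) - claimed) <= tol


tagged = wrap_numbers("Sales rose 12% to 1,200 units")
print(tagged)
print(extract_numbers(tagged))
print(validate_value("subtract", 1200, 12, 1188))
```

A validator of this kind only checks arithmetic consistency; deciding which operation and operands a headline actually calls for is left to the model in the approach the abstract describes.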
Anthology ID:
2024.semeval-1.136
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
940–951
URL:
https://aclanthology.org/2024.semeval-1.136
Cite (ACL):
Jianglong He, Saiteja Tallam, Srirama Nakshathri, Navaneeth Amarnath, Pratiba Kr, and Deepak Kumar. 2024. Infrrd.ai at SemEval-2024 Task 7: RAG-based end-to-end training to generate headlines and numbers. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 940–951, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Infrrd.ai at SemEval-2024 Task 7: RAG-based end-to-end training to generate headlines and numbers (He et al., SemEval 2024)
PDF:
https://preview.aclanthology.org/ingestion-checklist/2024.semeval-1.136.pdf
Supplementary material:
 2024.semeval-1.136.SupplementaryMaterial.txt