Srinivas Ramesh Kamath
2026
A Comparative Evaluation of End-to-End and Pipeline Approaches for Summarisation
Fahime Same | Saad Mahamood | Srinivas Ramesh Kamath
Proceedings of the 1st Symposium on Natural Language Generation Evaluations
We describe and evaluate two different architectures for creating book highlights from unstructured data. Given the prevalence of large language models, we examine whether a pipeline-based approach with intermediate steps for text generation is still necessary and whether it continues to offer any benefits over an end-to-end approach. Our comparative evaluations using LLM-as-a-judge across multiple models with different parameter sizes and generation scenarios show that highlights generated by the end-to-end approach are preferred. However, there is a slight but consistent increase in faithfulness for the pipeline-generated highlights when generating at a thematic level. Additionally, our analysis across multiple models shows that while larger models are more faithful, the degree of faithfulness increases when they are used with a pipeline architecture. The findings from our work indicate that whilst there is comparability between the two approaches, the greater faithfulness, controllability, and observability of pipeline-based approaches offer tangible benefits in applied settings.
2024
Generating Hotel Highlights from Unstructured Text using LLMs
Srinivas Ramesh Kamath | Fahime Same | Saad Mahamood
Proceedings of the 17th International Natural Language Generation Conference
We describe our implementation and evaluation of the Hotel Highlights system, which has been deployed live by trivago. This system leverages a large language model (LLM) to generate a set of highlights from accommodation descriptions and reviews, enabling travellers to quickly understand each accommodation's unique aspects. In this paper, we discuss our motivation for building this system and the human evaluation we conducted, comparing the generated highlights against the source input to assess the degree of hallucinations and/or contradictions present. Finally, we outline the lessons learned and the improvements needed.