Somayajulu Sripada

Also published as: Somayajulu G. Sripada, Somayajulu Gowri Sripada, Somayajula G. Sripada


2025

A major concern when deploying LLMs in accuracy-critical domains such as sports reporting is that the generated text may not faithfully reflect the input data. We quantify how input structure affects hallucinations and other factual errors in LLM-generated summaries of NBA play-by-play data, across three formats: row-structured, JSON and unstructured. We manually annotated 3,312 factual errors across 180 game summaries produced by two models, Llama-3.1-70B and Qwen2.5-72B. Input structure has a strong effect: JSON input reduces error rates by 69% for Llama and 65% for Qwen compared to unstructured input, while row-structured input reduces errors by 54% for Llama and 51% for Qwen. A two-way repeated-measures ANOVA shows that input structure accounts for over 80% of the variance in error rates, with Tukey HSD post hoc tests confirming statistically significant differences between all input formats.

2023

Neural data-to-text systems lack the control and factual accuracy required to generate useful and insightful summaries of multidimensional data. We propose a solution in the form of data views, where each view describes an entity and its attributes along specific dimensions. A sequence of views can then be used as a high-level schema for document planning, with the neural model handling the complexities of micro-planning and surface realization. We show that our view-based system retains factual accuracy while offering high-level control of output that can be tailored based on user preference or other norms within the domain.

2022

We report error analysis of outputs from four Table-to-Text generation models fine-tuned on ToTTo, an open-domain English language dataset. We carried out a manual error annotation of a subset of outputs (a total of 3,016 sentences) belonging to the topic of Politics generated by these four models. Our error annotation focused on eight categories of errors. The error analysis shows that more than 46% of sentences from each of the four models have been error-free. It uncovered some of the specific classes of errors; for example, WORD errors (mostly verbs and prepositions) are the dominant errors in all four models and are the most complex ones among other errors. NAME (mostly nouns) and NUMBER errors are slightly higher in two of the GeM benchmark models, whereas DATE-DIMENSION and OTHER categories of errors are more common in our Table-to-Text model. This in-depth error analysis is currently guiding us in improving our Table-to-Text model.

2020

It is unfair to expect neural data-to-text to produce high quality output when there are gaps between system input data and information contained in the training text. Thomson et al. (2020) identify and narrow information gaps in Rotowire, a popular data-to-text dataset. In this paper, we describe a study which finds that a state-of-the-art neural data-to-text system produces higher quality output, according to the information extraction (IE) based metrics, when additional input data is carefully selected from this newly available source. It remains to be shown, however, whether IE metrics used in this study correlate well with humans in judging text quality.

2018

This paper presents a study to understand the issues related to using NLG to humanise explanations from a popular interpretable machine learning framework called LIME. Our study shows that self-reported rating of NLG explanation was higher than that for a non-NLG explanation. However, when tested for comprehension, the results were not as clear-cut showing the need for performing more studies to uncover the factors responsible for high-quality NLG explanations.
This paper proposes an approach to NLG system design which focuses on generating output text which can be more easily processed by the reader. Ways in which cognitive theory might be combined with existing NLG techniques are discussed and two simple experiments in content ordering are presented.

2017

Many data-to-text NLG systems work with data sets which are incomplete, ie some of the data is missing. We have worked with data journalists to understand how they describe incomplete data, and are building NLG algorithms based on these insights. A pilot evaluation showed mixed results, and highlighted several areas where we need to improve our system.

2016

2015

2014

2012

2009

Notre société génère une masse d’information toujours croissante, que ce soit en médecine, en météorologie, etc. La méthode la plus employée pour analyser ces données est de les résumer sous forme graphique. Cependant, il a été démontré qu’un résumé textuel est aussi un mode de présentation efficace. L’objectif du prototype BT-45, développé dans le cadre du projet Babytalk, est de générer des résumés de 45 minutes de signaux physiologiques continus et d’événements temporels discrets en unité néonatale de soins intensifs (NICU). L’article présente l’aspect génération de texte de ce prototype. Une expérimentation clinique a montré que les résumés humains améliorent la prise de décision par rapport à l’approche graphique, tandis que les textes de BT-45 donnent des résultats similaires à l’approche graphique. Une analyse a identifié certaines des limitations de BT-45 mais en dépit de cellesci, notre travail montre qu’il est possible de produire automatiquement des résumés textuels efficaces de données complexes.

2008

2007

2006

2005

2003

2002

2001