Mark Andrade


2025

Large Language Models (LLMs) have achieved remarkable results in natural language generation, yet challenges remain in data-to-text (D2T) tasks, particularly in controlling output, ensuring transparency, and maintaining factual consistency with the input. We introduce the first LLM-based multi-agent framework for D2T generation, coordinating specialized agents to produce high-quality, interpretable outputs. Our system combines the reasoning and acting abilities of ReAct agents, the self-correction of Reflexion agents, and the quality assurance of Guardrail agents, all directed by an Orchestrator agent that assigns tasks to three specialists—content ordering, text structuring, and surface realization—and iteratively refines outputs based on Guardrail feedback. This closed-loop design enables precise control and dynamic optimization, yielding text that is coherent, accurate, and grounded in the input data. On a relatively simple dataset like WebNLG, our framework performs competitively with end-to-end systems, highlighting its promise for more complex D2T scenarios.
Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.