Ornait O’Connell
2026
LLM Multi-Agent Systems for Long Triple Set Data-to-Text Generation
Chinonso Cynthia Osuji | Simon Mille | Mark Andrade | Jane Adkins | Ornait O’Connell | Elaine Uí Dhonnchadha | Bláithín Heffernan | Fírinne Nic an tSaoir | Anya Belz | Thiago Castro Ferreira | Brian Davis
Findings of the Association for Computational Linguistics: ACL 2026
Generating coherent, semantically accurate text from large structured inputs remains a persistent challenge in data-to-text generation, as single-step LLM mappings from data to text limit control over discourse structuring and amplify hallucinations and omissions as input size grows. We introduce a new dataset of extended DBpedia triple sets (up to 199 triples per input), and a modular multi-agent framework: specialised LLM agents handle content ordering, text structuring, and surface realisation under the supervision of an orchestrator and guardrail control loop. The system generates multi-paragraph outputs in English and Irish (low-resource). We compare a three-worker multi-agent configuration against a single-worker multi-task variant and a strong end-to-end baseline. Quality is assessed via human evaluation and LLM-as-a-judge (with truncation-based sanity checks). Results show slightly superior coherence for the multi-agent approach in both languages, with statistically significant inter-rater correlation over all criteria for English and no statistically significant correlation for Irish. Human-LLM alignment is very weak overall, thus exposing key limits in scalable NLG evaluation.
2025
eSTÓR: Curating Irish Datasets for Machine Translation
Abigail Walsh | Órla Ní Loinsigh | Jane Adkins | Ornait O’Connell | Mark Andrade | Teresa Clifford | Federico Gaspari | Jane Dunne | Brian Davis
Proceedings of Machine Translation Summit XX: Volume 2
Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
Scaling Up Data-to-Text Generation to Longer Sequences: A New Dataset and Benchmark Results for Generation from Large Triple Sets
Chinonso Cynthia Osuji | Simon Mille | Ornait O’Connell | Thiago Castro Ferreira | Anya Belz | Brian Davis
Proceedings of the 18th International Natural Language Generation Conference
The ability of LLMs to write coherent, faithful long texts from structured data inputs remains relatively uncharted, in part because nearly all public data-to-text datasets contain only short input-output pairs. To address this gap, we benchmark six LLMs, a rule-based system and human-written texts on a new long-input dataset in English and Irish via LLM-based evaluation. We find substantial differences between models and languages.