Ornait O’Connell
2025
Scaling Up Data-to-Text Generation to Longer Sequences: A New Dataset and Benchmark Results for Generation from Large Triple Sets
Chinonso Cynthia Osuji | Simon Mille | Ornait O’Connell | Thiago Castro Ferreira | Anya Belz | Brian Davis
Proceedings of the 18th International Natural Language Generation Conference
The ability of LLMs to write coherent, faithful long texts from structured data inputs remains relatively uncharted, in part because nearly all public data-to-text datasets contain only short input-output pairs. To address this gap, we benchmark six LLMs, a rule-based system and human-written texts on a new long-input dataset in English and Irish via LLM-based evaluation. We find substantial differences between models and languages.
eSTÓR: Curating Irish Datasets for Machine Translation
Abigail Walsh | Órla Ní Loinsigh | Jane Adkins | Ornait O’Connell | Mark Andrade | Teresa Clifford | Federico Gaspari | Jane Dunne | Brian Davis
Proceedings of Machine Translation Summit XX: Volume 2
Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
Co-authors
- Brian Davis 2
- Jane Adkins 1
- Mark Andrade 1
- Anya Belz 1
- Thiago Castro Ferreira 1