Evan Parker Kelly Chapple


2025

pdf bib
Multilingual Verbalisation of Knowledge Graphs
Yifei Song | William Soto Martinez | Anna Nikiforovskaya | Evan Parker Kelly Chapple | Claire Gardent
Findings of the Association for Computational Linguistics: EMNLP 2025

Most work on Knowledge Graph (KG) verbalisation is monolingual leaving open the question of how to scale KG-to-Text generation to languages with varying amounts of resources. In this work, we explore KG-to-Text generation on nine languages including five high-resource (HR) languages (English, Chinese, French, Spanish, Russian) and four low-resource (LR) languages (Breton, Irish, Maltese, Welsh). We first construct silver multilingual training data for all nine languages and new gold out-of-domain test data for the five HR languages. Using this data and already available in-domain test sets for 7 of our 9 languages, we then compare three strategies: (1) NLG+MT—a state-of-the-art KG-to-English model followed by Machine Translation (MT) into the target language; (2) FTMT—multilingual MT models fine-tuned end-to-end on the silver data; and (3) FewShot—few-shot LLM prompting comparing 4 LLMs. We explore different prompting strategies and show that our best prompting strategy performs the best on all 9 languages, discussing the relative performance of the three approaches on Low vs High Resource languages and on in- vs out-of-domain data.The models, the test set, and the silver training data are available at https://github.com/MeloS7/Multilingual-KG-Verbalisation.

pdf bib
Fine-Tuning, Prompting and RAG for Knowledge Graph-to-Russian Text Generation. How do these Methods generalise to Out-of-Distribution Data?
Anna Nikiforovskaya | William Eduardo Soto Martinez | Evan Parker Kelly Chapple | Claire Gardent
Proceedings of the 18th International Natural Language Generation Conference

Prior work on Knowledge Graph-to-Text generation has mostly evaluated models on in-domain test sets and/or with English as the target language. In contrast, we focus on Russian and we assess how various generation methods perform on out-of-domain, unseen data. Previous studies have shown that enriching the input with target-language verbalisations of entities and properties substantially improves the performance of fine-tuned models for Russian. We compare multiple variants of two contemporary paradigms — LLM prompting and Retrieval-Augmented Generation (RAG) — and investigate alternative ways to integrate such external knowledge into the generation process. Using automatic metrics and human evaluation, we find that on unseen data the fine-tuned model consistently underperforms, revealing limited generalisation capacity; that while it outperforms RAG by a small margin on most datasets, prompting generates less fluent text; and conversely, that RAG generates text that is less faithful to the input. Overall, both LLM prompting and RAG outperform Fine-Tuning across all unseen testsets. The code for this paper is available at https://github.com/Javanochka/KG-to-text-fine-tuning-prompting-rag