William Soto Martinez

2025

Semantic Evaluation of Multilingual Data-to-Text Generation via NLI Fine-Tuning: Precision, Recall and F1 scores
William Soto Martinez | Yannick Parmentier | Claire Gardent
Findings of the Association for Computational Linguistics: ACL 2025

Performance on the KG-to-Text task has improved over the years, particularly in English. However, models are still prone to errors such as additions and omissions. Furthermore, few languages are taken into account, since training and test data are not readily available for most of them. In this paper, we hope to facilitate the development and improvement of multilingual KG-to-Text models by providing a multilingual evaluation framework that is reference-less (no need for test data) and that estimates how much a KG-to-Text model under-generates (omissions) or over-generates (additions). We focus on two high-resource (English, Russian) and five low-resource (Breton, Irish, Maltese, Welsh, Xhosa) languages and show that our metric has fair to moderate correlation with reference-based metrics, positioning it as a consistent alternative when no references are available. We also show that our metric outperforms prior reference-less metrics in correlation with existing human judgments. An additional human evaluation shows moderate to strong correlation with human annotators in assessing precision and recall at a finer granularity than in previous studies. Since our metric provides separate precision and recall scores, it helps better assess the degree of over- or under-generation of multilingual KG-to-Text models.
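As a rough illustration of how an NLI-based metric can yield separate precision and recall scores, the sketch below writes them in terms of bidirectional entailment judgments. The notation (G, S, e) is hypothetical and chosen for exposition only; it is not the paper's actual formulation.

% Illustrative only: G = {t_1, ..., t_n} are the triples of the input graph,
% S = {s_1, ..., s_m} the units of the generated text, and e(p, h) the
% entailment probability a fine-tuned NLI model assigns to premise p and
% hypothesis h.
\[
\mathrm{Precision} = \frac{1}{m}\sum_{j=1}^{m} e(G, s_j), \qquad
\mathrm{Recall} = \frac{1}{n}\sum_{i=1}^{n} e(S, t_i), \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
\]

Under definitions of this shape, precision penalises additions (text units not entailed by the graph) while recall penalises omissions (triples not entailed by the text), which is what lets the metric localise over- vs under-generation.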

Multilingual Verbalisation of Knowledge Graphs
Yifei Song | William Soto Martinez | Anna Nikiforovskaya | Evan Parker Kelly Chapple | Claire Gardent
Findings of the Association for Computational Linguistics: EMNLP 2025

Most work on Knowledge Graph (KG) verbalisation is monolingual, leaving open the question of how to scale KG-to-Text generation to languages with varying amounts of resources. In this work, we explore KG-to-Text generation in nine languages: five high-resource (HR) languages (English, Chinese, French, Spanish, Russian) and four low-resource (LR) languages (Breton, Irish, Maltese, Welsh). We first construct silver multilingual training data for all nine languages and new gold out-of-domain test data for the five HR languages. Using this data and already available in-domain test sets for seven of our nine languages, we then compare three strategies: (1) NLG+MT, a state-of-the-art KG-to-English model followed by Machine Translation (MT) into the target language; (2) FTMT, multilingual MT models fine-tuned end-to-end on the silver data; and (3) FewShot, few-shot LLM prompting comparing four LLMs. We explore different prompting strategies, show that our best prompting strategy performs best on all nine languages, and discuss the relative performance of the three approaches on low- vs high-resource languages and on in- vs out-of-domain data. The models, the test set, and the silver training data are available at https://github.com/MeloS7/Multilingual-KG-Verbalisation.
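For concreteness, here is a minimal sketch of the FewShot strategy: assemble a prompt from a few (triples, text) demonstrations and delegate generation to an LLM. The llm_generate callable, the prompt wording, the default language, and the triple format are assumptions made for illustration; they are not the paper's actual prompts or API.

from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]   # (subject, property, object), WebNLG-style
Demo = Tuple[List[Triple], str]  # (triple set, reference verbalisation)

def format_triples(triples: List[Triple]) -> str:
    """Render a triple set, one 'subject | property | object' line per triple."""
    return "\n".join(f"{s} | {p} | {o}" for s, p, o in triples)

def build_fewshot_prompt(demos: List[Demo], input_triples: List[Triple],
                         language: str) -> str:
    """Assemble a few-shot prompt: an instruction, k demonstrations, then the input graph."""
    parts = [f"Verbalise the following RDF triples as fluent {language} text.\n"]
    for triples, text in demos:
        parts.append(f"Triples:\n{format_triples(triples)}\nText: {text}\n")
    parts.append(f"Triples:\n{format_triples(input_triples)}\nText:")
    return "\n".join(parts)

def verbalise(llm_generate: Callable[[str], str], demos: List[Demo],
              input_triples: List[Triple], language: str = "Welsh") -> str:
    """Prompt the LLM (a hypothetical callable here) and return its verbalisation."""
    return llm_generate(build_fewshot_prompt(demos, input_triples, language)).strip()

# Example with a stub in place of a real LLM call:
demos = [([("Alan_Turing", "birthPlace", "Maida_Vale")],
          "Alan Turing was born in Maida Vale.")]
stub = lambda prompt: "Ganwyd Alan Turing ym Maida Vale."
print(verbalise(stub, demos, [("Alan_Turing", "birthPlace", "Maida_Vale")]))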