Laura Perez-Beltrachini


2021

Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)
Antoine Bosselut | Esin Durmus | Varun Prashant Gangal | Sebastian Gehrmann | Yacine Jernite | Laura Perez-Beltrachini | Samira Shaikh | Wei Xu
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann | Tosin Adewumi | Karmanya Aggarwal | Pawan Sasanka Ammanamanchi | Anuoluwapo Aremu | Antoine Bosselut | Khyathi Raghavi Chandu | Miruna-Adriana Clinciu | Dipanjan Das | Kaustubh Dhole | Wanyu Du | Esin Durmus | Ondřej Dušek | Chris Chinenye Emezue | Varun Gangal | Cristina Garbacea | Tatsunori Hashimoto | Yufang Hou | Yacine Jernite | Harsh Jhamtani | Yangfeng Ji | Shailza Jolly | Mihir Kale | Dhruv Kumar | Faisal Ladhak | Aman Madaan | Mounica Maddela | Khyati Mahajan | Saad Mahamood | Bodhisattwa Prasad Majumder | Pedro Henrique Martins | Angelina McMillan-Major | Simon Mille | Emiel van Miltenburg | Moin Nadeem | Shashi Narayan | Vitaly Nikolaev | Andre Niyongabo Rubungo | Salomey Osei | Ankur Parikh | Laura Perez-Beltrachini | Niranjan Ramesh Rao | Vikas Raunak | Juan Diego Rodriguez | Sashank Santhanam | João Sedoc | Thibault Sellam | Samira Shaikh | Anastasia Shimorina | Marco Antonio Sobrevilla Cabezudo | Hendrik Strobelt | Nishant Subramani | Wei Xu | Diyi Yang | Akhila Yerukola | Jiawei Zhou
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.

Models and Datasets for Cross-Lingual Summarisation
Laura Perez-Beltrachini | Mirella Lapata
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language. The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German, and the methodology for its creation can be applied to several other languages. We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles’ bodies from language aligned Wikipedia titles. We analyse the proposed cross-lingual summarisation task with automatic metrics and validate it with a human study. To illustrate the utility of our dataset we report experiments with multi-lingual pre-trained models in supervised, zero- and few-shot, and out-of-domain scenarios.

2019

Generating Summaries with Topic Templates and Structured Convolutional Decoders
Laura Perez-Beltrachini | Yang Liu | Mirella Lapata
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Existing neural generation approaches create multi-sentence text as a single sequence. In this paper we propose a structured convolutional decoder that is guided by the content structure of target summaries. We compare our model with existing sequential decoders on three data sets representing different domains. Automatic and human evaluation demonstrate that our summaries have better content coverage.

2018

Bootstrapping Generators from Noisy Data
Laura Perez-Beltrachini | Mirella Lapata
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

A core step in statistical data-to-text generation concerns learning correspondences between structured data representations (e.g., facts in a database) and associated texts. In this paper we aim to bootstrap generators from large scale datasets where the data (e.g., DBPedia facts) and related texts (e.g., Wikipedia abstracts) are loosely aligned. We tackle this challenging task by introducing a special-purpose content selection mechanism. We use multi-instance learning to automatically discover correspondences between data and text pairs and show how these can be used to enhance the content signal while training an encoder-decoder architecture. Experimental results demonstrate that models trained with content-specific objectives improve upon a vanilla encoder-decoder which solely relies on soft attention.

Deep Graph Convolutional Encoders for Structured Data to Text Generation
Diego Marcheggiani | Laura Perez-Beltrachini
Proceedings of the 11th International Conference on Natural Language Generation

Most previous work on neural text generation from graph-structured data relies on standard sequence-to-sequence methods. These approaches linearise the input graph to be fed to a recurrent neural network. In this paper, we propose an alternative encoder based on graph convolutional networks that directly exploits the input structure. We report results on two graph-to-sequence datasets that empirically show the benefits of explicitly encoding the input graph structure.

2017

A Statistical, Grammar-Based Approach to Microplanning
Claire Gardent | Laura Perez-Beltrachini
Computational Linguistics, Volume 43, Issue 1 - April 2017

Although there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during microplanning between aggregation, surface realization, and sentence segmentation. In this article, we propose a hybrid symbolic/statistical approach to jointly model the constraints regulating these interactions. Our approach integrates a small handwritten grammar, a statistical hypertagger, and a surface realization algorithm. It is applied to the verbalization of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid statistical/symbolic system more fluent than both a template-based and a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation, and surface realization.

Creating Training Corpora for NLG Micro-Planners
Claire Gardent | Anastasia Shimorina | Shashi Narayan | Laura Perez-Beltrachini
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present a novel framework for semi-automatically creating linguistically challenging micro-planning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for micro-planning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train and learn KB verbalisers. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while Wen et al.’s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models can be learned that are capable of handling the complex interactions occurring during micro-planning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation. To encourage researchers to take up this challenge, we have made available a dataset of 21,855 data/text pairs created using this framework in the context of the WebNLG shared task.

The WebNLG Challenge: Generating Text from RDF Data
Claire Gardent | Anastasia Shimorina | Shashi Narayan | Laura Perez-Beltrachini
Proceedings of the 10th International Conference on Natural Language Generation

The WebNLG challenge consists of mapping sets of RDF triples to text. It provides a common benchmark on which to train, evaluate and compare “microplanners”, i.e. generation systems that verbalise a given content by making a range of complex interacting choices including referring expression generation, aggregation, lexicalisation, surface realisation and sentence segmentation. In this paper, we introduce the microplanning task, describe data preparation, introduce our evaluation methodology, analyse participant results and provide a brief description of the participating systems.

Analysing Data-To-Text Generation Benchmarks
Laura Perez-Beltrachini | Claire Gardent
Proceedings of the 10th International Conference on Natural Language Generation

A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.

2016

Content selection as semantic-based ontology exploration
Laura Perez-Beltrachini | Claire Gardent | Anselme Revuz | Saptarashmi Bandyopadhyay
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

Category-Driven Content Selection
Rania Mohammed | Laura Perez-Beltrachini | Claire Gardent
Proceedings of the 9th International Natural Language Generation conference

The WebNLG Challenge: Generating Text from DBPedia Data
Emilie Colin | Claire Gardent | Yassine M’rabet | Shashi Narayan | Laura Perez-Beltrachini
Proceedings of the 9th International Natural Language Generation conference

Building RDF Content for Data-to-Text Generation
Laura Perez-Beltrachini | Rania Sayed | Claire Gardent
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In Natural Language Generation (NLG), one important limitation is the lack of common benchmarks on which to train, evaluate and compare data-to-text generators. In this paper, we take one step in that direction and introduce a method for automatically creating an arbitrarily large repertoire of data units that could serve as input for generation. Using both automated metrics and a human evaluation, we show that the data units produced by our method are both diverse and coherent.

Learning Embeddings to lexicalise RDF Properties
Laura Perez-Beltrachini | Claire Gardent
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

2014

Incremental Query Generation
Laura Perez-Beltrachini | Claire Gardent | Enrico Franconi
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Weakly and Strongly Constrained Dialogues for Language Learning
Claire Gardent | Alejandra Lorenzo | Laura Perez-Beltrachini | Lina Rojas-Barahona
Proceedings of the SIGDIAL 2013 Conference

2012

Generating Grammar Exercises
Laura Perez-Beltrachini | Claire Gardent | German Kruszewski
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

Using FB-LTAG Derivation Trees to Generate Transformation-Based Grammar Exercises
Claire Gardent | Laura Perez-Beltrachini
Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11)

Representation of linguistic and domain knowledge for second language learning in virtual worlds
Alexandre Denis | Ingrid Falk | Claire Gardent | Laura Perez-Beltrachini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

There has been much debate, both theoretical and practical, on how to link ontologies and lexicons in natural language processing (NLP) applications. In this paper, we focus on an application in which lexicon and ontology are used to generate teaching material. We briefly describe the application (a serious game for language learning). We then zoom in on the representation and interlinking of the lexicon and of the ontology. We show how the use of existing standards and of good practice principles facilitates the design of our resources while satisfying the expressivity requirements set by natural language generation.

2010

RTG based surface realisation for TAG
Claire Gardent | Laura Perez-Beltrachini
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Comparing the performance of two TAG-based surface realisers using controlled grammar traversal
Claire Gardent | Benjamin Gottesman | Laura Perez-Beltrachini
Coling 2010: Posters