Claire Gardent

2021

pdf bib abs
Augmenting Transformers with KNN-Based Composite Memory for Dialog
Angela Fan | Claire Gardent | Chloé Braud | Antoine Bordes
Transactions of the Association for Computational Linguistics, Volume 9

Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing this knowledge. We propose augmenting generative Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge. We apply these modules to generative dialog modeling, a challenging task where information must be flexibly retrieved and incorporated to maintain the topic and flow of conversation. We demonstrate the effectiveness of our approach by identifying relevant knowledge required for knowledgeable but engaging dialog from Wikipedia, images, and human-written dialog utterances, and show that leveraging this retrieved information improves model performance, measured by automatic and human evaluation.

pdf bib abs
An Error Analysis Framework for Shallow Surface Realization
Anastasia Shimorina | Yannick Parmentier | Claire Gardent
Transactions of the Association for Computational Linguistics, Volume 9

Abstract The metrics standardly used to evaluate Natural Language Generation (NLG) models, such as BLEU or METEOR, fail to provide information on which linguistic factors impact performance. Focusing on Surface Realization (SR), the task of converting an unordered dependency tree into a well-formed sentence, we propose a framework for error analysis which permits identifying which features of the input affect the models’ results. This framework consists of two main components: (i) correlation analyses between a wide range of syntactic metrics and standard performance metrics and (ii) a set of techniques to automatically identify syntactic constructs that often co-occur with low performance scores. We demonstrate the advantages of our framework by performing error analysis on the results of 174 system runs submitted to the Multilingual SR shared tasks; we show that dependency edge accuracy correlate with automatic metrics thereby providing a more interpretable basis for evaluation; and we suggest ways in which our framework could be used to improve models and data. The framework is available in the form of a toolkit which can be used both by campaign organizers to provide detailed, linguistically interpretable feedback on the state of the art in multilingual SR, and by individual researchers to improve models and datasets.1

pdf bib abs
Gathering Information and Engaging the User ComBot: A Task-Based, Serendipitous Dialog Model for Patient-Doctor Interactions
Anna Liednikova | Philippe Jolivet | Alexandre Durand-Salmon | Claire Gardent
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations

We focus on dialog models in the context of clinical studies where the goal is to help gather, in addition to the close information collected based on a questionnaire, serendipitous information that is medically relevant. To promote user engagement and address this dual goal (collecting both a predefined set of data points and more informal information about the state of the patients), we introduce an ensemble model made of three bots: a task-based, a follow-up and a social bot. We introduce a generic method for developing follow-up bots. We compare different ensemble configurations and we show that the combination of the three bots (i) provides a better basis for collecting information than just the information seeking bot and (ii) collects information in a more user-friendly, more efficient manner that an ensemble model combining the information seeking and the social bot.

pdf bib abs
Discourse-Based Sentence Splitting
Liam Cripwell | Joël Legrand | Claire Gardent
Findings of the Association for Computational Linguistics: EMNLP 2021

Sentence splitting involves the segmentation of a sentence into two or more shorter sentences. It is a key component of sentence simplification, has been shown to help human comprehension and is a useful preprocessing step for NLP tasks such as summarisation and relation extraction. While several methods and datasets have been proposed for developing sentence splitting models, little attention has been paid to how sentence splitting interacts with discourse structure. In this work, we focus on cases where the input text contains a discourse connective, which we refer to as discourse-based sentence splitting. We create synthetic and organic datasets for discourse-based splitting and explore different ways of combining these datasets using different model architectures. We show that pipeline models which use discourse structure to mediate sentence splitting outperform end-to-end models in learning the various ways of expressing a discourse relation but generate text that is less grammatical; that large scale synthetic data provides a better basis for learning than smaller scale organic data; and that training on discourse-focused, rather than on general sentence splitting data provides a better basis for discourse splitting.

pdf bib abs
Entity-Based Semantic Adequacy for Data-to-Text Generation
Juliette Faille | Albert Gatt | Claire Gardent
Findings of the Association for Computational Linguistics: EMNLP 2021

While powerful pre-trained language models have improved the fluency of text generation models, semantic adequacy -the ability to generate text that is semantically faithful to the input- remains an unsolved issue. In this paper, we introduce a novel automatic evaluation metric, Entity-Based Semantic Adequacy, which can be used to assess to what extent generation models that verbalise RDF (Resource Description Framework) graphs produce text that contains mentions of the entities occurring in the RDF input. This is important as RDF subject and object entities make up 2/3 of the input. We use our metric to compare 25 models from the WebNLG Shared Tasks and we examine correlation with results from human evaluations of semantic adequacy. We show that while our metric correlates with human evaluation scores, this correlation varies with the specifics of the human evaluation setup. This suggests that in order to measure the entity-based adequacy of generated texts, an automatic metric such as the one proposed here might be more reliable, as less subjective and more focused on correct verbalisation of the input, than human evaluation measures.

2020

pdf bib abs
Multilingual AMR-to-Text Generation
Angela Fan | Claire Gardent
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Generating text from structured data is challenging because it requires bridging the gap between (i) structure and natural language (NL) and (ii) semantically underspecified input and fully specified NL output. Multilingual generation brings in an additional challenge: that of generating into languages with varied word order and morphological properties. In this work, we focus on Abstract Meaning Representations (AMRs) as structured input, where previous research has overwhelmingly focused on generating only into English. We leverage advances in cross-lingual embeddings, pretraining, and multilingual models to create multilingual AMR-to-text models that generate in twenty one different languages. Our multilingual models surpass baselines that generate into one language in eighteen languages, based on automatic metrics. We analyze the ability of our multilingual models to accurately capture morphology and word order using human evaluation, and find that native speakers judge our generations to be fluent.

pdf bib
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)
Thiago Castro Ferreira | Claire Gardent | Nikolai Ilinykh | Chris van der Lee | Simon Mille | Diego Moussallem | Anastasia Shimorina
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

The RDF-to-text task has recently gained substantial attention due to the continuous growth of RDF knowledge graphs in number and size. Recent studies have focused on systematically comparing RDF-to-text approaches on benchmarking datasets such as WebNLG. Although some evaluation tools have already been proposed for text generation, none of the existing solutions abides by the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles and involves RDF data for the knowledge extraction task. In this paper, we present BENG, a FAIR benchmarking platform for Natural Language Generation (NLG) and Knowledge Extraction systems with focus on RDF data. BENG builds upon the successful benchmarking platform GERBIL, is opensource and is publicly available along with the data it contains.

pdf bib abs
The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020)
Thiago Castro Ferreira | Claire Gardent | Nikolai Ilinykh | Chris van der Lee | Simon Mille | Diego Moussallem | Anastasia Shimorina
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

WebNLG+ offers two challenges: (i) mapping sets of RDF triples to English or Russian text (generation) and (ii) converting English or Russian text to sets of RDF triples (semantic parsing). Compared to the eponymous WebNLG challenge, WebNLG+ provides an extended dataset that enable the training, evaluation, and comparison of microplanners and semantic parsers. In this paper, we present the results of the generation and semantic parsing task for both English and Russian and provide a brief description of the participating systems.

pdf bib abs
The Natural Language Pipeline, Neural Text Generation and Explainability
Juliette Faille | Albert Gatt | Claire Gardent
2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence

End-to-end encoder-decoder approaches to data-to-text generation are often black boxes whose predictions are difficult to explain. Breaking up the end-to-end model into sub-modules is a natural way to address this problem. The traditional pre-neural Natural Language Generation (NLG) pipeline provides a framework for breaking up the end-to-end encoder-decoder. We survey recent papers that integrate traditional NLG submodules in neural approaches and analyse their explainability. Our survey is a first step towards building explainable neural NLG models.

pdf bib abs
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Leonardo F. R. Ribeiro | Yue Zhang | Claire Gardent | Iryna Gurevych
Transactions of the Association for Computational Linguistics, Volume 8

Recent graph-to-text models generate text from graph-based data using either global or local aggregation to learn node representations. Global node encoding allows explicit communication between two distant nodes, thereby neglecting graph topology as all nodes are directly connected. In contrast, local node encoding considers the relations between neighbor nodes capturing the graph structure, but it can fail to capture long-range relations. In this work, we gather both encoding strategies, proposing novel neural models that encode an input graph combining both global and local node contexts, in order to learn better contextualized node embeddings. In our experiments, we demonstrate that our approaches lead to significant improvements on two graph-to-text datasets achieving BLEU scores of 18.01 on the AGENDA dataset, and 63.69 on the WebNLG dataset for seen categories, outperforming state-of-the-art models by 3.7 and 3.1 points, respectively.1

pdf bib abs
Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge
Anna Liednikova | Philippe Jolivet | Alexandre Durand-Salmon | Claire Gardent
Proceedings of the 28th International Conference on Computational Linguistics

A key bottleneck for developing dialog models is the lack of adequate training data. Due to privacy issues, dialog data is even scarcer in the health domain. We propose a novel method for creating dialog corpora which we apply to create doctor-patient interaction data. We use this data to learn both a generation and a hybrid classification/retrieval model and find that the generation model consistently outperforms the hybrid model. We show that our data creation method has several advantages. Not only does it allow for the semi-automatic creation of large quantities of training data. It also provides a natural way of guiding learning and a novel method for assessing the quality of human-machine interactions.

2019

pdf bib abs
Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing
Anastasia Shimorina | Elena Khasanova | Claire Gardent
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

In this paper, we propose an approach for semi-automatically creating a data-to-text (D2T) corpus for Russian that can be used to learn a D2T natural language generation model. An error analysis of the output of an English-to-Russian neural machine translation system shows that 80% of the automatically translated sentences contain an error and that 53% of all translation errors bear on named entities (NE). We therefore focus on named entities and introduce two post-editing techniques for correcting wrongly translated NEs.

pdf bib abs
Generating Text from Anonymised Structures
Emilie Colin | Claire Gardent
Proceedings of the 12th International Conference on Natural Language Generation

Surface realisation (SR) consists in generating a text from a meaning representations (MR). In this paper, we introduce a new parallel dataset of deep meaning representations (MR) and French sentences and we present a novel method for MR-to-text generation which seeks to generalise by abstracting away from lexical content. Most current work on natural language generation focuses on generating text that matches a reference using BLEU as evaluation criteria. In this paper, we additionally consider the model’s ability to reintroduce the function words that are absent from the deep input meaning representations. We show that our approach increases both BLEU score and the scores used to assess function words generation.

pdf bib abs
Revisiting the Binary Linearization Technique for Surface Realization
Yevgeniy Puzikov | Claire Gardent | Ido Dagan | Iryna Gurevych
Proceedings of the 12th International Conference on Natural Language Generation

End-to-end neural approaches have achieved state-of-the-art performance in many natural language processing (NLP) tasks. Yet, they often lack transparency of the underlying decision-making process, hindering error analysis and certain model improvements. In this work, we revisit the binary linearization approach to surface realization, which exhibits more interpretable behavior, but was falling short in terms of prediction accuracy. We show how enriching the training data to better capture word order constraints almost doubles the performance of the system. We further demonstrate that encoding both local and global prediction contexts yields another considerable performance boost. With the proposed modifications, the system which ranked low in the latest shared task on multilingual surface realization now achieves best results in five out of ten languages, while being on par with the state-of-the-art approaches in others.

pdf bib abs
Surface Realisation Using Full Delexicalisation
Anastasia Shimorina | Claire Gardent
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Surface realisation (SR) maps a meaning representation to a sentence and can be viewed as consisting of three subtasks: word ordering, morphological inflection and contraction generation (e.g., clitic attachment in Portuguese or elision in French). We propose a modular approach to surface realisation which models each of these components separately, and evaluate our approach on the 10 languages covered by the SR’18 Surface Realisation Shared Task shallow track. We provide a detailed evaluation of how word order, morphological realisation and contractions are handled by the model and an analysis of the differences in word ordering performance across languages.

pdf bib abs
Enhancing AMR-to-Text Generation with Dual Graph Representations
Leonardo F. R. Ribeiro | Claire Gardent | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating text from graph-based data, such as Abstract Meaning Representation (AMR), is a challenging task due to the inherent difficulty in how to properly encode the structure of a graph with labeled edges. To address this difficulty, we propose a novel graph-to-sequence model that encodes different but complementary perspectives of the structural information contained in the AMR graph. The model learns parallel top-down and bottom-up representations of nodes capturing contrasting views of the graph. We also investigate the use of different node message passing strategies, employing different state-of-the-art graph encoders to compute node representations based on incoming and outgoing perspectives. In our experiments, we demonstrate that the dual graph representation leads to improvements in AMR-to-text generation, achieving state-of-the-art results on two AMR datasets

pdf bib abs
Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs
Angela Fan | Claire Gardent | Chloé Braud | Antoine Bordes
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Query-based open-domain NLP tasks require information synthesis from long and diverse web results. Current approaches extractively select portions of web text as input to Sequence-to-Sequence models using methods such as TF-IDF ranking. We propose constructing a local graph structured knowledge base for each query, which compresses the web search information and reduces redundancy. We show that by linearizing the graph into a structured input sequence, models can encode the graph representations within a standard Sequence-to-Sequence setting. For two generative tasks with very long text input, long-form question answering and multi-document summarization, feeding graph representations as input can achieve better performance than using retrieved text portions.

pdf bib abs
LORIA / Lorraine University at Multilingual Surface Realisation 2019
Anastasia Shimorina | Claire Gardent
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

This paper presents the LORIA / Lorraine University submission at the Multilingual Surface Realisation shared task 2019 for the shallow track. We outline our approach and evaluate it on 11 languages covered by the shared task. We provide a separate evaluation of each component of our pipeline, concluding on some difficulties and suggesting directions for future work.

2018

pdf bib abs
Generating Syntactic Paraphrases
Emilie Colin | Claire Gardent
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We study the automatic generation of syntactic paraphrases using four different models for generation: data-to-text generation, text-to-text generation, text reduction and text expansion, We derive training data for each of these tasks from the WebNLG dataset and we show (i) that conditioning generation on syntactic constraints effectively permits the generation of syntactically distinct paraphrases for the same input and (ii) that exploiting different types of input (data, text or data+text) further increases the number of distinct paraphrases that can be generated for a given input.

pdf bib abs
Deep Learning Approaches to Text Production
Claire Gardent | Shashi Narayan
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Text production is a key component of many NLP applications. In data-driven approaches, it is used for instance, to generate dialogue turns from dialogue moves, to verbalise the content of Knowledge bases or to generate natural English sentences from rich linguistic representations, such as dependency trees or Abstract Meaning Representations. In text-driven methods on the other hand, text production is at work in sentence compression, sentence fusion, paraphrasing, sentence (or text) simplification, text summarisation and end-to-end dialogue systems. Following the success of encoder-decoder models in modeling sequence-rewriting tasks such as machine translation, deep learning models have successfully been applied to the various text production tasks. In this tutorial, we will cover the fundamentals and the state-of-the-art research on neural models for text production. Each text production task raises a slightly different communication goal (e.g, how to take the dialogue context into account when producing a dialogue turn; how to detect and merge relevant information when summarising a text; or how to produce a well-formed text that correctly capture the information contained in some input data in the case of data-to-text generation). We will outline the constraints specific to each subtasks and examine how the existing neural models account for them.

pdf bib abs
Handling Rare Items in Data-to-Text Generation
Anastasia Shimorina | Claire Gardent
Proceedings of the 11th International Conference on Natural Language Generation

Neural approaches to data-to-text generation generally handle rare input items using either delexicalisation or a copy mechanism. We investigate the relative impact of these two methods on two datasets (E2E and WebNLG) and using two evaluation settings. We show (i) that rare items strongly impact performance; (ii) that combining delexicalisation and copying yields the strongest improvement; (iii) that copying underperforms for rare and unseen items and (iv) that the impact of these two mechanisms greatly varies depending on how the dataset is constructed and on how it is split into train, dev and test.

2017

pdf bib abs
A Statistical, Grammar-Based Approach to Microplanning
Claire Gardent | Laura Perez-Beltrachini
Computational Linguistics, Volume 43, Issue 1 - April 2017

Although there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during microplanning between aggregation, surface realization, and sentence segmentation. In this article, we propose a hybrid symbolic/statistical approach to jointly model the constraints regulating these interactions. Our approach integrates a small handwritten grammar, a statistical hypertagger, and a surface realization algorithm. It is applied to the verbalization of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid statistic/symbolic system more fluent than both a template-based and a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation, and surface realization.

pdf bib abs
Split and Rephrase
Shashi Narayan | Claire Gardent | Shay B. Cohen | Anastasia Shimorina
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new sentence simplification task (Split-and-Rephrase) where the aim is to split a complex sentence into a meaning preserving sequence of shorter sentences. Like sentence simplification, splitting-and-rephrasing has the potential of benefiting both natural language processing and societal applications. Because shorter sentences are generally better processed by NLP systems, it could be used as a preprocessing step which facilitates and improves the performance of parsers, semantic role labellers and machine translation systems. It should also be of use for people with reading disabilities because it allows the conversion of longer sentences into shorter ones. This paper makes two contributions towards this new task. First, we create and make available a benchmark consisting of 1,066,115 tuples mapping a single complex sentence to a sequence of sentences expressing the same meaning. Second, we propose five models (vanilla sequence-to-sequence to semantically-motivated models) to understand the difficulty of the proposed task.

pdf bib abs
Creating Training Corpora for NLG Micro-Planners
Claire Gardent | Anastasia Shimorina | Shashi Narayan | Laura Perez-Beltrachini
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present a novel framework for semi-automatically creating linguistically challenging micro-planning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for microplanning. Another feature of this framework is that it can be applied to any large scale knowledge base and can therefore be used to train and learn KB verbalisers. We apply our framework to DBpedia data and compare the resulting dataset with Wen et al. 2016’s. We show that while Wen et al.’s dataset is more than twice larger than ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging data sets from which NLG models can be learned which are capable of handling the complex interactions occurring during in micro-planning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation. To encourage researchers to take up this challenge, we made available a dataset of 21,855 data/text pairs created using this framework in the context of the WebNLG shared task.

pdf bib abs
The WebNLG Challenge: Generating Text from RDF Data
Claire Gardent | Anastasia Shimorina | Shashi Narayan | Laura Perez-Beltrachini
Proceedings of the 10th International Conference on Natural Language Generation

The WebNLG challenge consists in mapping sets of RDF triples to text. It provides a common benchmark on which to train, evaluate and compare “microplanners”, i.e. generation systems that verbalise a given content by making a range of complex interacting choices including referring expression generation, aggregation, lexicalisation, surface realisation and sentence segmentation. In this paper, we introduce the microplanning task, describe data preparation, introduce our evaluation methodology, analyse participant results and provide a brief description of the participating systems.

pdf bib abs
Analysing Data-To-Text Generation Benchmarks
Laura Perez-Beltrachini | Claire Gardent
Proceedings of the 10th International Conference on Natural Language Generation

A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.

pdf bib
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers
Claire Gardent | Christian Retoré
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

pdf bib
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers
Claire Gardent | Christian Retoré
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

2016

pdf bib
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)
Claire Gardent | Aldo Gangemi
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

pdf bib
Content Selection through Paraphrase Detection: Capturing different Semantic Realisations of the Same Idea
Elena Lloret | Claire Gardent
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

pdf bib
Aligning Texts and Knowledge Bases with Semantic Sentence Simplification
Yassine Mrabet | Pavlos Vougiouklis | Halil Kilicoglu | Claire Gardent | Dina Demner-Fushman | Jonathon Hare | Elena Simperl
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

pdf bib
Content selection as semantic-based ontology exploration
Laura Perez-Beltrachini | Claire Gardent | Anselme Revuz | Saptarashmi Bandyopadhyay
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

pdf bib
Generating Paraphrases from DBPedia using Deep Learning
Amin Sleimi | Claire Gardent
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

pdf bib
Category-Driven Content Selection
Rania Mohammed | Laura Perez-Beltrachini | Claire Gardent
Proceedings of the 9th International Natural Language Generation conference

pdf bib
Unsupervised Sentence Simplification Using Deep Semantics
Shashi Narayan | Claire Gardent
Proceedings of the 9th International Natural Language Generation conference

pdf bib
The WebNLG Challenge: Generating Text from DBPedia Data
Emilie Colin | Claire Gardent | Yassine M’rabet | Shashi Narayan | Laura Perez-Beltrachini
Proceedings of the 9th International Natural Language Generation conference

pdf bib abs
Building RDF Content for Data-to-Text Generation
Laura Perez-Beltrachini | Rania Sayed | Claire Gardent
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In Natural Language Generation (NLG), one important limitation is the lack of common benchmarks on which to train, evaluate and compare data-to-text generators. In this paper, we make one step in that direction and introduce a method for automatically creating an arbitrary large repertoire of data units that could serve as input for generation. Using both automated metrics and a human evaluation, we show that the data units produced by our method are both diverse and coherent.

pdf bib
Sequence-based Structured Prediction for Semantic Parsing
Chunyang Xiao | Marc Dymetman | Claire Gardent
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics
Claire Gardent | Raffaella Bernardi | Ivan Titov
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

pdf bib
Orthogonality regularizer for question answering
Chunyang Xiao | Guillaume Bouchard | Marc Dymetman | Claire Gardent
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

pdf bib
Learning Embeddings to lexicalise RDF Properties
Laura Perez-Beltrachini | Claire Gardent
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

We describe the acquisition of a dialog corpus for French based on multi-task human-machine interactions in a serious game setting. We present a tool for data collection that is configurable for multiple games; describe the data collected using this tool and the annotation schema used to annotate it; and report on the results obtained when training a classifier on the annotated data to associate each player turn with a dialog move usable by a rule based dialog manager. The collected data consists of approximately 1250 dialogs, 10454 utterances and 168509 words and will be made freely available to academic and nonprofit research.

pdf bib abs
Representation of linguistic and domain knowledge for second language learning in virtual worlds
Alexandre Denis | Ingrid Falk | Claire Gardent | Laura Perez-Beltrachini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

There has been much debate, both theoretical and practical, on how to link ontologies and lexicons in natural language processing (NLP) applications. In this paper, we focus on an application in which lexicon and ontology are used to generate teaching material. We briefly describe the application (a serious game for language learning). We then zoom in on the representation and interlinking of the lexicon and of the ontology. We show how the use of existing standards and of good practice principles facilitates the design of our resources while satisfying the expressivity requirements set by natural language generation.

2011

pdf bib
Génération de phrase : entrée, algorithmes et applications (Sentence Generation: Input, Algorithms and Applications)
Claire Gardent
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Conférences invitées

pdf bib abs
Vers la détection des dislocations à gauche dans les transcriptions automatiques du Français parlé (Towards automatic recognition of left dislocation in transcriptions of Spoken French)
Corinna Anderson | Christophe Cerisara | Claire Gardent
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Ce travail prend place dans le cadre plus général du développement d’une plate-forme d’analyse syntaxique du français parlé. Nous décrivons la conception d’un modèle automatique pour résoudre le lien anaphorique présent dans les dislocations à gauche dans un corpus de français parlé radiophonique. La détection de ces structures devrait permettre à terme d’améliorer notre analyseur syntaxique en enrichissant les informations prises en compte dans nos modèles automatiques. La résolution du lien anaphorique est réalisée en deux étapes : un premier niveau à base de règles filtre les configurations candidates, et un second niveau s’appuie sur un modèle appris selon le critère du maximum d’entropie. Une évaluation expérimentale réalisée par validation croisée sur un corpus annoté manuellement donne une F-mesure de l’ordre de 40%.

pdf bib
Proceedings of the 13th European Workshop on Natural Language Generation
Claire Gardent | Kristina Striegnitz
Proceedings of the 13th European Workshop on Natural Language Generation

2010

pdf bib abs
Identifying Sources of Weakness in Syntactic Lexicon Extraction
Claire Gardent | Alejandra Lorenzo
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Previous work has shown that large scale subcategorisation lexicons could be extracted from parsed corpora with reasonably high precision. In this paper, we apply a standard extraction procedure to a 100 millions words parsed corpus of french and obtain rather poor results. We investigate different factors likely to improve performance such as in particular, the specific extraction procedure and the parser used; the size of the input corpus; and the type of frames learned. We try out different ways of interleaving the output of several parsers with the lexicon extraction process and show that none of them improves the results. Conversely, we show that increasing the size of the input corpus and modifying the extraction procedure to better differentiate prepositional arguments from prepositional modifiers improves performance. In conclusion, we suggest that a more sophisticated approach to parser combination and better probabilistic models of the various types of prepositional objects in French are likely ways to get better results.

pdf bib abs
Syntactic Testsuites and Textual Entailment Recognition
Paul Bedaride | Claire Gardent
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We focus on textual entailments mediated by syntax and propose a new methodology to evaluate textual entailment recognition systems on such data. The main idea is to generate a syntactically annotated corpus of pairs of (non-)entailments and to use error mining methodology from the parsing field to identify the most likely sources of errors. To generate the evaluation corpus we use a template based generation approach where sentences, semantic representations and syntactic annotations are all created at the same time. Furthermore, we adapt the error mining methodology initially proposed for parsing to the field of textual entailment. To illustrate the approach, we apply the proposed methodology to the Afazio RTE system (an hybrid system focusing on syntactic entailment) and show how it permits identifying the most likely sources of errors made by this system on a testsuite of 10 000 (non-)entailment pairs which is balanced in term of (non-)entailment and in term of syntactic annotations.

pdf bib
RTG based surface realisation for TAG
Claire Gardent | Laura Perez-Beltrachini
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Benchmarking for syntax-based sentential inference
Paul Bedaride | Claire Gardent
Coling 2010: Posters

pdf bib
Comparing the performance of two TAG-based surface realisers using controlled grammar traversal
Claire Gardent | Benjamin Gottesman | Laura Perez-Beltrachini
Coling 2010: Posters

2009

pdf bib
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
Alex Lascarides | Claire Gardent | Joakim Nivre
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib abs
Sens, synonymes et définitions
Ingrid Falk | Claire Gardent | Évelyne Jacquey | Fabienne Venant
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article décrit une méthodologie visant la réalisation d’une ressource sémantique en français centrée sur la synonymie. De manière complémentaire aux travaux existants, la méthode proposée n’a pas seulement pour objectif d’établir des liens de synonymie entre lexèmes, mais également d’apparier les sens possibles d’un lexème avec les ensembles de synonymes appropriés. En pratique, les sens possibles des lexèmes proviennent des définitions du TLFi et les synonymes de cinq dictionnaires accessibles à l’ATILF. Pour évaluer la méthode d’appariement entre sens d’un lexème et ensemble de synonymes, une ressource de référence a été réalisée pour 27 verbes du français par quatre lexicographes qui ont spécifié manuellement l’association entre verbe, sens (définition TLFi) et ensemble de synonymes. Relativement à ce standard étalon, la méthode d’appariement affiche une F-mesure de 0.706 lorsque l’ensemble des paramètres est pris en compte, notamment la distinction pronominal / non-pronominal pour les verbes du français et de 0.602 sans cette distinction.

pdf bib
Semantic Normalisation : a Framework and an Experiment
Paul Bedaride | Claire Gardent
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
Grouping Synonyms by Definitions
Ingrid Falk | Claire Gardent | Evelyne Jacquey | Fabienne Venant
Proceedings of the International Conference RANLP-2009

2008

pdf bib abs
A Test Suite for Inference Involving Adjectives
Marilisa Amoia | Claire Gardent
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Recently, most of the research in NLP has concentrated on the creation of applications coping with textual entailment. However, there still exist very few resources for the evaluation of such applications. We argue that the reason for this resides not only in the novelty of the research field but also and mainly in the difficulty of defining the linguistic phenomena which are responsible for inference. As the TSNLP project has shown test suites provide optimal diagnostic and evaluation tools for NLP applications, as contrary to text corpora they provide a deep insight in the linguistic phenomena allowing control over the data. Thus in this paper, we present a test suite specifically developed for studying inference problems shown by English adjectives. The construction of the test suite is based on the deep linguistic analysis and following classification of entailment patterns of adjectives and follows the TSNLP guidelines on linguistic databases providing a clear coverage, systematic annotation of inference tasks, large reusability and simple maintenance. With the design of this test suite we aim at creating a resource supporting the evaluation of computational systems handling natural language inference and in particular at providing a benchmark against which to evaluate and compare existing semantic analysers.

pdf bib
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)
Claire Gardent | Anoop Sarkar
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

pdf bib
Integrating a Unification-Based Semantics in a Large Scale Lexicalised Tree Adjoining Grammar for French
Claire Gardent
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib abs
Réécriture et Détection d’Implication Textuelle
Paul Bédaride | Claire Gardent
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Nous présentons un système de normalisation de la variation syntaxique qui permet de mieux reconnaître la relation d’implication textuelle entre deux phrases. Le système est évalué sur une suite de tests comportant 2 520 paires test et les résultats montrent un gain en précision par rapport à un système de base variant entre 29.8 et 78.5 points la complexité des cas considérés.

2007

pdf bib
A Symbolic Approach to Near-Deterministic Surface Realisation using Tree Adjoining Grammar
Claire Gardent | Eric Kow
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
SemTAG: a platform for specifying Tree Adjoining Grammars and performing TAG-based Semantic Construction
Claire Gardent | Yannick Parmentier
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib abs
SemTAG, une architecture pour le développement et l’utilisation de grammaires d’arbres adjoints à portée sémantique
Claire Gardent | Yannick Parmentier
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous présentons une architecture logicielle libre et ouverte pour le développement de grammaires d’arbres adjoints à portée sémantique. Cette architecture utilise un compilateur de métagrammaires afin de faciliter l’extension et la maintenance de la grammaire, et intègre un module de construction sémantique permettant de vérifier la couverture aussi bien syntaxique que sémantique de la grammaire. Ce module utilise un analyseur syntaxique tabulaire généré automatiquement à partir de la grammaire par le système DyALog. Nous présentons également les résultats de l’évaluation d’une grammaire du français développée au moyen de cette architecture.

pdf bib abs
Évaluer SYNLEX
Ingrid Falk | Gil Francopoulo | Claire Gardent
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

SYNLEX est un lexique syntaxique extrait semi-automatiquement des tables du LADL. Comme les autres lexiques syntaxiques du français disponibles et utilisables pour le TAL (LEFFF, DICOVALENCE), il est incomplet et n’a pas fait l’objet d’une évaluation permettant de déterminer son rappel et sa précision par rapport à un lexique de référence. Nous présentons une approche qui permet de combler au moins partiellement ces lacunes. L’approche s’appuie sur les méthodes mises au point en acquisition automatique de lexique. Un lexique syntaxique distinct de SYNLEX est acquis à partir d’un corpus de 82 millions de mots puis utilisé pour valider et compléter SYNLEX. Le rappel et la précision de cette version améliorée de SYNLEX sont ensuite calculés par rapport à un lexique de référence extrait de DICOVALENCE.

pdf bib abs
Une réalisateur de surface basé sur une grammaire réversible
Claire Gardent | Éric Kow
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

En génération, un réalisateur de surface a pour fonction de produire, à partir d’une représentation conceptuelle donnée, une phrase grammaticale. Les réalisateur existants soit utilisent une grammaire réversible et des méthodes statistiques pour déterminer parmi l’ensemble des sorties produites la plus plausible ; soit utilisent des grammaires spécialisées pour la génération et des méthodes symboliques pour déterminer la paraphrase la plus appropriée à un contexte de génération donné. Dans cet article, nous présentons GENI, un réalisateur de surface basé sur une grammaire d’arbres adjoints pour le français qui réconcilie les deux approches en combinant une grammaire réversible avec une sélection symbolique des paraphrases.

pdf bib
A first order semantic approach to adjectival inference
Marilisa Amoia | Claire Gardent
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

pdf bib
Spotting Overgeneration Suspects
Claire Gardent | Eric Kow
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2006

pdf bib
Three Reasons to Adopt TAG-Based Surface Realisation
Claire Gardent | Eric Kow
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms

pdf bib
SemTAG, the LORIA toolbox for TAG-based Parsing and Generation
Eric Kow | Yannick Parmentier | Claire Gardent
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms

pdf bib
Adjective based inference
Marilisa Amoia | Claire Gardent
Proceedings of the Workshop KRAQ’06: Knowledge and Reasoning for Language Processing

pdf bib abs
Extraction d’information de sous-catégorisation à partir des tables du LADL
Claire Gardent | Bruno Guillaume | Guy Perrier | Ingrid Falk
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Les tables du LADL (Laboratoire d’Automatique Documentaire et Linguistique) contiennent des données électroniques extensives sur les propriétés morphosyntaxiques et syntaxiques des foncteurs syntaxiques du français (verbes, noms, adjectifs). Ces données, dont on sait qu’elles sont nécessaires pour le bon fonctionnement des systèmes de traitement automatique des langues, ne sont cependant que peu utilisées par les systèmes actuels. Dans cet article, nous identifions les raisons de cette lacune et nous proposons une méthode de conversion des tables vers un format mieux approprié au traitement automatique des langues.

pdf bib abs
Intégration d’une dimension sémantique dans les grammaires d’arbres adjoints
Claire Gardent
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous considérons un formalisme linguistique pour lequel l’intégration d’information sémantique dans une grammaire à large couverture n’a pas encore été réalisée à savoir, les grammaires d’arbres adjoints (Tree Adjoining Grammar ou TAG). Nous proposons une méthode permettant cette intégration et décrivons sa mise en oeuvre dans une grammaire noyau pour le français. Nous montrons en particulier que le formalisme de spécification utilisé, XMG, (Duchier et al., 2004) permet une factorisation importante des données sémantiques facilitant ainsi le développement, la maintenance et le déboggage de la grammaire.

pdf bib
Coreference Handling in XMG
Claire Gardent | Yannick Parmentier
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions