Zdeněk Kasner


Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models
Zdeněk Kasner | Ioannis Konstas | Ondrej Dusek
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Pretrained language models (PLMs) for data-to-text (D2T) generation can use human-readable data labels such as column headings, keys, or relation names to generalize to out-of-domain examples. However, the models are well-known in producing semantically inaccurate outputs if these labels are ambiguous or incomplete, which is often the case in D2T datasets. In this paper, we expose this issue on the task of descibing a relation between two entities. For our experiments, we collect a novel dataset for verbalizing a diverse set of 1,522 unique relations from three large-scale knowledge graphs (Wikidata, DBPedia, YAGO). We find that although PLMs for D2T generation expectedly fail on unclear cases, models trained with a large variety of relation labels are surprisingly robust in verbalizing novel, unseen relations. We argue that using data with a diverse set of clear and meaningful labels is key to training D2T generation systems capable of generalizing to novel domains.


Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System
Rudali Huidrom | Ondřej Dušek | Zdeněk Kasner | Thiago Castro Ferreira | Anya Belz
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges

In this paper, we present the results of two re- production studies for the human evaluation originally reported by Dušek and Kasner (2020) in which the authors comparatively evaluated outputs produced by a semantic error detection system for data-to-text generation against ref- erence outputs. In the first reproduction, the original evaluators repeat the evaluation, in a test of the repeatability of the original evalua- tion. In the second study, two new evaluators carry out the evaluation task, in a test of the reproducibility of the original evaluation under otherwise identical conditions. We describe our approach to reproduction, and present and analyse results, finding different degrees of re- producibility depending on result type, data and labelling task. Our resources are available and open-sourced.

Neural Pipeline for Zero-Shot Data-to-Text Generation
Zdeněk Kasner | Ondrej Dusek
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise. We examine how to avoid finetuning pretrained language models (PLMs) on D2T generation datasets while still taking advantage of surface realization capabilities of PLMs. Inspired by pipeline approaches, we propose to generate text by transforming single-item descriptions with a sequence of modules trained on general-domain text-based operations: ordering, aggregation, and paragraph compression. We train PLMs for performing these operations on a synthetic corpus WikiFluent which we build from English Wikipedia. Our experiments on two major triple-to-text datasets—WebNLG and E2E—show that our approach enables D2T generation from RDF triples in zero-shot settings.


Text-in-Context: Token-Level Error Detection for Table-to-Text Generation
Zdeněk Kasner | Simon Mille | Ondřej Dušek
Proceedings of the 14th International Conference on Natural Language Generation

We present our Charles-UPF submission for the Shared Task on Evaluating Accuracy in Generated Texts at INLG 2021. Our system can detect the errors automatically using a combination of a rule-based natural language generation (NLG) system and pretrained language models (LMs). We first utilize a rule-based NLG system to generate sentences with facts that can be derived from the input. For each sentence we evaluate, we select a subset of facts which are relevant by measuring semantic similarity to the sentence in question. Finally, we finetune a pretrained language model on annotated data along with the relevant facts for fine-grained error detection. On the test set, we achieve 69% recall and 75% precision with a model trained on a mixture of human-annotated and synthetic data.


Data-to-Text Generation with Iterative Text Editing
Zdeněk Kasner | Ondřej Dušek
Proceedings of the 13th International Conference on Natural Language Generation

We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text by a neural model trained for the sentence fusion task. The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model. We evaluate our approach on two major data-to-text datasets (WebNLG, Cleaned E2E) and analyze its caveats and benefits. Furthermore, we show that our formulation of data-to-text generation opens up the possibility for zero-shot domain adaptation using a general-domain dataset for sentence fusion.

Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference
Ondřej Dušek | Zdeněk Kasner
Proceedings of the 13th International Conference on Natural Language Generation

A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e. checking if the output text contains all and only facts supported by the input data. We propose a new metric for evaluating the semantic accuracy of D2T generation based on a neural model pretrained for natural language inference (NLI). We use the NLI model to check textual entailment between the input data and the output text in both directions, allowing us to reveal omissions or hallucinations. Input data are converted to text for NLI using trivial templates. Our experiments on two recent D2T datasets show that our metric can achieve high accuracy in identifying erroneous system outputs.

Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task
Jindřich Libovický | Zdeněk Kasner | Jindřich Helcl | Ondřej Dušek
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present our submission to the Simultaneous Translation And Paraphrase for Language Education (STAPLE) challenge. We used a standard Transformer model for translation, with a crosslingual classifier predicting correct translations on the output n-best list. To increase the diversity of the outputs, we used additional data to train the translation model, and we trained a paraphrasing model based on the Levenshtein Transformer architecture to generate further synonymous translations. The paraphrasing results were again filtered using our classifier. While the use of additional data and our classifier filter were able to improve results, the paraphrasing model produced too many invalid outputs to further improve the output quality. Our model without the paraphrasing component finished in the middle of the field for the shared task, improving over the best baseline by a margin of 10-22 % weighted F1 absolute.

Train Hard, Finetune Easy: Multilingual Denoising for RDF-to-Text Generation
Zdeněk Kasner | Ondřej Dušek
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

We describe our system for the RDF-to-text generation task of the WebNLG Challenge 2020. We base our approach on the mBART model, which is pre-trained for multilingual denoising. This allows us to use a simple, identical, end-to-end setup for both English and Russian. Requiring minimal taskor languagespecific effort, our model placed in the first third of the leaderboard for English and first or second for Russian on automatic metrics, and it made it into the best or second-best system cluster on human evaluation.