Ondřej Plátek

Also published as: Ondrej Platek


2023

TabGenie: A Toolkit for Table-to-Text Generation
Zdeněk Kasner | Ekaterina Garanina | Ondrej Platek | Ondrej Dusek
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Heterogeneity of data-to-text generation datasets limits the research on data-to-text generation systems. We present TabGenie – a toolkit that enables researchers to explore, preprocess, and analyze a variety of data-to-text generation datasets through the unified framework of table-to-text generation. In TabGenie, all inputs are represented as tables with associated metadata. The tables can be explored through a web interface, which also provides an interactive mode for debugging table-to-text generation, facilitates side-by-side comparison of generated system outputs, and allows easy exports for manual analysis. Furthermore, TabGenie is equipped with command line processing tools and Python bindings for unified dataset loading and processing. We release TabGenie as a PyPI package and provide its open-source code and a live demo at https://github.com/kasnerz/tabgenie.

2018

Using Adversarial Examples in Natural Language Processing
Petr Bělohlávek | Ondřej Plátek | Zdeněk Žabokrtský | Milan Straka
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2014

Alex: Bootstrapping a Spoken Dialogue System for a New Domain by Real Users
Ondřej Dušek | Ondřej Plátek | Lukáš Žilka | Filip Jurčíček
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

Free on-line speech recogniser based on Kaldi ASR toolkit producing word posterior lattices
Ondřej Plátek | Filip Jurčíček
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license
Matěj Korvas | Ondřej Plátek | Ondřej Dušek | Lukáš Žilka | Filip Jurčíček
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition (ASR) in spoken dialogue systems (SDSs). The data comprise 45 hours of speech in English and over 18 hours in Czech. A large part of the data, both audio and transcriptions, was collected using crowdsourcing; the rest are transcriptions by hired transcribers. We release the data together with scripts for data pre-processing and building acoustic models using the HTK and Kaldi ASR toolkits. We also publish the trained models described in this paper. The data are released under the CC-BY-SA 3.0 license; the scripts are licensed under Apache 2.0. In the paper, we report on the methodology of collecting the data, on the size and properties of the data, and on the scripts and their use. We verify the usability of the datasets by training and evaluating acoustic models using the presented data and scripts.