Tomáš Musil


2022

pdf
GPT-2-based Human-in-the-loop Theatre Play Script Generation
Rudolf Rosa | Patrícia Schmidtová | Ondřej Dušek | Tomáš Musil | David Mareček | Saad Obaid | Marie Nováková | Klára Vosecká | Josef Doležal
Proceedings of the 4th Workshop of Narrative Understanding (WNU2022)

We experiment with adapting generative language models for the generation of long coherent narratives in the form of theatre plays. Since fully automatic generation of whole plays is not currently feasible, we created an interactive tool that allows a human user to steer the generation somewhat while minimizing intervention. We pursue two approaches to long-text generation: a flat generation with summarization of context, and a hierarchical text-to-text two-stage approach, where a synopsis is generated first and then used to condition generation of the final script. Our preliminary results and discussions with theatre professionals show improvements over vanilla language model generation, but also identify important limitations of our approach.

pdf
THEaiTRobot: An Interactive Tool for Generating Theatre Play Scripts
Rudolf Rosa | Patrícia Schmidtová | Alisa Zakhtarenko | Ondrej Dusek | Tomáš Musil | David Mareček | Saad Ul Islam | Marie Novakova | Klara Vosecka | Daniel Hrbek | David Kostak
Proceedings of the 15th International Conference on Natural Language Generation: System Demonstrations

We present a free online demo of THEaiTRobot, an open-source bilingual tool for interactively generating theatre play scripts, in two versions. THEaiTRobot 1.0 uses the GPT-2 language model with minimal adjustments. THEaiTRobot 2.0 uses two models created by fine-tuning GPT-2 on purposefully collected and processed datasets and several other components, generating play scripts in a hierarchical fashion (title synopsis script). The underlying tool is used in the THEaiTRE project to generate scripts for plays, which are then performed on stage by a professional theatre.

2021

pdf
Representations of Meaning in Neural Networks for NLP: a Thesis Proposal
Tomáš Musil
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Neural networks are the state-of-the-art method of machine learning for many problems in NLP. Their success in machine translation and other NLP tasks is phenomenal, but their interpretability is challenging. We want to find out how neural networks represent meaning. In order to do this, we propose to examine the distribution of meaning in the vector space representation of words in neural networks trained for NLP tasks. Furthermore, we propose to consider various theories of meaning in the philosophy of language and to find a methodology that would enable us to connect these areas.

2019

pdf
Derivational Morphological Relations in Word Embeddings
Tomáš Musil | Jonáš Vidra | David Mareček
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Derivation is a type of a word-formation process which creates new words from existing ones by adding, changing or deleting affixes. In this paper, we explore the potential of word embeddings to identify properties of word derivations in the morphologically rich Czech language. We extract derivational relations between pairs of words from DeriNet, a Czech lexical network, which organizes almost one million Czech lemmas into derivational trees. For each such pair, we compute the difference of the embeddings of the two words, and perform unsupervised clustering of the resulting vectors. Our results show that these clusters largely match manually annotated semantic categories of the derivational relations (e.g. the relation ‘bake–baker’ belongs to category ‘actor’, and a correct clustering puts it into the same cluster as ‘govern–governor’).

pdf
A Test Suite and Manual Evaluation of Document-Level NMT at WMT19
Kateřina Rysová | Magdaléna Rysová | Tomáš Musil | Lucie Poláková | Ondřej Bojar
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

As the quality of machine translation rises and neural machine translation (NMT) is moving from sentence to document level translations, it is becoming increasingly difficult to evaluate the output of translation systems. We provide a test suite for WMT19 aimed at assessing discourse phenomena of MT systems participating in the News Translation Task. We have manually checked the outputs and identified types of translation errors that are relevant to document-level translation.

2018

pdf
Neural Monkey: The Current State and Beyond
Jindřich Helcl | Jindřich Libovický | Tom Kocmi | Tomáš Musil | Ondřej Cífka | Dušan Variš | Ondřej Bojar
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

2017

pdf
Results of the WMT17 Neural MT Training Task
Ondřej Bojar | Jindřich Helcl | Tom Kocmi | Jindřich Libovický | Tomáš Musil
Proceedings of the Second Conference on Machine Translation