Tymoteusz Krumholc


2020

An Empirical Study on Multi-Task Learning for Text Style Transfer and Paraphrase Generation
Pawel Bujnowski | Kseniia Ryzhova | Hyungtak Choi | Katarzyna Witkowska | Jaroslaw Piersa | Tymoteusz Krumholc | Katarzyna Beksa
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

The topic of this paper is neural multi-task training for text style transfer. We present an efficient method for neutral-to-style transformation using the transformer framework. We demonstrate how to prepare a robust model using large paraphrase corpora together with a small parallel style transfer corpus. We study how much style transfer data a model needs, using two transformations as examples: neutral-to-cute on an internal corpus and modern-to-antique on publicly available Bible corpora. Additionally, we propose a synthetic measure for the automatic evaluation of style transfer models. We hope our research is a step towards replacing common but limited rule-based style transfer systems with more flexible machine learning models for both public and commercial usage.

2019

NLPR@SRPOL at SemEval-2019 Task 6 and Task 5: Linguistically enhanced deep learning offensive sentence classifier
Alessandro Seganti | Helena Sobol | Iryna Orlova | Hannam Kim | Jakub Staniszewski | Tymoteusz Krumholc | Krystian Koziel
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper presents a system developed for the SemEval-2019 competition Task 5 hatEval (Basile et al., 2019) (team name: LU Team) and Task 6 OffensEval (Zampieri et al., 2019b) (team name: NLPR@SRPOL), where we achieved 2nd position in Subtask C. The system combines several models (LSTM, Transformer, OpenAI’s GPT, random forest, SVM) in an ensemble with various embeddings (custom, ELMo, fastText, Universal Encoder) and additional linguistic features (number of blacklisted words, special characters, etc.). The system works with a multi-tier blacklist and a large corpus of crawled data, annotated for general offensiveness. In the paper we present an extensive analysis of our results and show how the combination of features and embeddings affects the performance of the models.