Krzysztof Jassem


2022

Challenging America: Modeling language in longer time scales
Jakub Pokrywka | Filip Graliński | Krzysztof Jassem | Karol Kaczmarek | Krzysztof Jurkiewicz | Piotr Wierzchon
Findings of the Association for Computational Linguistics: NAACL 2022

The aim of the paper is to apply, for historical texts, the methodology used commonly to solve various NLP tasks defined for contemporary data, i.e. pre-training and fine-tuning large Transformer models. This paper introduces an ML challenge, named Challenging America (ChallAm), based on OCR-ed excerpts from historical newspapers collected from the Chronicling America portal. ChallAm provides a dataset of clippings, labeled with metadata on their origin, and paired with their textual contents retrieved by an OCR tool. Three publicly available ML tasks are defined in the challenge: to determine the article date, to detect the location of the issue, and to deduce a word in a text gap (cloze test). Strong baselines are provided for all three ChallAm tasks. In particular, we pre-trained a RoBERTa model from scratch on the historical texts. We also discuss the issues of discrimination and hate speech present in the historical American texts.

nEYron: Implementation and Deployment of an MT System for a Large Audit & Consulting Corporation
Artur Nowakowski | Krzysztof Jassem | Maciej Lison | Rafał Jaworski | Tomasz Dwojak | Karolina Wiater | Olga Posesor
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This paper reports on the implementation and deployment of an MT system in the Polish branch of EY Global Limited. The system supports standard CAT and MT functionalities such as translation memory fuzzy search, document translation and post-editing, and meets less common, customer-specific expectations. The deployment began in August 2018 with a Proof of Concept, and ended with the signing of the Final Version acceptance certificate in October 2021. We present the challenges that were faced during the deployment, particularly in relation to the security check and installation processes in the production environment.

POLENG MT: An Adaptive MT Platform
Artur Nowakowski | Krzysztof Jassem | Maciej Lison | Kamil Guttmann | Mikołaj Pokrywka
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

We introduce POLENG MT, an MT platform that may be used as a cloud web application or as an on-site solution. The platform is capable of providing accurate document translation, including the transfer of document formatting between the input document and the output document. The main feature of the on-site version is dedicated customer adaptation, which consists of training on specialized texts and applying forced terminology translation according to the user’s needs.

2021

Neural Machine Translation with Inflected Lexicon
Artur Nowakowski | Krzysztof Jassem
Proceedings of Machine Translation Summit XVIII: Research Track

The paper presents experiments in neural machine translation with lexical constraints into a morphologically rich language. In particular, we introduce a method, based on constrained decoding, which handles the inflected forms of lexical entries and does not require any modification to the training data or model architecture. To evaluate its effectiveness, we carry out experiments in two different scenarios: general and domain-specific. We compare our method with baseline translation, i.e. translation without lexical constraints, in terms of translation speed and translation quality. To evaluate how well the method handles the constraints, we propose new evaluation metrics which take into account the presence, placement, duplication and inflectional correctness of lexical terms in the output sentence.

Neural Translator Designed to Protect the Eastern Border of the European Union
Artur Nowakowski | Krzysztof Jassem
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

This paper reports on a translation engine designed for the needs of the Polish State Border Guard. The engine is a component of the AI Searcher system, whose aim is to search for Internet texts, written in Polish, Russian, Ukrainian or Belarusian, which may lead to criminal acts at the eastern border of the European Union. The system is intended for Polish users, and the translation engine should serve to assist understanding of non-Polish documents. The engine was trained on general-domain texts. The adaptation for the criminal domain consisted in the appropriate translation of criminal terms and proper names, such as forenames, surnames and geographical objects. The translation process needs to take into account the rich inflection found in all of the languages of interest. To this end, a method based on constrained decoding that incorporates an inflected lexicon into a neural translation process was applied in the engine.

2009

An Environment for Named Entity Recognition and Translation
Filip Graliński | Krzysztof Jassem | Michał Marcińczuk
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2004

Applying Oxford-PWN English-Polish dictionary to machine translation
Krzysztof Jassem
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

2000

POLENG–Adjusting a Rule-Based Polish–English Machine Translation System by Means of Corpus Analysis
Krzysztof Jassem | Filip Graliński | Grzegorz Krynicki
5th EAMT Workshop: Harvesting Existing Resources

1997

A Polish-to-English Text-to-text Translation System Based on an Electronic Dictionary
Krzysztof Jassem
Spoken Language Translation