Paweł Przybysz
Also published as:
Pawel Przybysz
The SRPOL team's submission to WMT2025 introduces an innovative approach that uses the A* (A-star) algorithm for decoding in EuroLLM, which yields a diverse set of translation hypotheses. Subsequent reranking with Comet-QE and NLLB selects the best of these diverse hypotheses, which significantly improves translation quality. The A* algorithm can be applied to decoding in any LLM or other translation model. The experiment shows that, using free, openly accessible MT models, one can match the translation quality of the best online translators and LLMs with just a PC under the desk.
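As a rough illustration of the idea of generating several hypotheses by best-first search and then reranking them, here is a minimal sketch. It is not the submission's implementation: the heuristic is assumed to be zero (so the search reduces to uniform-cost search, a special case of A*), and `toy_next_token_probs` and `toy_quality` are hypothetical stand-ins for EuroLLM and for a quality-estimation metric such as Comet-QE.

```python
import heapq
import math

EOS = "</s>"

def toy_next_token_probs(prefix):
    # Hypothetical stand-in for an LLM's next-token distribution.
    if len(prefix) >= 3:
        return {EOS: 1.0}
    return {"a": 0.5, "b": 0.3, EOS: 0.2}

def best_first_decode(next_token_probs, n_hypotheses, max_len=10):
    """Return up to n_hypotheses finished sequences, most probable first.

    Cost of a prefix is its negative log-probability. With a zero heuristic
    (an assumption of this sketch) this is uniform-cost search, i.e. A* with h = 0.
    """
    frontier = [(0.0, ())]  # min-heap of (cost, token tuple)
    finished = []
    while frontier and len(finished) < n_hypotheses:
        cost, prefix = heapq.heappop(frontier)
        if prefix and prefix[-1] == EOS:
            finished.append((prefix, -cost))  # store the log-probability
            continue
        if len(prefix) >= max_len:
            continue  # drop over-long unfinished prefixes
        for token, p in next_token_probs(prefix).items():
            if p > 0.0:
                heapq.heappush(frontier, (cost - math.log(p), prefix + (token,)))
    return finished

def toy_quality(tokens):
    # Hypothetical quality score; a real system would call a QE model here.
    return tokens.count("b")

def rerank(hypotheses, quality_score):
    """Pick the hypothesis with the highest external quality score."""
    return max(hypotheses, key=lambda h: quality_score(h[0]))

hypotheses = best_first_decode(toy_next_token_probs, n_hypotheses=5)
best_tokens, best_logprob = rerank(hypotheses, toy_quality)
```

Because step costs are non-negative, finished hypotheses leave the heap in order of decreasing model probability; the reranker is then free to prefer a hypothesis the model itself ranked lower.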
This paper presents the system description of Samsung R&D Institute Poland's participation in the WMT 2022 General MT task for medium- and low-resource languages: Russian and Croatian. Our approach combines iterative noised/tagged back-translation and iterative distillation. We investigated different monolingual resources and compared their influence on final translations. We used available BERT-like models for text classification and for extracting domains of texts. Then we prepared an ensemble of NMT models adapted to multiple domains. Finally, we attempted to predict ensemble weight vectors from the BERT-based domain classifications for individual sentences. Our final trained models reached quality comparable to the best online translators using only limited constrained resources during training.
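The sentence-level ensemble weighting described above can be pictured with a small sketch. The abstract does not say how classifier outputs are mapped to weights, so this example simply assumes the domain probabilities are used directly as interpolation weights; `ensemble_distribution` and the toy distributions are hypothetical.

```python
def ensemble_distribution(model_probs, weights):
    """Linearly interpolate per-model next-token distributions.

    model_probs: list of dicts mapping token -> probability, one per
                 domain-adapted model.
    weights:     one non-negative weight per model, e.g. the domain
                 probabilities a classifier assigns to the source sentence
                 (an assumption of this sketch).
    """
    total = sum(weights)
    norm = [w / total for w in weights]  # make the weights sum to 1
    vocab = set().union(*model_probs)    # all tokens any model proposes
    return {
        tok: sum(w * p.get(tok, 0.0) for w, p in zip(norm, model_probs))
        for tok in vocab
    }

# Toy example: two domain-adapted models and classifier output for one sentence.
news_model = {"bank": 0.7, "shore": 0.3}
medical_model = {"bank": 0.2, "shore": 0.8}
domain_probs = [0.8, 0.2]  # hypothetical classifier output: mostly "news"

mixed = ensemble_distribution([news_model, medical_model], domain_probs)
```

With these toy numbers the mixture leans toward the news model's preferences, since that model receives most of the weight for this sentence.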
This paper describes the submission to the WAT 2021 Indic Language Multilingual Task by Samsung R&D Institute Poland. The task covered translation between 10 Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil and Telugu) and English. We combined a variety of techniques: transliteration, filtering, backtranslation, domain adaptation, knowledge distillation and finally ensembling of NMT models. We applied an effective approach to low-resource training that consists of pretraining on backtranslations and tuning on parallel corpora. We experimented with two different domain-adaptation techniques which significantly improved translation quality when applied to monolingual corpora. We researched and applied a novel approach for finding the best hyperparameters for ensembling a number of translation models. All techniques combined gave a significant improvement of up to +8 BLEU over baseline results. The quality of the models was confirmed by human evaluation, in which SRPOL models scored best for all 5 manually evaluated languages.
We took part in the offline End-to-End English to German TED lectures translation task. We based our solution on our last year’s submission. We used a slightly altered Transformer architecture with a ResNet-like convolutional layer preparing the audio input to the Transformer encoder. To improve the model’s quality of translation, we introduced two regularization techniques and trained on a machine-translated Librispeech corpus in addition to the iwslt-corpus, TEDLIUM2 and Must_C corpora. Our best model scored almost 3 BLEU higher than last year’s model. To segment the 2020 test set, we used exactly the same procedure as last year.
This paper describes the submission to the WMT20 shared news translation task by Samsung R&D Institute Poland. We submitted systems for six language directions: English to Czech, Czech to English, English to Polish, Polish to English, English to Inuktitut and Inuktitut to English. For each, we trained a single-direction model. However, directions including English, Polish and Czech were derived from a common multilingual base, which was later fine-tuned on each particular direction. For all the translation directions, we used a similar training regime, with iterative training corpora improvement through back-translation and model ensembling. For the En → Cs direction, we additionally leveraged document-level information by re-ranking the beam output with a separate model.
This paper describes the joint submission to the IWSLT 2019 English to Czech task by Samsung R&D Institute, Poland, and the University of Edinburgh. Our submission was ultimately produced by combining four Transformer systems through a mixture of ensembling and reranking.
This paper describes the submission to the IWSLT 2019 End-to-End speech translation task by Samsung R&D Institute, Poland. We decided to focus on end-to-end English to German TED lectures translation and did not provide any submission for other speech tasks. We used a slightly altered Transformer architecture with a standard convolutional layer preparing the audio input to the Transformer encoder. Additionally, we propose an audio segmentation algorithm maximizing the BLEU score on the tst2015 test set.
This paper describes the joint submission to the IWSLT 2018 Low Resource MT task by Samsung R&D Institute, Poland, and the University of Edinburgh. We focused on supplementing the very limited in-domain Basque-English training data with out-of-domain data, with synthetic data, and with data for other language pairs. We also experimented with a variety of model architectures and features, which included the development of extensions to the Nematus toolkit. Our submission was ultimately produced by a system combination in which we reranked translations from our strongest individual system using multiple weaker systems.
This paper describes the joint submission of Samsung Research and Development, Warsaw, Poland and the University of Edinburgh team to the IWSLT MT task for TED talks. We took part in two translation directions, en-de and de-en. We also participated in the en-de and de-en lectures SLT task. The models have been trained with an attentional encoder-decoder model using the BiDeep model in Nematus. We filtered the training data to reduce the problem of noisy data, and we use back-translated monolingual data for domain-adaptation. We demonstrate the effectiveness of the different techniques that we applied via ablation studies. Our submission system outperforms our baseline, and last year’s University of Edinburgh submission to IWSLT, by more than 5 BLEU.