2025
RBG-AI: Benefits of Multilingual Language Models for Low-Resource Languages
Barathi Ganesh Hb | Michal Ptaszynski
Proceedings of the Tenth Conference on Machine Translation
This paper investigates how multilingual language models benefit low-resource languages through our submission to the WMT 2025 Low-Resource Indic Language Translation shared task. We explore whether languages from related families can effectively support translation for low-resource languages that were absent or underrepresented during model training. Using a quantized multilingual pretrained foundation model, we examine zero-shot translation capabilities and cross-lingual transfer effects across three language families: Tibeto-Burman, Indo-Aryan, and Austroasiatic. Our findings demonstrate that multilingual models failed to leverage linguistic similarities, particularly evidenced within the Tibeto-Burman family. The study provides insights into the practical feasibility of zero-shot translation for low-resource language settings and the role of language family relationships in multilingual model performance.
2024
nowhash at SemEval-2024 Task 4: Exploiting Fusion of Transformers for Detecting Persuasion Techniques in Multilingual Memes
Abu Nowhash Chowdhury | Michal Ptaszynski
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Memes are now one of the most prominent media for disseminating information on social media. They are typically constructed in multilingual settings by combining visuals with text. People sometimes use memes to influence mass audiences through rhetorical and psychological techniques such as causal oversimplification, name-calling, and smear. Identifying those techniques is challenging given memes’ multimodal characteristics. To address these challenges, SemEval-2024 Task 4 introduced a shared task on detecting persuasion techniques in multilingual memes. This paper presents our participation in subtasks 1 and 2(b). For subtask 1, we use a fine-tuned language-agnostic BERT sentence embedding (LaBSE) model to extract contextual features from meme text. For subtask 2(b), we fine-tune the vision transformer and XLM-RoBERTa to extract contextual information from meme image and text data. Finally, we unify those features and employ a single feed-forward linear layer on top to obtain the prediction label. Experimental results on the SemEval-2024 Task 4 benchmark dataset demonstrate the effectiveness of our proposed methods for subtasks 1 and 2(b).
2023
Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity
Juuso Eronen | Michal Ptaszynski | Karol Nowakowski | Zheng Lin Chia
Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation
Improving Low-Resource Speech Recognition through Multilingual Fine-Tuning with Language Identifiers and Self-Training
Karol Nowakowski | Michal Ptaszynski
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
2020
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations
Michal Ptaszynski | Bartosz Ziolko
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations
Epistolary Education in 21st Century: A System to Support Composition of E-mails by Students to Superiors in Japanese
Kenji Ryu | Michal Ptaszynski
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations
E-mail is a communication tool widely used by people of all ages on the Internet today, often in business and formal situations, especially in Japan. Moreover, Japanese E-mail communication has a set of specific rules taught using specialized guidebooks. E-mail literacy education for many Japanese students is typically provided in a traditional yet inefficient lecture-based way. We propose a system to support Japanese students in writing E-mails to superiors (teachers, job hunting representatives, etc.). We first investigate the importance of formal E-mails in Japan and what is needed to write one successfully. Next, we develop the system in accordance with those rules. Finally, we evaluate the system in two ways. The results, although obtained from a small number of samples, were generally positive and clearly indicated additional ways to improve the system.
2014
Emotive or Non-emotive: That is The Question
Michal Ptaszynski | Fumito Masui | Rafal Rzepka | Kenji Araki
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
2013
Detecting Cyberbullying Entries on Informal School Websites Based on Category Relevance Maximization
Taisei Nitta | Fumito Masui | Michal Ptaszynski | Yasutomo Kimura | Rafal Rzepka | Kenji Araki
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
Automatically Annotating A Five-Billion-Word Corpus of Japanese Blogs for Affect and Sentiment Analysis
Michal Ptaszynski | Rafal Rzepka | Kenji Araki | Yoshio Momouchi
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis