Radityo Eko Prasojo


2021

ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair
Alham Fikri Aji | Tirana Fatyanosa | Radityo Eko Prasojo | Philip Arthur | Suci Fitriany | Salma Qonitah | Nadhifa Zulfa | Tomi Santoso | Mahendra Data
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

IndoCollex: A Testbed for Morphological Transformation of Indonesian Colloquial Words
Haryo Akbarianto Wibowo | Made Nindyatama Nityasya | Afra Feyza Akyürek | Suci Fitriany | Alham Fikri Aji | Radityo Eko Prasojo | Derry Tanti Wijaya
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

BERT Goes Brrr: A Venture Towards the Lesser Error in Classifying Medical Self-Reporters on Twitter
Alham Fikri Aji | Made Nindyatama Nityasya | Haryo Akbarianto Wibowo | Radityo Eko Prasojo | Tirana Fatyanosa
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This paper describes our team’s submission for the Social Media Mining for Health (SMM4H) 2021 shared task. We participated in three subtasks: classifying adverse drug effects, COVID-19 self-reports, and COVID-19 symptoms. Our system is based on a BERT model pre-trained on domain-specific text. In addition, we perform data cleaning and augmentation, as well as hyperparameter optimization and model ensembling, to further boost BERT's performance. We ranked first in both the adverse drug effect classification and COVID-19 self-report tasks.

2020

Benchmarking Multidomain English-Indonesian Machine Translation
Tri Wahyu Guntara | Alham Fikri Aji | Radityo Eko Prasojo
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

In the context of Machine Translation (MT) from and to English, Bahasa Indonesia has been considered a low-resource language, and therefore applying Neural Machine Translation (NMT), which typically requires a large training dataset, proves to be problematic. In this paper, we show otherwise by collecting large, publicly available datasets from the Web, which we split into several domains (news, religion, general, and conversation) to train and benchmark some variants of transformer-based NMT models across the domains. We show using BLEU that our models perform well across these domains, outperform the baseline Statistical Machine Translation (SMT) models, and perform comparably with Google Translate. Our datasets (with the standard split for training, validation, and testing), code, and models are available at https://github.com/gunnxx/indonesian-mt-data