Steven Limcorn
2025
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
|
David Samuel Setiawan
|
Steven Limcorn
|
Ananto Joyoadikusumo
Proceedings of the Second Workshop in South East Asian Language Processing
We present NusaBERT, a multilingual model built on IndoBERT and tailored for Indonesia’s diverse languages. By expanding vocabulary and pre-training on a regional corpus, NusaBERT achieves state-of-the-art performance on Indonesian NLU benchmarks, enhancing IndoBERT’s multilingual capability. This study also addresses NusaBERT’s limitations and encourages further research on Indonesia’s underrepresented languages.
Lazarus NLP at SemEval-2025 Task 11: Fine-Tuning Large Language Models for Multi-Label Emotion Classification via Sentence-Label Pairing
Wilson Wongso
|
David Setiawan
|
Ananto Joyoadikusumo
|
Steven Limcorn
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Multi-label emotion classification in low-resource languages remains challenging due to limited annotated data and model adaptability. To address this, we fine-tune large language models (LLMs) using a sentence-label pairing approach, optimizing efficiency while improving classification performance. Evaluating on Sundanese, Indonesian, and Javanese, our method outperforms conventional classifier-based fine-tuning and achieves strong zero-shot cross-lingual transfer. Notably, our approach ranks first in the Sundanese subset of SemEval-2025 Task 11 Track A. Our findings demonstrate the effectiveness of LLM fine-tuning for low-resource emotion classification, underscoring the importance of tailoring adaptation strategies to specific language families in multilingual contexts.