Jann Railey Montalan

Also published as: Jann Montalan, Railey Montalan, Jann Railey E. Montalan


2024

pdf bib
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia | Rahmad Mahendra | Salsabil Maulana Akbar | Lester James V. Miranda | Jennifer Santoso | Elyanah Aco | Akhdan Fadhilah | Jonibek Mansurov | Joseph Marvin Imperial | Onno P. Kampman | Joel Ruben Antony Moniz | Muhammad Ravi Shulthan Habibi | Frederikus Hudi | Railey Montalan | Ryan Ignatius | Joanito Agili Lopo | William Nixon | Börje F. Karlsson | James Jaya | Ryandito Diandaru | Yuze Gao | Patrick Amadeus | Bin Wang | Jan Christian Blaise Cruz | Chenxi Whitehouse | Ivan Halim Parmonangan | Maria Khelli | Wenyu Zhang | Lucky Susanto | Reynard Adha Ryanda | Sonny Lazuardi Hermawan | Dan John Velasco | Muhammad Dehan Al Kautsar | Willy Fitra Hendria | Yasmin Moslem | Noah Flynn | Muhammad Farid Adilazuarda | Haochen Li | Johanes Lee | R. Damanhuri | Shuo Sun | Muhammad Reza Qorib | Amirbek Djanibekov | Wei Qi Leong | Quyet V. Do | Niklas Muennighoff | Tanrada Pansuwan | Ilham Firdausi Putra | Yan Xu | Tai Ngee Chia | Ayu Purwarianti | Sebastian Ruder | William Tjhi | Peerat Limkonchotiwat | Alham Fikri Aji | Sedrick Keh | Genta Indra Winata | Ruochen Zhang | Fajri Koto | Zheng-Xin Yong | Samuel Cahyawijaya
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, through a collaborative movement, we introduce SEACrowd, a comprehensive resource center that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in Southeast Asia.

pdf bib
Zero-shot Cross-lingual POS Tagging for Filipino
Jimson Layacan | Isaiah Edri W. Flores | Katrina Tan | Ma. Regina E. Estuar | Jann Railey E. Montalan | Marlene M. De Leon
Proceedings of the 3rd Workshop on NLP Applications to Field Linguistics (Field Matters 2024)

Supervised learning approaches in NLP, exemplified by POS tagging, rely heavily on the presence of large amounts of annotated data. However, acquiring such data often requires significant amount of resources and incurs high costs. In this work, we explore zero-shot cross-lingual transfer learning to address data scarcity issues in Filipino POS tagging, particularly focusing on optimizing source language selection. Our zero-shot approach demonstrates superior performance compared to previous studies, with top-performing fine-tuned PLMs achieving F1 scores as high as 79.10%. The analysis reveals moderate correlations between cross-lingual transfer performance and specific linguistic distances–featural, inventory, and syntactic–suggesting that source languages with these features closer to Filipino provide better results. We identify tokenizer optimization as a key challenge, as PLM tokenization sometimes fails to align with meaningful representations, thus hindering POS tagging performance.

pdf bib
Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino
Jann Railey Montalan | Jian Gang Ngui | Wei Qi Leong | Yosephine Susanto | Hamsawardhini Rengarajan | Alham Fikri Aji | William Chandra Tjhi
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation