NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Linuwih, Bryan Wilie, Galih Muridan, Genta Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung
- Anthology ID: 2023.ijcnlp-main.60
- Volume: Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: November
- Year: 2023
- Address: Nusa Dua, Bali
- Editors: Jong C. Park, Yuki Arase, Baotian Hu, Wei Lu, Derry Wijaya, Ayu Purwarianti, Adila Alfa Krisnadhi
- Venues: IJCNLP | AACL
- Publisher: Association for Computational Linguistics
- Pages: 921–945
- URL: https://aclanthology.org/2023.ijcnlp-main.60
- DOI: 10.18653/v1/2023.ijcnlp-main.60
- Cite (ACL): Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Linuwih, Bryan Wilie, Galih Muridan, Genta Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, and Pascale Fung. 2023. NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 921–945, Nusa Dua, Bali. Association for Computational Linguistics.
- Cite (Informal): NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages (Cahyawijaya et al., IJCNLP-AACL 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2023.ijcnlp-main.60.pdf