Abstract
There is a growing interest in building language technologies (LTs) for low resource languages (LRLs). However, there are flaws in the planning, data collection and development phases mostly due to the assumption that LRLs are similar to High Resource Languages (HRLs) but only smaller in size. In our paper, we first provide examples of failed LTs for LRLs and provide the reasons for these failures. Second, we discuss the problematic issues with the data for LRLs. Finally, we provide recommendations for building better LTs for LRLs through insights from sociolinguistics and multilingualism. Our goal is not to solve all problems around LTs for LRLs but to raise awareness about the existing issues, provide recommendations toward possible solutions and encourage collaboration across academic disciplines for developing LTs that actually serve the needs and preferences of the LRL communities.
- Anthology ID:
- 2022.sigul-1.12
- Volume:
- Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Maite Melero, Sakriani Sakti, Claudia Soria
- Venue:
- SIGUL
- SIG:
- SIGUL
- Publisher:
- European Language Resources Association
- Pages:
- 92–97
- URL:
- https://aclanthology.org/2022.sigul-1.12
- Cite (ACL):
- A. Seza Doğruöz and Sunayana Sitaram. 2022. Language Technologies for Low Resource Languages: Sociolinguistic and Multilingual Insights. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pages 92–97, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Language Technologies for Low Resource Languages: Sociolinguistic and Multilingual Insights (Doğruöz & Sitaram, SIGUL 2022)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2022.sigul-1.12.pdf