Language Technologies for Low Resource Languages: Sociolinguistic and Multilingual Insights

A. Seza Doğruöz, Sunayana Sitaram


Abstract
There is growing interest in building language technologies (LTs) for low resource languages (LRLs). However, there are flaws in the planning, data collection and development phases, mostly due to the assumption that LRLs are similar to High Resource Languages (HRLs), only smaller in size. In our paper, we first present examples of failed LTs for LRLs and explain the reasons for these failures. Second, we discuss problems with the data for LRLs. Finally, we provide recommendations for building better LTs for LRLs through insights from sociolinguistics and multilingualism. Our goal is not to solve all problems around LTs for LRLs but to raise awareness about the existing issues, provide recommendations toward possible solutions and encourage collaboration across academic disciplines for developing LTs that actually serve the needs and preferences of the LRL communities.
Anthology ID:
2022.sigul-1.12
Volume:
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venue:
SIGUL
SIG:
SIGUL
Publisher:
European Language Resources Association
Pages:
92–97
URL:
https://aclanthology.org/2022.sigul-1.12
Cite (ACL):
A. Seza Doğruöz and Sunayana Sitaram. 2022. Language Technologies for Low Resource Languages: Sociolinguistic and Multilingual Insights. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pages 92–97, Marseille, France. European Language Resources Association.
Cite (Informal):
Language Technologies for Low Resource Languages: Sociolinguistic and Multilingual Insights (Doğruöz & Sitaram, SIGUL 2022)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2022.sigul-1.12.pdf