Abstract
The increasing scale of large language models has led some students to wonder what contributions can be made in academia. However, students are often unaware that LLM-based approaches are not feasible for the majority of the world’s languages due to lack of data availability. This paper presents a research project in which students explore the issue of language representation by creating an inventory of the data, preprocessing, and model resources available for a less-resourced language. Students are put into small groups and assigned a language to research. Within the group, students take on one of three roles: dataset investigator, preprocessing investigator, or downstream task investigator. Students then work together to create a 7-page research report about their language.

- Anthology ID:
- 2024.teachingnlp-1.14
- Volume:
- Proceedings of the Sixth Workshop on Teaching NLP
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Sana Al-azzawi, Laura Biester, György Kovács, Ana Marasović, Leena Mathur, Margot Mieskes, Leonie Weissweiler
- Venues:
- TeachingNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 91–93
- URL:
- https://aclanthology.org/2024.teachingnlp-1.14
- Cite (ACL):
- Carolyn Anderson. 2024. Exploring Language Representation through a Resource Inventory Project. In Proceedings of the Sixth Workshop on Teaching NLP, pages 91–93, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Exploring Language Representation through a Resource Inventory Project (Anderson, TeachingNLP-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.teachingnlp-1.14.pdf