Abstract
This paper presents the first publicly available UD treebank for Tswana, Tswana-Popapolelo. The data consists of the 20 Cairo CICLing sentences translated into Tswana. After pre-processing these sentences with detailed POS tags (XPOS) and converting them to universal POS tags (UPOS), we annotated the data with dependency relations, documenting our decisions for language-specific constructions. The linguistic issues encountered are described in detail, as this is the first application of the UD framework to produce a dependency treebank for the Bantu language family in general and for Tswana specifically.
- Anthology ID:
- 2024.rail-1.7
- Volume:
- Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
- Venues:
- RAIL | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 55–65
- URL:
- https://aclanthology.org/2024.rail-1.7
- Cite (ACL):
- Tanja Gaustad, Ansu Berg, Rigardt Pretorius, and Roald Eiselen. 2024. The First Universal Dependency Treebank for Tswana: Tswana-Popapolelo. In Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024, pages 55–65, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- The First Universal Dependency Treebank for Tswana: Tswana-Popapolelo (Gaustad et al., RAIL-WS 2024)
- PDF:
- https://preview.aclanthology.org/fix-volume-bibkeys/2024.rail-1.7.pdf