Initial Experiments for Building a Guarani WordNet
Luis Chiruzzo, Marvin Agüero-Torales, Aldo Alvarez, Yliana Rodríguez
Abstract
This paper presents work in progress on creating a Guarani version of the WordNet database. Guarani is an indigenous South American language and is a low-resource language from the NLP perspective. Following the expand approach, we aim to find Guarani lemmas that correspond to the concepts defined in WordNet. We do this through three strategies that try to select the correct lemmas from Guarani-Spanish datasets. We applied these strategies to three different bilingual dictionaries and had native speakers assess the results. This procedure found Guarani lemmas for about 6,500 synsets, including 27% of the base WordNet concepts. However, further work on the quality of the selected words is needed before a final version of the dataset can be created.

- Anthology ID: 2023.gwc-1.24
- Volume: Proceedings of the 12th Global Wordnet Conference
- Month: January
- Year: 2023
- Address: University of the Basque Country, Donostia - San Sebastian, Basque Country
- Editors: German Rigau, Francis Bond, Alexandre Rademaker
- Venue: GWC
- Publisher: Global Wordnet Association
- Pages: 197–204
- URL: https://aclanthology.org/2023.gwc-1.24
- Cite (ACL): Luis Chiruzzo, Marvin Agüero-Torales, Aldo Alvarez, and Yliana Rodríguez. 2023. Initial Experiments for Building a Guarani WordNet. In Proceedings of the 12th Global Wordnet Conference, pages 197–204, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.
- Cite (Informal): Initial Experiments for Building a Guarani WordNet (Chiruzzo et al., GWC 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-1/2023.gwc-1.24.pdf
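The abstract describes the expand approach only at a high level and does not spell out the three selection strategies. As a rough illustration of the general idea, the Python sketch below assumes that each WordNet synset already has Spanish lemmas attached (for example, from a Spanish wordnet aligned to WordNet) and that a Spanish-to-Guarani dictionary is available. It keeps a Guarani candidate only when several Spanish lemmas of the same synset agree on it. The agreement-voting rule, the `expand` function, the `min_votes` parameter, and the placeholder data are all hypothetical and are not the authors' method.

```python
from collections import Counter

# Hypothetical toy data: synset IDs mapped to Spanish lemmas, and a
# Spanish-to-Guarani dictionary. All strings are placeholders, not real
# dictionary entries.
SYNSET_ES = {
    "synset_1": ["es_a", "es_b", "es_c"],
    "synset_2": ["es_d"],
}
ES_GN = {
    "es_a": ["gn_x", "gn_y"],
    "es_b": ["gn_x"],
    "es_c": ["gn_z"],
    "es_d": ["gn_w"],
}


def expand(synset_es, es_gn, min_votes=2):
    """Assign Guarani candidate lemmas to each synset by agreement voting.

    A Guarani lemma is kept if at least `min_votes` of the synset's Spanish
    lemmas translate to it. The threshold is capped at the number of Spanish
    lemmas, so a synset with a single Spanish lemma keeps all of its
    translations. Synsets with no surviving candidate are omitted.
    """
    result = {}
    for synset, es_lemmas in synset_es.items():
        votes = Counter()
        for es in es_lemmas:
            # set() so one Spanish lemma votes at most once per Guarani lemma
            votes.update(set(es_gn.get(es, [])))
        threshold = min(min_votes, len(es_lemmas))
        kept = sorted(gn for gn, n in votes.items() if n >= threshold)
        if kept:
            result[synset] = kept
    return result


print(expand(SYNSET_ES, ES_GN))
# {'synset_1': ['gn_x'], 'synset_2': ['gn_w']}
```

Voting of this kind trades coverage for precision: raising `min_votes` discards candidates supported by only one Spanish lemma, while `min_votes=1` keeps every dictionary translation.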