Developing Universal Dependencies Treebanks for Magahi and Braj
Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha
Abstract
In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj - based on the Universal Dependencies framework. The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences, each annotated with lemmas, part-of-speech tags, morphological features and universal dependencies. This paper describes the different dependency relations found in the two languages and gives some statistics for the two treebanks. The dataset will be made publicly available in the Universal Dependencies (UD) repository in the next (v2.10) release.
- Anthology ID:
- 2021.pail-1.1
- Volume:
- Proceedings of the First Workshop on Parsing and its Applications for Indian Languages
- Month:
- December
- Year:
- 2021
- Address:
- NIT Silchar, India
- Editors:
- Kengatharaiyer Sarveswaran, Parameswari Krishnamurthy, Pruthwik Mishra
- Venue:
- PAIL
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 1–11
- URL:
- https://aclanthology.org/2021.pail-1.1
- Cite (ACL):
- Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, and Atul Kr. Ojha. 2021. Developing Universal Dependencies Treebanks for Magahi and Braj. In Proceedings of the First Workshop on Parsing and its Applications for Indian Languages, pages 1–11, NIT Silchar, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Developing Universal Dependencies Treebanks for Magahi and Braj (Raj et al., PAIL 2021)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2021.pail-1.1.pdf