Abstract
Treebanks are an essential resource for syntactic parsing. The available Paninian dependency treebanks for Telugu are annotated only with inter-chunk dependency relations, so not all words of a sentence are part of the parse tree. In this paper, we automatically annotate the intra-chunk dependencies in the treebank using a shift-reduce parser based on context-free grammar rules for Telugu chunks. We also propose a few additional intra-chunk dependency relations for Telugu, apart from the ones used in the Hindi treebank. Annotating intra-chunk dependencies yields a complete parse tree for every sentence in the treebank. A fully expanded treebank is crucial for developing end-to-end parsers that produce complete trees. We present a fully expanded dependency treebank for Telugu consisting of 3220 sentences. We also convert the treebank from the Anncorra part-of-speech tagset to the latest BIS tagset, a hierarchical tagset adopted as a unified part-of-speech standard across all Indian languages. The final treebank is made publicly available.
- Anthology ID:
- 2020.wildre-1.8
- Volume:
- Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- WILDRE
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 39–44
- Language:
- English
- URL:
- https://aclanthology.org/2020.wildre-1.8
- Cite (ACL):
- Sneha Nallani, Manish Shrivastava, and Dipti Sharma. 2020. A Fully Expanded Dependency Treebank for Telugu. In Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation, pages 39–44, Marseille, France. European Language Resources Association (ELRA).
- Cite (Informal):
- A Fully Expanded Dependency Treebank for Telugu (Nallani et al., WILDRE 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.wildre-1.8.pdf