Salvatore Scarlata
2020
The Treebank of Vedic Sanskrit
Oliver Hellwig | Salvatore Scarlata | Elia Ackermann | Paul Widmer
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper introduces the first treebank of Vedic Sanskrit, a morphologically rich ancient Indian language that is of central importance for linguistic and historical research. The selection of the more than 3,700 sentences contained in this treebank reflects the development of metrical and prose texts over a period of 600 years. We discuss how these sentences are annotated in the Universal Dependencies scheme and which syntactic constructions required special attention. In addition, we describe a syntactic labeler based on neural networks that supports the initial annotation of the treebank, and whose evaluation can be helpful for setting up a full syntactic parser of Vedic Sanskrit.