A Multiform Balanced Dependency Treebank for Romanian

Mihaela Colhon, Cătălina Mărănduc, Cătălin Mititelu


Abstract
The UAIC-RoDia-DepTb is a balanced treebank containing texts in non-standard language: 2,575 chat sentences, old Romanian texts (a Gospel printed in 1648, a codex of laws printed in 1818, a novel written in 1910), regional popular poetry, legal texts, Romanian and foreign fiction, and quotations. The proportions are comparable; each of these text types is represented by a subset of at least 1,000 phrases, so that the parser can be trained on their peculiarities. The annotation of the treebank started in 2007, and it uses classical tags, such as those in school grammar, with the intention of using the resource for didactic purposes. The classification of circumstantial modifiers is rich in semantic information. This paper presents the ongoing development of this resource, which has been automatically annotated and entirely manually corrected. We aim to add new texts and to make the treebank available in more formats, keeping all the annotated morphological and syntactic information and adding logical-semantic information. We describe two conversions from the classic syntactic format: one into the Universal Dependencies format and one into a logical-semantic layer, which is briefly presented.
Anthology ID:
W17-7802
Volume:
Proceedings of the Workshop Knowledge Resources for the Socio-Economic Sciences and Humanities associated with RANLP 2017
Month:
September
Year:
2017
Address:
Varna
Venues:
RANLP | WS
Publisher:
INCOMA Inc.
Pages:
9–18
URL:
https://doi.org/10.26615/978-954-452-040-3_002
DOI:
10.26615/978-954-452-040-3_002
Cite (ACL):
Mihaela Colhon, Cătălina Mărănduc, and Cătălin Mititelu. 2017. A Multiform Balanced Dependency Treebank for Romanian. In Proceedings of the Workshop Knowledge Resources for the Socio-Economic Sciences and Humanities associated with RANLP 2017, pages 9–18, Varna. INCOMA Inc.
Cite (Informal):
A Multiform Balanced Dependency Treebank for Romanian (Colhon et al., 2017)
PDF:
https://doi.org/10.26615/978-954-452-040-3_002