Catarina Magro


2010

pdf
When CORDIAL Becomes Friendly: Endowing the CORDIAL Corpus with a Syntactic Annotation Layer
Catarina Magro
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper reports on the syntactic annotation of a previously compiled and tagged corpus of European Portuguese (EP) dialects ― The Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN). The parsed version of CORDIAL-SIN is intended to be a more efficient resource for the purpose of studying dialect syntax by allowing automated searches for various syntactic constructions of interest. To achieve this goal we adopted a rich annotation system (the UPenn corpora annotation system) which codifies syntactic information of high relevance. The annotation produces tree representations, in form of labelled parenthesis, that are integrally searchable with CorpusSearch, a search engine for parsed corpora (Randall, 2005-2007). The present paper focuses on CORDIAL-SIN annotation issues, namely it presents the general principles and guidelines of the adopted annotation system and describes the methodology for constructing the parsed version of the corpus and for searching it (tools and procedures). Last section addresses the question of how an annotation system originally designed for Middle English can be adapted to meet the particular needs of a Portuguese corpus of dialectal speech.
Search
Co-authors
    Venues