Alba Milà
2014
Boosting the creation of a treebank
Blanca Arias | Núria Bel | Mercè Lorente | Montserrat Marimón | Alba Milà | Jorge Vivaldi | Muntsa Padró | Marina Fomicheva | Imanol Larrea
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper we present the results of an ongoing experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish sentences. To save time and cost, our approach exploits the typological similarities between Catalan and Spanish to create a first Catalan data set quickly by: (i) automatically annotating Catalan sentences with a de-lexicalized Spanish parser, (ii) manually correcting the parses, and (iii) using the corrected Catalan sentences to train a Catalan parser. The results showed that about 1000 parsed sentences are required to train a Catalan parser, and that these were produced in 4 months by 2 annotators.
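As an illustration of step (i), de-lexicalization can be sketched as stripping the word-specific columns from the training data so that the parser learns only from part-of-speech and structural information, which transfers more readily between closely related languages. The sketch below is not the authors' implementation: it assumes tab-separated CoNLL-X style input (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), the function name `delexicalize` is hypothetical, and the tags and labels in the sample sentence are invented for the example.

```python
def delexicalize(conll_sentence: str) -> str:
    """Blank out FORM and LEMMA in a CoNLL-X style sentence.

    Assumption: columns are tab-separated in CoNLL-X order. The abstract
    does not state the exact data format or de-lexicalization scheme used.
    """
    out = []
    for line in conll_sentence.strip().splitlines():
        cols = line.split("\t")
        if len(cols) < 8:
            # Not a token line (e.g. a comment); leave it untouched.
            out.append(line)
            continue
        cols[1] = "_"  # FORM: the surface word
        cols[2] = "_"  # LEMMA: the dictionary form
        out.append("\t".join(cols))
    return "\n".join(out)


# Hypothetical three-token sentence; tags and dependency labels are
# placeholders, not taken from any real treebank.
sample = "\n".join(
    "\t".join(row)
    for row in [
        ["1", "El", "el", "d", "da", "_", "2", "spec", "_", "_"],
        ["2", "gat", "gat", "n", "nc", "_", "3", "suj", "_", "_"],
        ["3", "dorm", "dormir", "v", "vm", "_", "0", "ROOT", "_", "_"],
    ]
)

delexicalized = delexicalize(sample)
```

After this transformation the POS tags, heads, and dependency labels are unchanged, so a parser trained on de-lexicalized Spanish data can be applied to POS-tagged Catalan text without ever seeing a Spanish word form.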