Sabela Prieto González
2010
Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora
Margarita Alonso Ramos | Leo Wanner | Orsolya Vincze | Gerard Casamayor del Bosque | Nancy Vázquez Veiga | Estela Mosqueira Suárez | Sabela Prieto González
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Collocations play a significant role in second language acquisition. To support learners efficiently, an NLP-based CALL environment for learning collocations should be based on a representative learner corpus annotated for collocation errors. However, no theoretically motivated collocation error tag set has been available so far. Existing learner corpora tag collocation errors simply as lexical errors, which is clearly insufficient given the wide range of collocation errors that learners make. In this paper, we present a fine-grained three-dimensional typology of collocation errors, derived in an empirical study from the learner corpus CEDEL2, which was compiled by a team at the Autonomous University of Madrid. The first dimension captures whether the error concerns the collocation as a whole or one of its elements; the second dimension captures the language-oriented error analysis, while the third exemplifies the interpretative error analysis. To facilitate smooth annotation along this typology, we adapted Knowtator, a flexible off-the-shelf annotation tool implemented as a Protégé plugin.