Gerard Casamayor del Bosque


Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora
Margarita Alonso Ramos | Leo Wanner | Orsolya Vincze | Gerard Casamayor del Bosque | Nancy Vázquez Veiga | Estela Mosqueira Suárez | Sabela Prieto González
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Collocations play a significant role in second language acquisition. To offer efficient support to learners, an NLP-based CALL environment for learning collocations should be based on a representative learner corpus annotated for collocation errors. However, no theoretically motivated collocation error tag set has been available so far. Existing learner corpora tag collocation errors simply as “lexical errors”, which is clearly insufficient given the wide range of collocation errors that learners make. In this paper, we present a fine-grained three-dimensional typology of collocation errors, derived in an empirical study from the learner corpus CEDEL2, which was compiled by a team at the Autonomous University of Madrid. The first dimension captures whether the error concerns the collocation as a whole or one of its elements; the second dimension captures the language-oriented error analysis, while the third exemplifies the interpretative error analysis. To facilitate smooth annotation along this typology, we adapted Knowtator, a flexible off-the-shelf annotation tool implemented as a Protégé plugin.