Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners
Lianet Sepúlveda Torres, Magali Sanches Duran, Sandra Aluísio
Abstract
Portuguese is a less-resourced language with respect to foreign language learning. To inform a module of a system designed to support the scientific writing of native Spanish speakers learning Portuguese, we developed an approach that automatically generates a lexicon of wrong words reproducing the language transfer errors such learners make. Each item in the artificially generated lexicon contains, besides the wrong word, the corresponding correct Spanish and Portuguese words. The wrong word is used to identify the interlanguage error, and the correct Spanish and Portuguese forms are used to generate suggestions. By keeping track of the correct word forms, we can provide corrections or, at least, useful suggestions for learners. We propose combining two automatic procedures to obtain the error correction: i) a similarity measure and ii) a translation algorithm based on an aligned parallel corpus. The similarity-based method achieved a precision of 52%, whereas the alignment-based method achieved a precision of 90%. In this paper we focus only on interlanguage errors involving suffixes that have different forms in the two languages. The approach, however, is promising for tackling other types of errors, such as gender errors.
- Anthology ID:
- L14-1231
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3952–3957
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/247_Paper.pdf
- Cite (ACL):
- Lianet Sepúlveda Torres, Magali Sanches Duran, and Sandra Aluísio. 2014. Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3952–3957, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners (Sepúlveda Torres et al., LREC 2014)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/247_Paper.pdf
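
The lexicon generation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix pairs, the function names, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration, and the alignment-based method is not covered. The sketch builds a wrong form by attaching a Spanish suffix to a Portuguese stem, then stores it with the correct Portuguese word and the most similar Spanish word.

```python
from difflib import SequenceMatcher

# Hypothetical (Portuguese suffix, Spanish suffix) pairs, for illustration only;
# the paper's actual suffix inventory is not given in the abstract.
SUFFIX_PAIRS = [("dade", "dad"), ("ção", "ción"), ("vel", "ble")]


def generate_wrong_forms(pt_word):
    """Swap a Portuguese suffix for its Spanish counterpart to mimic a transfer error."""
    forms = []
    for pt_suffix, es_suffix in SUFFIX_PAIRS:
        if pt_word.endswith(pt_suffix):
            forms.append(pt_word[: -len(pt_suffix)] + es_suffix)
    return forms


def closest_spanish(wrong_word, es_vocab):
    """Pick the Spanish word most similar to the wrong form (stand-in similarity measure)."""
    return max(es_vocab, key=lambda w: SequenceMatcher(None, wrong_word, w).ratio())


def build_lexicon(pt_words, es_vocab):
    """Map each generated wrong word to its correct Portuguese and Spanish forms."""
    lexicon = {}
    for pt_word in pt_words:
        for wrong in generate_wrong_forms(pt_word):
            lexicon[wrong] = {"pt": pt_word, "es": closest_spanish(wrong, es_vocab)}
    return lexicon


lexicon = build_lexicon(
    ["possibilidade", "informação"],
    ["posibilidad", "información", "ciudad"],
)
```

Looking up a learner's token in `lexicon` then yields both the Portuguese correction and the Spanish source word, which is the information the abstract says is used to generate suggestions.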