CIC-FBK Approach to Native Language Identification
Ilia Markov, Lingzhen Chen, Carlo Strapparava, Grigori Sidorov
Abstract
We present the CIC-FBK system, which took part in the Native Language Identification (NLI) Shared Task 2017. Our approach combines features commonly used in previous NLI research, i.e., word n-grams, lemma n-grams, part-of-speech n-grams, and function words, with recently introduced character n-grams from misspelled words, and features that are novel in this task, such as typed character n-grams, and syntactic n-grams of words and of syntactic relation tags. We use a log-entropy weighting scheme and perform classification using the Support Vector Machines (SVM) algorithm. Our system achieved a macro-averaged F1-score of 0.8808 and shared the 1st rank in the NLI Shared Task 2017 scoring.
- Anthology ID:
- W17-5042
- Volume:
- Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Editors:
- Joel Tetreault, Jill Burstein, Claudia Leacock, Helen Yannakoudakis
- Venue:
- BEA
- SIG:
- SIGEDU
- Publisher:
- Association for Computational Linguistics
- Pages:
- 374–381
- URL:
- https://aclanthology.org/W17-5042
- DOI:
- 10.18653/v1/W17-5042
- Cite (ACL):
- Ilia Markov, Lingzhen Chen, Carlo Strapparava, and Grigori Sidorov. 2017. CIC-FBK Approach to Native Language Identification. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 374–381, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- CIC-FBK Approach to Native Language Identification (Markov et al., BEA 2017)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/W17-5042.pdf
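
The abstract mentions a log-entropy weighting scheme applied to n-gram features. As a rough illustration only, the sketch below computes the common textbook form of log-entropy weights (local weight log(1 + tf), global weight 1 + sum_j p_ij log p_ij / log n) over character n-grams. The exact variant, the log base, and the feature extraction used by the authors are not stated in the abstract, so the formula choice and the helper names `char_ngrams` and `log_entropy_weights` are assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of a string (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_entropy_weights(docs_tokens):
    """Weight each term in each document with the textbook log-entropy scheme.

    docs_tokens: list of token lists, one per document.
    Returns a list of {term: weight} dicts, one per document.
    Assumption: natural logarithms and the standard formulation; the paper's
    exact variant is not given in the abstract.
    """
    n_docs = len(docs_tokens)
    counts = [Counter(tokens) for tokens in docs_tokens]

    # Global frequency of each term over the whole collection.
    global_freq = Counter()
    for doc_counts in counts:
        global_freq.update(doc_counts)

    # Global weight: 1 + (sum_j p_ij * log p_ij) / log(n_docs).
    # Terms spread evenly over all documents get a weight near 0;
    # terms concentrated in one document get a weight of 1.
    global_weight = {}
    for term, gf in global_freq.items():
        entropy_sum = 0.0
        for doc_counts in counts:
            tf = doc_counts.get(term, 0)
            if tf:
                p = tf / gf
                entropy_sum += p * math.log(p)
        if n_docs > 1:
            global_weight[term] = 1.0 + entropy_sum / math.log(n_docs)
        else:
            global_weight[term] = 1.0  # entropy is undefined for a single document

    # Final weight: local log(1 + tf) times the term's global weight.
    return [
        {term: math.log(1 + tf) * global_weight[term] for term, tf in doc_counts.items()}
        for doc_counts in counts
    ]


if __name__ == "__main__":
    essays = ["the cat sat", "the dog sat"]
    weighted = log_entropy_weights([char_ngrams(e) for e in essays])
    for essay, weights in zip(essays, weighted):
        print(essay, weights)
```

The resulting per-document weight dictionaries could then be turned into feature vectors for a linear SVM; that step is omitted here because the abstract gives no details about the classifier configuration.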