Exploring Optimal Voting in Native Language Identification

Cyril Goutte, Serge Léger


Abstract
We describe the submissions entered by the National Research Council Canada in the NLI-2017 evaluation. We mainly explored voting and various ways to optimize the choice and number of voting systems. We also explored features that rely on no linguistic preprocessing. Long character n-grams obtained from raw text yielded the best performance on all textual input (written essays and speech transcripts). Voting ensembles produced small performance gains, with little difference between the optimization strategies we tried. Our top systems achieved accuracies of 87% on the essay track, 84% on the speech track, and close to 92% by combining essays, speech and i-vectors in the fusion track.
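The two ideas named in the abstract, character n-grams taken from raw text and voting across systems, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `char_ngrams` and `plurality_vote`, the example labels, and the tie-breaking rule are all assumptions made for illustration, and the abstract does not say which voting scheme or n-gram lengths were used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of raw text.

    No tokenization or other linguistic preprocessing is applied.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def plurality_vote(predictions):
    """Return the label predicted by the largest number of systems.

    Assumption of this sketch: ties go to the earliest system in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical usage: three systems label the same essay.
features = char_ngrams("the cat", 3)
winner = plurality_vote(["FRE", "GER", "FRE"])
```

Here each entry of `predictions` stands for the native-language label output by one system for a single document; choosing which systems vote, and how many, is the optimization question the paper explores.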
Anthology ID:
W17-5041
Volume:
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
367–373
URL:
https://aclanthology.org/W17-5041
DOI:
10.18653/v1/W17-5041
Cite (ACL):
Cyril Goutte and Serge Léger. 2017. Exploring Optimal Voting in Native Language Identification. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 367–373, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Exploring Optimal Voting in Native Language Identification (Goutte & Léger, BEA 2017)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/W17-5041.pdf