NILC at CWI 2018: Exploring Feature Engineering and Feature Learning

Nathan Hartmann, Leandro Borges dos Santos


Abstract
This paper describes the results of the NILC team at CWI 2018. We developed solutions following three approaches: (i) a feature engineering method using lexical, n-gram and psycholinguistic features, (ii) a shallow neural network method using only word embeddings, and (iii) a Long Short-Term Memory (LSTM) language model, pre-trained on a large text corpus to produce contextualized word vectors. The feature engineering method obtained our best results for the classification task, and the LSTM model achieved the best results for the probabilistic classification task. Our results show that deep neural networks are able to perform as well as traditional machine learning methods using manually engineered features for the task of complex word identification in English.
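
As a rough illustration of approach (i), the sketch below trains a standard classifier on a few hand-crafted lexical features (word length, a syllable heuristic, and a log-frequency value from a toy table). The specific features, frequency values, and training examples are illustrative assumptions, not the authors' actual feature set or data.

# Minimal sketch of a feature-engineering baseline for complex word identification.
# The features and the toy frequency table are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

word_freq = {"the": 1e-2, "dog": 1e-4, "ubiquitous": 1e-7}  # toy relative frequencies

def features(word):
    w = word.lower()
    vowels = "aeiou"
    length = len(w)
    # crude syllable count: number of vowel groups
    syllables = sum(1 for i, c in enumerate(w)
                    if c in vowels and (i == 0 or w[i - 1] not in vowels))
    log_freq = np.log10(word_freq.get(w, 1e-8))  # back off for unseen words
    return [length, syllables, log_freq]

# Toy training data: (word, is_complex) pairs.
train = [("the", 0), ("dog", 0), ("ubiquitous", 1)]
X = np.array([features(w) for w, _ in train])
y = np.array([label for _, label in train])

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("ubiquitous")]))  # expected: [1]
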
Anthology ID:
W18-0540
Volume:
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
335–340
URL:
https://aclanthology.org/W18-0540
DOI:
10.18653/v1/W18-0540
Cite (ACL):
Nathan Hartmann and Leandro Borges dos Santos. 2018. NILC at CWI 2018: Exploring Feature Engineering and Feature Learning. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 335–340, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
NILC at CWI 2018: Exploring Feature Engineering and Feature Learning (Hartmann & dos Santos, BEA 2018)
PDF:
https://preview.aclanthology.org/auto-file-uploads/W18-0540.pdf
Data
Billion Word Benchmark, BookCorpus