Abstract
This paper describes the results of the NILC team at CWI 2018. We developed solutions following three approaches: (i) a feature engineering method using lexical, n-gram, and psycholinguistic features; (ii) a shallow neural network using only word embeddings; and (iii) a Long Short-Term Memory (LSTM) language model, pre-trained on a large text corpus to produce contextualized word vectors. The feature engineering method obtained our best results on the classification task, and the LSTM model achieved the best results on the probabilistic classification task. Our results show that deep neural networks can perform as well as traditional machine learning methods with manually engineered features for complex word identification in English.
- Anthology ID: W18-0540
- Volume: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
- Month: June
- Year: 2018
- Address: New Orleans, Louisiana
- Editors: Joel Tetreault, Jill Burstein, Ekaterina Kochmar, Claudia Leacock, Helen Yannakoudakis
- Venue: BEA
- SIG: SIGEDU
- Publisher: Association for Computational Linguistics
- Pages: 335–340
- URL: https://aclanthology.org/W18-0540
- DOI: 10.18653/v1/W18-0540
- Cite (ACL): Nathan Hartmann and Leandro Borges dos Santos. 2018. NILC at CWI 2018: Exploring Feature Engineering and Feature Learning. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 335–340, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal): NILC at CWI 2018: Exploring Feature Engineering and Feature Learning (Hartmann & dos Santos, BEA 2018)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/W18-0540.pdf
- Data: Billion Word Benchmark, BookCorpus
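The abstract's third approach, taking the hidden state of a pre-trained LSTM language model as a contextualized vector for the target word, can be illustrated with a toy sketch. This is not the authors' implementation: the single-layer cell, the dimensions, the random weights, and the example sentence are all illustrative stand-ins (in the paper, the language model is pre-trained on a large corpus, and the resulting vector feeds a downstream classifier).

```python
# Minimal sketch, NOT the paper's code: extract an LSTM hidden state at the
# target word's position and treat it as a contextualized word vector.
# All names, dimensions, and weights below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, HID_DIM = 8, 16


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ToyLSTM:
    """Single-layer LSTM; in the paper's setting the weights would come
    from language-model pre-training rather than random initialization."""

    def __init__(self, emb_dim, hid_dim):
        s = 1.0 / np.sqrt(hid_dim)
        # One stacked weight matrix for the input, forget, cell, and
        # output gates, applied to [x_t; h_{t-1}].
        self.W = rng.uniform(-s, s, (4 * hid_dim, emb_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid_dim = hid_dim

    def contextual_vector(self, embeddings, position):
        """Run left to right and return the hidden state at `position`,
        i.e. a word vector that also encodes the preceding context."""
        h = np.zeros(self.hid_dim)
        c = np.zeros(self.hid_dim)
        for t, x in enumerate(embeddings):
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            if t == position:
                return h
        return h


# Toy usage: random vectors stand in for pre-trained word embeddings; the
# contextual vector of the target word would be the classifier's input.
sentence = ["the", "committee", "reached", "a", "unanimous", "verdict"]
embeddings = [rng.normal(size=EMB_DIM) for _ in sentence]
lstm = ToyLSTM(EMB_DIM, HID_DIM)
vec = lstm.contextual_vector(embeddings, position=4)  # target: "unanimous"
print(vec.shape)  # (16,)
```

Because the hidden state at the target position mixes the word's own embedding with the states computed over the preceding words, the resulting vector is context-dependent, unlike a static word embedding.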