Abstract
Neural network approaches to Named-Entity Recognition reduce the need for carefully hand-crafted features. While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers. In this work, we show that this is unfair: lexical features are actually quite useful. We propose to embed words and entity types into a low-dimensional vector space we train from annotated data produced by distant supervision thanks to Wikipedia. From this, we compute, offline, a feature vector representing each word. When used with a vanilla recurrent neural network model, this representation yields substantial improvements. We establish a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0, while matching state-of-the-art performance with an F1 score of 91.73 on the over-studied CONLL-2003 dataset.
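To make the abstract's idea concrete, here is a minimal sketch, not the authors' released code: words and entity types are assumed to live in a shared low-dimensional space, and each word's lexical feature vector is computed offline from its similarity to every entity type. The embedding tables, dimensions (100-d vectors, 120 types), vocabulary, and the `lexical_feature` helper below are all illustrative assumptions, not values taken from the paper's abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical jointly trained embeddings: in practice these would come from the
# distantly supervised Wikipedia data; here they are random stand-ins.
word_emb = {w: rng.standard_normal(100) for w in ["paris", "france", "einstein"]}
type_emb = np.stack([rng.standard_normal(100) for _ in range(120)])  # assumed 120 entity types


def lexical_feature(word: str) -> np.ndarray:
    """Offline feature vector: cosine similarity of the word to every entity type."""
    v = word_emb[word]
    sims = type_emb @ v / (np.linalg.norm(type_emb, axis=1) * np.linalg.norm(v) + 1e-8)
    return sims  # shape (120,); concatenated with other word features before the RNN


# Computed once, offline, for the whole vocabulary.
features = {w: lexical_feature(w) for w in word_emb}
print(features["paris"].shape)  # (120,)
```

In this sketch the resulting per-word vectors would simply be looked up at training time and fed, alongside standard word embeddings, into a recurrent tagger.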
- Anthology ID: C18-1161
- Volume: Proceedings of the 27th International Conference on Computational Linguistics
- Month: August
- Year: 2018
- Address: Santa Fe, New Mexico, USA
- Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
- Venue: COLING
- Publisher: Association for Computational Linguistics
- Pages: 1896–1907
- URL: https://aclanthology.org/C18-1161
- Cite (ACL): Abbas Ghaddar and Phillippe Langlais. 2018. Robust Lexical Features for Improved Neural Network Named-Entity Recognition. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1896–1907, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal): Robust Lexical Features for Improved Neural Network Named-Entity Recognition (Ghaddar & Langlais, COLING 2018)
- PDF: https://aclanthology.org/C18-1161.pdf
- Data: CoNLL, CoNLL 2003, DBpedia, OntoNotes 5.0