Abstract
We explore representations for multi-word names in text classification tasks, using Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; named entities (NEs) have more impact on sector classification than on topic classification; replacing NEs with entity types is not an effective strategy; and representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
- Anthology ID:
- W18-3008
- Volume:
- Proceedings of the Third Workshop on Representation Learning for NLP
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Venue:
- RepL4NLP
- SIG:
- SIGREP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 64–68
- URL:
- https://aclanthology.org/W18-3008
- DOI:
- 10.18653/v1/W18-3008
- Cite (ACL):
- Lidia Pivovarova and Roman Yangarber. 2018. Comparison of Representations of Named Entities for Document Classification. In Proceedings of the Third Workshop on Representation Learning for NLP, pages 64–68, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- Comparison of Representations of Named Entities for Document Classification (Pivovarova & Yangarber, RepL4NLP 2018)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/W18-3008.pdf
- Data
- RCV1
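
The name treatments compared in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `represent` function, the `(start, end, type)` span format, the mode names, and the `<ORG>`-style placeholder are all hypothetical, and the "joined" mode (one feature per whole name) is an assumed baseline for contrast with splitting a name into tokens.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, entity_type), end exclusive,
# indexing into the token list. Spans are assumed not to overlap.
Span = Tuple[int, int, str]

MODES = ("split", "joined", "type")


def represent(tokens: List[str], spans: List[Span], mode: str) -> List[str]:
    """Return the feature sequence for one treatment of multi-word names.

    split  : keep every token of a name as a separate feature
    joined : collapse each name into a single feature (assumed baseline)
    type   : replace each name with its entity type
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "split":
        return list(tokens)

    by_start = {start: (end, etype) for start, end, etype in spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, etype = by_start[i]
            if mode == "joined":
                out.append("_".join(tokens[i:end]))
            else:  # mode == "type"
                out.append(f"<{etype}>")
            i = end  # skip past the whole name
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = ["Deutsche", "Bank", "cut", "rates"]
spans = [(0, 2, "ORG")]
for mode in MODES:
    print(mode, represent(tokens, spans, mode))
```

In this sketch the "split" mode corresponds to the treatment the abstract reports as best, and the "type" mode corresponds to the entity-type replacement it reports as ineffective.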