Abstract
Most text-classification approaches represent the input based on textual features, either feature-based or continuous. However, this ignores strong non-linguistic similarities like homophily: people within a demographic group use language more similar to each other than to non-group members. We use homophily cues to retrofit text-based author representations with non-linguistic information, and introduce a trade-off parameter. This approach increases in-class similarity between authors, and improves classification performance by making classes more linearly separable. We evaluate the effect of our method on two author-attribute prediction tasks with various training-set sizes and parameter settings. We find that our method can significantly improve classification performance, especially when the number of labels is large and limited labeled data is available. It is potentially applicable as a preprocessing step to any text-classification task.
- Anthology ID:
- D18-1070
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 671–677
- URL:
- https://aclanthology.org/D18-1070
- DOI:
- 10.18653/v1/D18-1070
- Cite (ACL):
- Dirk Hovy and Tommaso Fornaciari. 2018. Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 671–677, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information (Hovy & Fornaciari, EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/D18-1070.pdf
- Code:
- Bocconi-NLPLab/retrofit_attributes
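For a feel of the general idea described in the abstract, the sketch below retrofits text-based author embeddings toward the centroid of each author's demographic group, with a trade-off parameter beta. This is a minimal, illustrative sketch of group-centroid retrofitting: the function name, the single-attribute grouping, and the fixed iteration count are assumptions for illustration and are not taken from the paper or from the Bocconi-NLPLab/retrofit_attributes repository.

```python
import numpy as np

def retrofit_author_embeddings(vectors, groups, beta=0.5, n_iters=10):
    """Pull each author's text-based embedding toward the centroid of the
    authors sharing the same demographic attribute (a homophily cue).

    beta is the trade-off parameter: beta=1.0 keeps the original text-based
    vectors unchanged; beta=0.0 collapses every author onto their group mean.
    (Illustrative sketch, not the paper's exact procedure.)
    """
    vectors = np.asarray(vectors, dtype=float)
    groups = np.asarray(groups)
    original = vectors.copy()
    retrofitted = vectors.copy()

    for _ in range(n_iters):
        # Recompute each group's centroid from the current retrofitted vectors.
        centroids = {g: retrofitted[groups == g].mean(axis=0)
                     for g in np.unique(groups)}
        for i, g in enumerate(groups):
            # Convex combination of the author's original vector and the
            # centroid of that author's demographic group.
            retrofitted[i] = beta * original[i] + (1.0 - beta) * centroids[g]
    return retrofitted


# Toy usage: four authors, two demographic groups.
authors = np.random.RandomState(0).randn(4, 8)
demo = ["groupA", "groupA", "groupB", "groupB"]
new_vectors = retrofit_author_embeddings(authors, demo, beta=0.7)
```

A higher beta preserves more of the original linguistic signal, while a lower beta pushes same-group authors closer together, which is how in-class similarity (and, per the paper, linear separability) can increase.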