Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

Dirk Hovy, Tommaso Fornaciari


Abstract
Most text-classification approaches represent the input based on textual features, either feature-based or continuous. However, this ignores strong non-linguistic similarities like homophily: people within a demographic group use language more similar to each other than to non-group members. We use homophily cues to retrofit text-based author representations with non-linguistic information, and introduce a trade-off parameter. This approach increases in-class similarity between authors, and improves classification performance by making classes more linearly separable. We evaluate the effect of our method on two author-attribute prediction tasks with various training-set sizes and parameter settings. We find that our method can significantly improve classification performance, especially when the number of labels is large and limited labeled data is available. It is potentially applicable as a preprocessing step to any text-classification task.
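The retrofitting idea described in the abstract can be illustrated with a minimal sketch. It assumes a generic retrofitting update in which each author vector is repeatedly averaged with the vectors of the other authors that share its group label. The function name `retrofit`, the parameter `alpha`, and the exact update rule are illustrative assumptions, not the authors' implementation; the linked code repository is the reference for the actual method.

```python
import numpy as np


def retrofit(vectors, groups, alpha=0.5, iterations=10):
    """Pull each author vector toward the mean of its same-group peers.

    vectors:    (n_authors, dim) array of text-based author representations.
    groups:     length-n sequence of group labels (a hypothetical
                demographic attribute, one label per author).
    alpha:      assumed trade-off weight in (0, 1]; 1.0 leaves the
                original vectors unchanged, smaller values weight peers more.
    iterations: number of synchronous update sweeps.
    """
    original = np.asarray(vectors, dtype=float)
    groups = np.asarray(groups)
    retrofitted = original.copy()
    for _ in range(iterations):
        updated = retrofitted.copy()
        for label in np.unique(groups):
            idx = np.flatnonzero(groups == label)
            if len(idx) < 2:
                continue  # a singleton group has no peers to borrow from
            total = retrofitted[idx].sum(axis=0)
            # Mean of the *other* group members, computed per author.
            peer_mean = (total - retrofitted[idx]) / (len(idx) - 1)
            updated[idx] = alpha * original[idx] + (1 - alpha) * peer_mean
        retrofitted = updated
    return retrofitted
```

Under these assumptions, lowering `alpha` shrinks the spread of vectors within each group while keeping the group mean fixed, which is one way to picture the increase in in-class similarity the abstract refers to.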
Anthology ID:
D18-1070
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
671–677
URL:
https://aclanthology.org/D18-1070
DOI:
10.18653/v1/D18-1070
Cite (ACL):
Dirk Hovy and Tommaso Fornaciari. 2018. Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 671–677, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information (Hovy & Fornaciari, EMNLP 2018)
PDF:
https://preview.aclanthology.org/ingestion-script-update/D18-1070.pdf
Video:
https://vimeo.com/306361301
Code:
Bocconi-NLPLab/retrofit_attributes