Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank


Abstract
Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.
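To make the idea of "bleaching" concrete, here is a minimal sketch of how lexical strings might be turned into abstract, language-independent features. The three transforms used here (a collapsed character-class shape, a vowel/consonant pattern, and token length) and the function names `shape`, `vowel_consonant`, and `bleach` are illustrative assumptions, not necessarily the feature set or code used in the paper; the linked repository is the authoritative source.

```python
# Illustrative sketch only: the transforms below are assumed examples of
# "abstract features", not the paper's confirmed feature set.

VOWELS = set("aeiouAEIOU")  # assumption: plain ASCII vowels only


def shape(token: str) -> str:
    """Map each character to a class (U upper, L lower, D digit, X other)
    and collapse runs of the same class, e.g. 'Hello' -> 'UL'."""
    classes = []
    for ch in token:
        if ch.isupper():
            cls = "U"
        elif ch.islower():
            cls = "L"
        elif ch.isdigit():
            cls = "D"
        else:
            cls = "X"
        # Only append when the class changes, so repeats are collapsed.
        if not classes or classes[-1] != cls:
            classes.append(cls)
    return "".join(classes)


def vowel_consonant(token: str) -> str:
    """Replace vowels with V, other letters with C, everything else with O."""
    out = []
    for ch in token:
        if ch in VOWELS:
            out.append("V")
        elif ch.isalpha():
            out.append("C")
        else:
            out.append("O")
    return "".join(out)


def bleach(text: str) -> list:
    """Return one (shape, vowel/consonant pattern, length) tuple per
    whitespace-separated token, discarding the lexical string itself."""
    return [(shape(tok), vowel_consonant(tok), len(tok)) for tok in text.split()]


print(bleach("Hello world 42!"))
# [('UL', 'CVCCV', 5), ('L', 'CVCCC', 5), ('DX', 'OOO', 3)]
```

Because the output no longer contains the words themselves, a classifier trained on such tuples in one language could in principle be applied to text in another, which is the kind of transfer the abstract describes.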
Anthology ID:
P18-2061
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
383–389
URL:
https://aclanthology.org/P18-2061
DOI:
10.18653/v1/P18-2061
Cite (ACL):
Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, and Barbara Plank. 2018. Bleaching Text: Abstract Features for Cross-lingual Gender Prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 383–389, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Bleaching Text: Abstract Features for Cross-lingual Gender Prediction (van der Goot et al., ACL 2018)
PDF:
https://preview.aclanthology.org/auto-file-uploads/P18-2061.pdf
Presentation:
P18-2061.Presentation.pdf
Video:
https://vimeo.com/285803988
Code:
bplank/bleaching-text