Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank



Abstract
Gender prediction has typically focused on lexical and social network features, yielding good performance but making systems highly language-, topic-, and platform-dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less well. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power is similar to that of our bleached models, and both perform better than lexical models.
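As a rough illustration of what "bleaching" can look like in practice, the sketch below maps each whitespace-separated token to a few abstract features: a collapsed character-class shape, a vowel/consonant pattern, and the token length. The function names (`shape`, `vowel_consonant`, `bleach`), the exact feature set, and the ASCII-only vowel list are assumptions made for this example; they are not taken from the abstract and may differ from the authors' implementation in bplank/bleaching-text.

```python
from typing import Dict, List, Union

# Assumption: a plain ASCII vowel list; accented vowels count as consonants here.
VOWELS = set("aeiou")


def shape(token: str) -> str:
    """Map characters to classes (U=upper, L=lower, D=digit, X=other),
    collapsing runs of the same class into a single symbol."""
    classes: List[str] = []
    for ch in token:
        if ch.isupper():
            c = "U"
        elif ch.islower():
            c = "L"
        elif ch.isdigit():
            c = "D"
        else:
            c = "X"
        # Only append when the class changes, so "Hello" becomes "UL".
        if not classes or classes[-1] != c:
            classes.append(c)
    return "".join(classes)


def vowel_consonant(token: str) -> str:
    """Replace vowels with V, other letters with C, and everything else with O."""
    out: List[str] = []
    for ch in token.lower():
        if ch in VOWELS:
            out.append("V")
        elif ch.isalpha():
            out.append("C")
        else:
            out.append("O")
    return "".join(out)


def bleach(text: str) -> List[Dict[str, Union[str, int]]]:
    """Turn a text into a list of abstract, non-lexical feature dicts.
    Hypothetical helper: whitespace tokenization is a simplification."""
    return [
        {"shape": shape(tok), "vowels": vowel_consonant(tok), "length": len(tok)}
        for tok in text.split()
    ]


if __name__ == "__main__":
    for features in bleach("Hello world 2018!"):
        print(features)
```

Because none of these features retains the word itself, a classifier trained on them in one language can, in principle, be applied to text in another. Whitespace tokenization is used here only to keep the sketch short.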
Anthology ID: P18-2061
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Iryna Gurevych, Yusuke Miyao
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 383–389
URL: https://aclanthology.org/P18-2061
DOI: 10.18653/v1/P18-2061
Cite (ACL): Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, and Barbara Plank. 2018. Bleaching Text: Abstract Features for Cross-lingual Gender Prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 383–389, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): Bleaching Text: Abstract Features for Cross-lingual Gender Prediction (van der Goot et al., ACL 2018)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/P18-2061.pdf
Presentation: P18-2061.Presentation.pdf
Video: https://preview.aclanthology.org/teach-a-man-to-fish/P18-2061.mp4
Code: bplank/bleaching-text