Native Language Identification with User Generated Content

Gili Goldin, Ella Rabinovich, Shuly Wintner


Abstract
We address the task of native language identification in the context of social media content, where authors are highly fluent, advanced nonnative speakers of English. Using both linguistically motivated features and the characteristics of the social media outlet, we obtain high accuracy on this challenging task. We provide a detailed analysis of the features that sheds light on differences between native and nonnative speakers, and among nonnative speakers with different backgrounds.
Anthology ID:
D18-1395
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3591–3601
URL:
https://aclanthology.org/D18-1395
DOI:
10.18653/v1/D18-1395
Cite (ACL):
Gili Goldin, Ella Rabinovich, and Shuly Wintner. 2018. Native Language Identification with User Generated Content. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3591–3601, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Native Language Identification with User Generated Content (Goldin et al., EMNLP 2018)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/D18-1395.pdf
Attachment:
D18-1395.Attachment.pdf