Gili Goldin


2018

Native Language Identification with User Generated Content
Gili Goldin | Ella Rabinovich | Shuly Wintner
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We address the task of native language identification in the context of social media content, where authors are highly fluent, advanced nonnative speakers (of English). Using both linguistically-motivated features and the characteristics of the social media outlet, we obtain high accuracy on this challenging task. We provide a detailed analysis of the features that sheds light on differences between native and nonnative speakers, and among nonnative speakers with different backgrounds.