2018
Predicting Foreign Language Usage from English-Only Social Media Posts
Svitlana Volkova | Stephen Ranshous | Lawrence Phillips
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Social media is known for its multicultural and multilingual interactions, a natural product of which is code-mixing. Multilingual speakers mix the languages they tweet in to address different audiences, express certain feelings, or attract attention. This paper presents a large-scale analysis of 6 million tweets produced by 27 thousand multilingual users who speak 12 languages besides English. We rely on this corpus to build predictive models that infer the non-English languages users speak exclusively from their English tweets. Unlike the native language identification task, we rely on large amounts of informal social media communication rather than ESL essays. We contrast the predictive power of state-of-the-art machine learning models trained on lexical, syntactic, and stylistic signals with neural network models learned from word, character, and byte representations extracted from English-only tweets. We report that content, style, and syntax are the most predictive of the non-English languages users speak on Twitter. Neural network models learned from byte representations of user content, combined with transfer learning, yield the best performance. Finally, by analyzing cross-lingual transfer (the influence of non-English languages on various levels of linguistic performance in English), we present novel findings on stylistic and syntactic variation across speakers of 12 languages in social media.
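The abstract contrasts feature-based classifiers with neural models over sub-word representations. Below is a minimal sketch of the general idea, not the authors' models: a character n-gram classifier that predicts the non-English language a user speaks from English-only tweets. The toy tweets, the language labels, and the scikit-learn pipeline are illustrative assumptions, not taken from the paper.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical English-only tweets and the non-English language each author also speaks.
    english_tweets = [
        "had the best coffee this morning, highly recommend",
        "traffic was terrible today, nearly missed the meeting",
        "watching the match tonight with friends",
        "finished the report early for once",
    ]
    other_language = ["es", "de", "es", "de"]

    # Character n-grams serve as a rough stand-in for the stylistic and sub-word signals
    # discussed in the abstract; the paper's neural byte-level models are not reproduced here.
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(english_tweets, other_language)
    print(model.predict(["grabbed a quick snack before the gym"]))

The pipeline is only meant to show the task setup: English text in, a non-English language label out.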
2017
Intrinsic and Extrinsic Evaluation of Spatiotemporal Text Representations in Twitter Streams
Lawrence Phillips | Kyle Shaffer | Dustin Arendt | Nathan Hodas | Svitlana Volkova
Proceedings of the 2nd Workshop on Representation Learning for NLP
Language in social media is a dynamic system, constantly evolving and adapting, with words and concepts rapidly emerging, disappearing, and changing their meaning. These changes can be estimated using word representations in context, over time, and across locations. A number of methods have been proposed to track these spatiotemporal changes, but no general method exists to evaluate the quality of the resulting representations. Previous work largely focused on qualitative evaluation, which we improve on by proposing a set of visualizations that highlight changes in text representation over both space and time. We demonstrate the usefulness of these novel spatiotemporal representations by exploring and characterizing specific aspects of a corpus of tweets collected from European countries over a two-week period centered on the terrorist attacks in Brussels in March 2016. In addition, we quantitatively evaluate the spatiotemporal representations by feeding them into a downstream classification task: event type prediction. Thus, our work is the first to provide both intrinsic (qualitative) and extrinsic (quantitative) evaluation of text representations for spatiotemporal trends.
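The extrinsic evaluation described above feeds text representations into a downstream event-type classifier. Here is a minimal sketch of that setup, not the authors' pipeline: word vectors are learned on a toy tweet corpus, averaged into per-tweet representations, and used as features for event type prediction. The tokenized tweets, the event labels, and the gensim/scikit-learn choices are assumptions made for illustration.

    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.linear_model import LogisticRegression

    # Hypothetical tokenized tweets and event-type labels.
    tweets = [
        ["airport", "explosion", "reported"],
        ["football", "match", "tonight"],
        ["metro", "station", "evacuated"],
        ["concert", "tickets", "on", "sale"],
    ]
    event_type = ["attack", "sports", "attack", "entertainment"]

    # Learn word vectors on the corpus, then average them into tweet-level representations.
    w2v = Word2Vec(tweets, vector_size=50, min_count=1, seed=1)

    def embed(tokens):
        vectors = [w2v.wv[t] for t in tokens if t in w2v.wv]
        return np.mean(vectors, axis=0)

    X = np.vstack([embed(t) for t in tweets])
    clf = LogisticRegression(max_iter=1000).fit(X, event_type)
    print(clf.predict([embed(["stadium", "match", "goal"])]))

Classification accuracy on such a task is the quantitative signal the paper uses to compare representations; the spatiotemporal slicing and visualizations are not reproduced in this sketch.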
2015
Utility-based evaluation metrics for models of language acquisition: A look at speech segmentation
Lawrence Phillips | Lisa Pearl
Proceedings of the 6th Workshop on Cognitive Modeling and Computational Linguistics
2014
Bayesian inference as a cross-linguistic word segmentation strategy: Always learning useful things
Lawrence Phillips | Lisa Pearl
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)