Clayton Greenberg

2018

Inducing a Lexicon of Abusive Words – a Feature-Based Approach
Michael Wiegand | Josef Ruppenhofer | Anna Schmidt | Clayton Greenberg
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.

2016

Long-Short Range Context Neural Networks for Language Modeling
Youssef Oualil | Mittul Singh | Clayton Greenberg | Dietrich Klakow
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Effects of Communicative Pressures on Novice L2 Learners’ Use of Optional Formal Devices
Yoav Binoun | Francesca Delogu | Clayton Greenberg | Mindaugas Mozuraitis | Matthew Crocker
Proceedings of the NAACL Student Research Workshop

Thematic fit evaluation: an aspect of selectional preferences
Asad Sayeed | Clayton Greenberg | Vera Demberg
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling
Mittul Singh | Clayton Greenberg | Youssef Oualil | Dietrich Klakow
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings. Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.
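As a rough illustration of the idea in this abstract (a sub-word search over words that already have embeddings, followed by a string-similarity-weighted combination), here is a minimal sketch. It is not the authors' implementation: the character trigram search, the use of `difflib.SequenceMatcher` as the similarity function, the `top_k` cutoff, and the helper names `char_ngrams` and `induce_embedding` are all assumptions chosen for illustration, and the toy vectors are made up.

```python
from difflib import SequenceMatcher


def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word (the whole word if shorter than n)."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def induce_embedding(rare_word, embeddings, n=3, top_k=3):
    """Hypothetical helper: build a vector for rare_word as a similarity-weighted
    average of the embeddings of known words sharing at least one character n-gram."""
    target = char_ngrams(rare_word, n)
    scored = []
    for word, vec in embeddings.items():
        # Sub-word search: keep only words that overlap with the rare word.
        if target & char_ngrams(word, n):
            # Assumed similarity function; the abstract does not name one.
            sim = SequenceMatcher(None, rare_word, word).ratio()
            scored.append((sim, vec))
    if not scored:
        return None  # no overlapping known word, so no embedding is induced
    scored.sort(key=lambda pair: pair[0], reverse=True)
    scored = scored[:top_k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return None
    dim = len(scored[0][1])
    return [sum(sim * vec[i] for sim, vec in scored) / total for i in range(dim)]


# Toy, made-up embeddings for illustration only.
embeddings = {
    "happy": [1.0, 0.0],
    "happiness": [0.8, 0.2],
    "table": [0.0, 1.0],
}
vector = induce_embedding("unhappiness", embeddings)
missing = induce_embedding("xyzq", embeddings)
```

In this toy setup, "unhappiness" shares trigrams with "happy" and "happiness" but not with "table", so the induced vector is a weighted average of the first two, while "xyzq" overlaps with nothing and gets no vector.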