Clayton Greenberg


Inducing a Lexicon of Abusive Words – a Feature-Based Approach
Michael Wiegand | Josef Ruppenhofer | Anna Schmidt | Clayton Greenberg
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small, manually annotated base lexicon, which we then use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.


Long-Short Range Context Neural Networks for Language Modeling
Youssef Oualil | Mittul Singh | Clayton Greenberg | Dietrich Klakow
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Effects of Communicative Pressures on Novice L2 Learners’ Use of Optional Formal Devices
Yoav Binoun | Francesca Delogu | Clayton Greenberg | Mindaugas Mozuraitis | Matthew Crocker
Proceedings of the NAACL Student Research Workshop

Thematic fit evaluation: an aspect of selectional preferences
Asad Sayeed | Clayton Greenberg | Vera Demberg
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling
Mittul Singh | Clayton Greenberg | Youssef Oualil | Dietrich Klakow
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally simpler sub-word search over words that have existing embeddings. Embeddings of the sub-word search results are then combined using string similarity functions to generate rare-word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare-word similarity task, obtaining up to a 3-fold improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state of the art for this task. Additionally, when using the augmented embeddings in a log-bilinear language model, we observed up to a 50% reduction in rare-word perplexity in comparison to other, more complex language models.
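The core idea in the abstract — find in-vocabulary words that are string-similar to the rare word, then combine their embeddings weighted by similarity — can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the embedding values are invented, and `difflib.SequenceMatcher` stands in for whichever string similarity functions the paper actually evaluates.

```python
from difflib import SequenceMatcher

# Toy pre-trained embeddings (hypothetical values; a real system would
# load vectors trained on a large corpus).
EMBEDDINGS = {
    "fish":    [0.9, 0.1, 0.0],
    "fishing": [0.8, 0.2, 0.1],
    "boat":    [0.1, 0.9, 0.2],
}

def string_similarity(a, b):
    # One possible string-similarity function; the paper combines
    # sub-word search results with such functions.
    return SequenceMatcher(None, a, b).ratio()

def induce_embedding(rare_word, embeddings, top_k=2):
    """Build an embedding for an OOV word from its most string-similar
    in-vocabulary neighbours, weighted by similarity."""
    scored = sorted(
        ((string_similarity(rare_word, w), w) for w in embeddings),
        reverse=True,
    )[:top_k]
    total = sum(s for s, _ in scored)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    if total == 0:  # no similar word found: fall back to a zero vector
        return vec
    for s, w in scored:
        for i, x in enumerate(embeddings[w]):
            vec[i] += (s / total) * x
    return vec
```

For example, `induce_embedding("fisher", EMBEDDINGS)` lands near the `fish`/`fishing` vectors rather than `boat`, because the similarity weights concentrate on the sub-word-matching neighbours.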


Improving unsupervised vector-space thematic fit evaluation via role-filler prototype clustering
Clayton Greenberg | Asad Sayeed | Vera Demberg
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Verb polysemy and frequency effects in thematic fit modeling
Clayton Greenberg | Vera Demberg | Asad Sayeed
Proceedings of the 6th Workshop on Cognitive Modeling and Computational Linguistics


Disambiguating prepositional phrase attachment sites with sense information captured in contextualized distributional data
Clayton Greenberg
Proceedings of the ACL 2014 Student Research Workshop