Mark Carman

Also published as: Mark J. Carman, Mark James Carman

2018

Sarcasm Target Identification: Dataset and An Introductory Approach
Aditya Joshi | Pranav Goel | Pushpak Bhattacharyya | Mark Carman
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Efficient Benchmarking of NLP APIs using Multi-armed Bandits
Gholamreza Haffari | Tuan Dung Tran | Mark Carman
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Comparing NLP systems to select the best one for a task of interest, such as named entity recognition, is critical for practitioners and researchers. A rigorous approach involves setting up a hypothesis testing scenario using the performance of the systems on query documents. However, often the hypothesis testing approach needs to send a lot of document queries to the systems, which can be problematic. In this paper, we present an effective alternative based on the multi-armed bandit (MAB). We propose a hierarchical generative model to represent the uncertainty in the performance measures of the competing systems, to be used by Thompson Sampling to solve the resulting MAB. Experimental results on both synthetic and real data show that our approach requires significantly fewer queries compared to the standard benchmarking technique to identify the best system according to F-measure.

2016

Are Word Embedding-based Features Useful for Sarcasm Detection?
Aditya Joshi | Vaibhav Tripathi | Kevin Patel | Pushpak Bhattacharyya | Mark Carman
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

How Challenging is Sarcasm versus Irony Classification?: A Study With a Dataset from English Literature
Aditya Joshi | Vaibhav Tripathi | Pushpak Bhattacharyya | Mark Carman | Meghna Singh | Jaya Saraswati | Rajita Shukla
Proceedings of the Australasian Language Technology Association Workshop 2016

Political Issue Extraction Model: A Novel Hierarchical Topic Model That Uses Tweets By Political And Non-Political Authors
Aditya Joshi | Pushpak Bhattacharyya | Mark Carman
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

How Do Cultural Differences Impact the Quality of Sarcasm Annotation?: A Case Study of Indian Annotators and American Text
Aditya Joshi | Pushpak Bhattacharyya | Mark Carman | Jaya Saraswati | Rajita Shukla
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

‘Who would have thought of that!’: A Hierarchical Topic Model for Extraction of Sarcasm-prevalent Topics and Sarcasm Detection
Aditya Joshi | Prayas Jain | Pushpak Bhattacharyya | Mark Carman
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)

Topic models have been reported to be beneficial for aspect-based sentiment analysis. This paper reports the first topic model for sarcasm detection, to the best of our knowledge. Designed on the basis of the intuition that sarcastic tweets are likely to have a mixture of words of both sentiments, as against tweets with literal sentiment (either positive or negative), our hierarchical topic model discovers sarcasm-prevalent topics and topic-level sentiment. Using a dataset of tweets labeled using hashtags, the model estimates topic-level and sentiment-level distributions. Our evaluation shows that topics such as ‘work’, ‘gun laws’, and ‘weather’ are sarcasm-prevalent topics. Our model is also able to discover the mixture of sentiment-bearing words that exist in a text of a given sentiment-related label. Finally, we apply our model to predict sarcasm in tweets. We outperform two prior works based on statistical classifiers with specific features, by around 25%.

That’ll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
Diptesh Kanojia | Aditya Joshi | Pushpak Bhattacharyya | Mark James Carman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In the absence of such lexical resources, multilingual topic models have been used to create coarse lexical resources in the past, using a Cartesian product approach. Our results show that for morphologically rich languages like Hindi, the Cartesian product approach is detrimental for MT. We then present a novel ‘sentential’ approach to use this coarse lexical resource from a multilingual topic model. Our coarse lexical resource, when injected with a parallel corpus, outperforms a system trained using a parallel corpus and a good quality lexical resource. As demonstrated by the quality of our coarse lexical resource and its benefit to MT, we believe that our sentential approach to creating such a resource will help MT for resource-constrained languages.

Harnessing Sequence Labeling for Sarcasm Detection in Dialogue from TV Series ‘Friends’
Aditya Joshi | Vaibhav Tripathi | Pushpak Bhattacharyya | Mark J. Carman
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning
