Amita Misra


Evaluating Machine Translation in Cross-lingual E-Commerce Search
Hang Zhang | Liling Tan | Amita Misra
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Multilingual query localization is integral to modern e-commerce. While machine translation is widely used to translate e-commerce queries, evaluation of query translation in the context of the downstream search task is overlooked. This study proposes a search ranking-based evaluation framework with an edit-distance-based search metric to evaluate machine translation impact on cross-lingual information retrieval for e-commerce search query translation. The framework demonstrates evaluation of machine translation for e-commerce search at scale, and the proposed metric is strongly associated with traditional machine translation and traditional search relevance-based metrics.
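The abstract does not define the edit-distance-based search metric, so the following is only an illustrative sketch of the general idea, not the paper's metric. It assumes that search results are ranked lists of product IDs and that the results for a translated query are compared against the results for a reference query using Levenshtein distance over whole items. The function names and the normalization are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two ranked lists of item IDs."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ranking_similarity(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Hypothetical normalization: 1.0 means identical rankings."""
    longest = max(len(reference), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(reference, candidate) / longest
```

Under these assumptions, a translation whose search results match the reference ranking item for item scores 1.0, and each displaced, missing, or extra item lowers the score.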

Machine translation impact in E-commerce multilingual search
Bryan Zhang | Amita Misra
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Previous work suggests that performance of cross-lingual information retrieval correlates highly with the quality of Machine Translation. However, there may be a threshold beyond which improving query translation quality yields little or no further benefit to retrieval performance. This threshold may depend upon multiple factors including the source and target languages, the existing MT system quality and the search pipeline. In order to identify the benefit of improving an MT system for a given search pipeline, we investigate the sensitivity of retrieval quality to different levels of MT quality using experimental datasets collected from actual traffic. We systematically improve the quality of our MT systems on language pairs, as measured by MT evaluation metrics including BLEU and chrF, to determine their impact on search precision metrics and extract signals that help to guide the improvement strategies. Using this information, we develop techniques to compare query translations for multiple language pairs and identify the most promising language pairs to invest in and improve.


Accountable Error Characterization
Amita Misra | Zhe Liu | Jalal Mahmud
Proceedings of the First Workshop on Trustworthy Natural Language Processing

Customers of machine learning systems demand accountability from the companies employing these algorithms for various prediction tasks. Accountability requires understanding a system's limits and the conditions under which it makes erroneous predictions, as customers are often interested in understanding the incorrect predictions, and model developers are absorbed in finding methods that can be used to get incremental improvements to an existing system. Therefore, we propose an accountable error characterization method, AEC, to understand when and where errors occur within the existing black-box models. AEC, as constructed with human-understandable linguistic features, allows the model developers to automatically identify the main sources of errors for a given classification system. It can also be used to sample for the set of most informative input points for a next round of training. We perform error detection for a sentiment analysis task using AEC as a case study. Our results on the sample sentiment task show that AEC is able to characterize erroneous predictions into human-understandable categories and also achieves promising results on selecting erroneous samples when compared with uncertainty-based sampling.
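For readers unfamiliar with the baseline named above, here is a minimal sketch of uncertainty-based sampling. It is not AEC itself. The abstract does not say which uncertainty criterion was used, so this assumes the common least-confidence variant, and the function name and input layout (one row of class probabilities per example) are hypothetical.

```python
from typing import List, Sequence


def least_confident(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k examples whose top class probability is lowest."""
    # Confidence of a prediction = probability assigned to its most likely class.
    confidence = [max(row) for row in probabilities]
    # Sort example indices from least to most confident (stable for ties).
    order = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return order[:k]
```

The selected indices would then be sent for labeling or inspection, on the assumption that low-confidence predictions are more likely to be wrong.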


Using Structured Representation and Data: A Hybrid Model for Negation and Sentiment in Customer Service Conversations
Amita Misra | Mansurul Bhuiyan | Jalal Mahmud | Saurabh Tripathy
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Twitter customer service interactions have recently emerged as an effective platform to respond to and engage with customers. In this work, we explore the role of “negation” in customer service interactions, particularly applied to sentiment analysis. We define rules to identify true negation cues and scope more suited to conversational data than existing general review data. Using semantic knowledge and syntactic structure from constituency parse trees, we propose an algorithm for scope detection that performs comparably to a state-of-the-art BiLSTM. We further investigate the results of negation scope detection for the sentiment prediction task on customer service conversation data using both a traditional SVM and a Neural Network. We propose an antonym-dictionary-based method for negation applied to a combination CNN-LSTM for sentiment analysis. Experimental results show that the antonym-based method outperforms the previous lexicon-based and Neural Network methods.


SlugNERDS: A Named Entity Recognition Tool for Open Domain Dialogue Systems
Kevin Bowden | Jiaqi Wu | Shereen Oraby | Amita Misra | Marilyn Walker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Are you serious?: Rhetorical Questions and Sarcasm in Social Media Dialog
Shereen Oraby | Vrindavan Harrison | Amita Misra | Ellen Riloff | Marilyn Walker
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Effective models of social dialog must understand a broad range of rhetorical and figurative devices. Rhetorical questions (RQs) are a type of figurative language whose aim is to achieve a pragmatic goal, such as structuring an argument, being persuasive, emphasizing a point, or being ironic. While there are computational models for other forms of figurative language, rhetorical questions have received little attention to date. We expand a small dataset from previous work, presenting a corpus of 10,270 RQs from debate forums and Twitter that represent different discourse functions. We show that we can clearly distinguish between RQs and sincere questions (0.76 F1). We then show that RQs can be used both sarcastically and non-sarcastically, observing that non-sarcastic (other) uses of RQs are frequently argumentative in forums, and persuasive in tweets. We present experiments to distinguish between these uses of RQs using SVM and LSTM models that represent linguistic features and post-level context, achieving results as high as 0.76 F1 for “sarcastic” and 0.77 F1 for “other” in forums, and 0.83 F1 for both “sarcastic” and “other” in tweets. We supplement our quantitative experiments with an in-depth characterization of the linguistic variation in RQs.


Measuring the Similarity of Sentential Arguments in Dialogue
Amita Misra | Brian Ecker | Marilyn Walker
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

NLDS-UCSC at SemEval-2016 Task 6: A Semi-Supervised Approach to Detecting Stance in Tweets
Amita Misra | Brian Ecker | Theodore Handleman | Nicolas Hahn | Marilyn Walker
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


Using Summarization to Discover Argument Facets in Online Idealogical Dialog
Amita Misra | Pranav Anand | Jean E. Fox Tree | Marilyn Walker
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


Topic Independent Identification of Agreement and Disagreement in Social Media Dialogue
Amita Misra | Marilyn Walker
Proceedings of the SIGDIAL 2013 Conference