2021
pdf
bib
abs
Improvements and Extensions on Metaphor Detection
Weicheng Ma
|
Ruibo Liu
|
Lili Wang
|
Soroush Vosoughi
Proceedings of the 1st Workshop on Understanding Implicit and Underspecified Language
Metaphors are ubiquitous in human language. The metaphor detection task (MD) aims at detecting and interpreting metaphors from written language, which is crucial in natural language understanding (NLU) research. In this paper, we introduce a pre-trained Transformer-based model into MD. Our model outperforms the previous state-of-the-art models by large margins in our evaluations, with relative improvements on the F-1 score from 5.33% to 28.39%. Second, we extend MD to a classification task about the metaphoricity of an entire piece of text to make MD applicable in more general NLU scenes. Finally, we clean up the improper or outdated annotations in one of the MD benchmark datasets and re-benchmark it with our Transformer-based model. This approach could be applied to other existing MD datasets as well, since the metaphoricity annotations in these benchmark datasets may be outdated. Future research efforts are also necessary to build an up-to-date and well-annotated dataset consisting of longer and more complex texts.
pdf
bib
Modulating Language Models with Emotions
Ruibo Liu
|
Jason Wei
|
Chenyan Jia
|
Soroush Vosoughi
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
pdf
bib
abs
Language Model Augmented Relevance Score
Ruibo Liu
|
Jason Wei
|
Soroush Vosoughi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements. Newer metrics such as BERTScore have addressed many weaknesses in prior metrics such as BLEU and ROUGE, which rely on n-gram matching. These newer methods, however, are still limited in that they do not consider the generation context, so they cannot properly reward generated text that is correct but deviates from the given reference. In this paper, we propose Language Model Augmented Relevance Score (MARS), a new context-aware metric for NLG evaluation. MARS leverages off-the-shelf language models, guided by reinforcement learning, to create augmented references that consider both the generation context and available human references, which are then used as additional references to score generated text. Compared with seven existing metrics in three common NLG tasks, MARS not only achieves higher correlation with human reference judgements, but also differentiates well-formed candidates from adversarial samples to a larger degree.
2020
pdf
bib
abs
Multi-resolution Annotations for Emoji Prediction
Weicheng Ma
|
Ruibo Liu
|
Lili Wang
|
Soroush Vosoughi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Emojis are able to express various linguistic components, including emotions, sentiments, events, etc. Predicting the proper emojis associated with text provides a way to summarize the text accurately, and it has been proven to be a good auxiliary task to many Natural Language Understanding (NLU) tasks. Labels in existing emoji prediction datasets are all passage-based and are usually under the multi-class classification setting. However, in many cases, one single emoji cannot fully cover the theme of a piece of text. It is thus useful to infer the part of text related to each emoji. The lack of multi-label and aspect-level emoji prediction datasets is one of the bottlenecks for this task. This paper annotates an emoji prediction dataset with passage-level multi-class/multi-label, and aspect-level multi-class annotations. We also present a novel annotation method with which we generate the aspect-level annotations. The annotations are generated heuristically, taking advantage of the self-attention mechanism in Transformer networks. We validate the annotations both automatically and manually to ensure their quality. We also benchmark the dataset with a pre-trained BERT model.
pdf
bib
abs
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation
Ruibo Liu
|
Guangxuan Xu
|
Chenyan Jia
|
Weicheng Ma
|
Lili Wang
|
Soroush Vosoughi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy to deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The result shows that Data Boost can boost the performance of classifiers especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has comparable quality as the original data with respect to readability and class consistency.
pdf
bib
abs
An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data
Lili Wang
|
Chongyang Gao
|
Jason Wei
|
Weicheng Ma
|
Ruibo Liu
|
Soroush Vosoughi
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
The field of NLP has seen unprecedented achievements in recent years. Most notably, with the advent of large-scale pre-trained Transformer-based language models, such as BERT, there has been a noticeable improvement in text representation. It is, however, unclear whether these improvements translate to noisy user-generated text, such as tweets. In this paper, we present an experimental survey of a wide range of well-known text representation techniques for the task of text clustering on noisy Twitter data. Our results indicate that the more advanced models do not necessarily work best on tweets and that more exploration in this area is needed.