SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis
Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Wenming Xiao, Liang He
Abstract
Pre-trained language models have been widely applied to cross-domain NLP tasks like sentiment analysis, achieving state-of-the-art performance. However, due to the variety of users’ emotional expressions across domains, fine-tuning the pre-trained models on the source domain tends to overfit, leading to inferior results on the target domain. In this paper, we pre-train a sentiment-aware language model (SentiX) via domain-invariant sentiment knowledge from large-scale review datasets, and utilize it for cross-domain sentiment analysis tasks without fine-tuning. We propose several pre-training tasks based on existing lexicons and annotations at both token and sentence levels, such as emoticons, sentiment words, and ratings, without human interference. A series of experiments are conducted and the results indicate the great advantages of our model. We obtain new state-of-the-art results in all the cross-domain sentiment analysis tasks, and our proposed SentiX can be trained with only 1% of the samples (18 samples) while achieving better performance than BERT with 90% of the samples.
- Anthology ID:
- 2020.coling-main.49
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 568–579
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.coling-main.49/
- DOI:
- 10.18653/v1/2020.coling-main.49
- Cite (ACL):
- Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Wenming Xiao, and Liang He. 2020. SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis. In Proceedings of the 28th International Conference on Computational Linguistics, pages 568–579, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis (Zhou et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.coling-main.49.pdf
- Code
- 12190143/sentix
- Data
- IMDb Movie Reviews, SST, SST-2, SST-5