Wei Bai

2020

Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis. In the shared task of assessing the funniness of edited news headlines, which is a part of the SemEval 2020 competition, we preprocess datasets by replacing abbreviation, stemming words, then merge three models including Light Gradient Boosting Machine (LightGBM), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representation from Transformer (BERT) by taking the average to perform the best. Our team Ferryman wins the 9th place in Sub-task 1 of Task 7 - Regression.

pdf abs
Will_go at SemEval-2020 Task 9: An Accurate Approach for Sentiment Analysis on Hindi-English Tweets Based on Bert and Pesudo Label Strategy
Wei Bao | Weilong Chen | Wei Bai | Yan Zhuang | Mingyuan Cheng | Xiangyu Ma
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Mixing languages are widely used in social media, especially in multilingual societies like India. Detecting the emotions contained in these languages, which is of great significance to the development of society and political trends. In this paper, we propose an ensemble of pesudo-label based Bert model and TFIDF based SGDClassifier model to identify the sentiments of Hindi-English (Hi-En) code-mixed data. The ensemble model combines the strengths of rich semantic information from the Bert model and word frequency information from the probabilistic ngram model to predict the sentiment of a given code-mixed tweet. Finally our team got an average F1 score of 0.731 on the final leaderboard,and our codalab username is will_go.

Co-authors

Wei Bao 1

Yan Zhuang 1

Mingyuan Cheng 1

Xiangyu Ma 1

Venues

semeval2