Yuji Zhang


2023

pdf
VIBE: Topic-Driven Temporal Adaptation for Twitter Classification
Yuji Zhang | Jing Li | Wenjie Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Language features are evolving in real-world social media, resulting in the deteriorating performance of text classification in dynamics. To address this challenge, we study temporal adaptation, where models trained on past data are tested in the future. Most prior work focused on continued pretraining or knowledge updating, which may compromise their performance on noisy social media data. To tackle this issue, we reflect feature change via modeling latent topic evolution and propose a novel model, VIBE: Variational Information Bottleneck for Evolutions. Concretely, we first employ two Information Bottleneck (IB) regularizers to distinguish past and future topics. Then, the distinguished topics work as adaptive features via multi-task training with timestamp and class label prediction. In adaptive learning, VIBE utilizes retrieved unlabeled data from online streams created posterior to training data time. Substantial Twitter experiments on three classification tasks show that our model, with only 3% of data, significantly outperforms previous state-of-the-art continued-pretraining methods.

2021

pdf
Engage the Public: Poll Question Generation for Social Media Posts
Zexin Lu | Keyang Ding | Yuji Zhang | Jing Li | Baolin Peng | Lemao Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper presents a novel task to generate poll questions for social media posts. It offers an easy way to hear the voice from the public and learn from their feelings to important social topics. While most related work tackles formal languages (e.g., exam papers), we generate poll questions for short and colloquial social media messages exhibiting severe data sparsity. To deal with that, we propose to encode user comments and discover latent topics therein as contexts. They are then incorporated into a sequence-to-sequence (S2S) architecture for question generation and its extension with dual decoders to additionally yield poll choices (answers). For experiments, we collect a large-scale Chinese dataset from Sina Weibo containing over 20K polls. The results show that our model outperforms the popular S2S models without exploiting topics from comments and the dual decoder design can further benefit the prediction of both questions and answers. Human evaluations further exhibit our superiority in yielding high-quality polls helpful to draw user engagements.

pdf
#HowYouTagTweets: Learning User Hashtagging Preferences via Personalized Topic Attention
Yuji Zhang | Yubo Zhang | Chunpu Xu | Jing Li | Ziyan Jiang | Baolin Peng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Millions of hashtags are created on social media every day to cross-refer messages concerning similar topics. To help people find the topics they want to discuss, this paper characterizes a user’s hashtagging preferences via predicting how likely they will post with a hashtag. It is hypothesized that one’s interests in a hashtag are related with what they said before (user history) and the existing posts present the hashtag (hashtag contexts). These factors are married in the deep semantic space built with a pre-trained BERT and a neural topic model via multitask learning. In this way, user interests learned from the past can be customized to match future hashtags, which is beyond the capability of existing methods assuming unchanged hashtag semantics. Furthermore, we propose a novel personalized topic attention to capture salient contents to personalize hashtag contexts. Experiments on a large-scale Twitter dataset show that our model significantly outperforms the state-of-the-art recommendation approach without exploiting latent topics.

2020

pdf
Hashtags, Emotions, and Comments: A Large-Scale Dataset to Understand Fine-Grained Social Emotions to Online Topics
Keyang Ding | Jing Li | Yuji Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

This paper studies social emotions to online discussion topics. While most prior work focus on emotions from writers, we investigate readers’ responses and explore the public feelings to an online topic. A large-scale dataset is collected from Chinese microblog Sina Weibo with over 13 thousand trending topics, emotion votes in 24 fine-grained types from massive participants, and user comments to allow context understanding. In experiments, we examine baseline performance to predict a topic’s possible social emotions in a multilabel classification setting. The results show that a seq2seq model with user comment modeling performs the best, even surpassing human prediction. More analyses shed light on the effects of emotion types, topic description lengths, contexts from user comments, and the limited capacity of the existing models.