Chunpu Xu


Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu | Hanzhuo Tan | Jing Li | Piji Li
Findings of the Association for Computational Linguistics: EMNLP 2022

Multimedia communication combining texts and images is popular on social media. However, few studies examine how images are structured with texts to form coherent meanings in human cognition. To fill this gap, we present a novel concept of cross-modality discourse, reflecting how human readers couple image and text understandings. Text descriptions (named subtitles) are first derived from images in the multimedia contexts. Five labels (entity-level insertion, projection, and concretization, and scene-level restatement and extension) are then employed to shape the structure of subtitles and texts and present their joint meanings. As a pilot study, we also build the very first dataset containing over 16K multimedia tweets with manually annotated discourse labels. The experimental results show that trendy multimedia encoders based on multi-head attention (with captions) are unable to understand cross-modality discourse well, and that additionally modeling texts at the output layer helps yield state-of-the-art results.

Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu | Jing Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Social media creates massive multimedia content with paired images and text every day, presenting a pressing need to automate vision and language understanding for various multimodal classification tasks. Compared to the commonly researched visual-lingual data, social media posts tend to exhibit more implicit image-text relations. To better glue the cross-modal semantics therein, we capture hinting features from user comments, which are retrieved by jointly leveraging visual and lingual similarity. Afterwards, the classification tasks are explored via self-training in a teacher-student framework, motivated by the usually limited scale of labeled data in existing benchmarks. Substantial experiments are conducted on four multimodal social media benchmarks for image-text relation classification, sarcasm detection, sentiment classification, and hate speech detection. The results show that our method further advances the performance of previous state-of-the-art models, which do not employ comment modeling or self-training.


#HowYouTagTweets: Learning User Hashtagging Preferences via Personalized Topic Attention
Yuji Zhang | Yubo Zhang | Chunpu Xu | Jing Li | Ziyan Jiang | Baolin Peng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Millions of hashtags are created on social media every day to cross-refer messages concerning similar topics. To help people find the topics they want to discuss, this paper characterizes a user’s hashtagging preferences by predicting how likely they are to post with a hashtag. It is hypothesized that one’s interest in a hashtag is related to what they said before (user history) and to the existing posts that contain the hashtag (hashtag contexts). These factors are married in the deep semantic space built with a pre-trained BERT and a neural topic model via multitask learning. In this way, user interests learned from the past can be customized to match future hashtags, which is beyond the capability of existing methods that assume unchanged hashtag semantics. Furthermore, we propose a novel personalized topic attention to capture salient contents to personalize hashtag contexts. Experiments on a large-scale Twitter dataset show that our model significantly outperforms the state-of-the-art recommendation approach without exploiting latent topics.


Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning
Chunpu Xu | Yu Li | Chengming Li | Xiang Ao | Min Yang | Jinwen Tian
Proceedings of the 28th International Conference on Computational Linguistics

Image paragraph captioning (IPC) aims to generate a fine-grained paragraph to describe the visual content of an image. Significant progress has been made by deep neural networks, in which the attention mechanism plays an essential role. However, conventional attention mechanisms tend to ignore the past alignment information, which often results in problems of repetitive captioning and incomplete captioning. In this paper, we propose an Interactive key-value Memory-augmented Attention model for image Paragraph captioning (IMAP) to keep track of the attention history (salient objects coverage information) along with the update-chain of the decoder state and therefore avoid generating repetitive or incomplete image descriptions. In addition, we employ an adaptive attention mechanism to realize adaptive alignment from image regions to caption words, where an image region can be mapped to an arbitrary number of caption words while a caption word can also attend to an arbitrary number of image regions. Extensive experiments on a benchmark dataset (i.e., Stanford) demonstrate the effectiveness of our IMAP model.