Cheng Wang


2021

Learning Slice-Aware Representations with Mixture of Attentions
Cheng Wang | Sungjin Lee | Sunghyun Park | Han Li | Young-Bum Kim | Ruhi Sarikaya
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2018

LRMM: Learning to Recommend with Missing Modalities
Cheng Wang | Mathias Niepert | Hui Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Multimodal learning has shown promising performance in content-based recommendation due to the auxiliary user and item information of multiple modalities such as text and images. However, the problem of incomplete and missing modalities is rarely explored, and most existing methods fail to learn a recommendation model when modalities are missing or corrupted. In this paper, we propose LRMM, a novel framework that mitigates not only the problem of missing modalities but also, more generally, the cold-start problem of recommender systems. We propose modality dropout (m-drop) and a multimodal sequential autoencoder (m-auto) to learn multimodal representations for complementing and imputing missing modalities. Extensive experiments on real-world Amazon data show that LRMM achieves state-of-the-art performance on rating prediction tasks. More importantly, LRMM is more robust than previous methods in alleviating data sparsity and the cold-start problem.
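The abstract does not spell out how modality dropout operates, so the following is only a minimal sketch of the general idea: during training, each modality's feature vector is independently blanked out with some probability so that a model learns to cope with missing inputs. The function name `modality_dropout`, the drop probability `p`, and the choice to replace a dropped modality with zeros are all assumptions for illustration, not details taken from the paper.

```python
import random
from typing import Dict, List, Optional


def modality_dropout(
    features: Dict[str, List[float]],
    p: float,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[float]]:
    """Return a copy of `features` with each modality zeroed with probability p.

    Hypothetical helper: zero-filling a dropped modality is an assumption,
    not necessarily what LRMM's m-drop does.
    """
    rng = rng or random.Random()
    out: Dict[str, List[float]] = {}
    for name, vec in features.items():
        if rng.random() < p:
            # Simulate a missing modality by blanking its features.
            out[name] = [0.0] * len(vec)
        else:
            # Keep the modality unchanged (copied so the input is not mutated).
            out[name] = list(vec)
    return out


if __name__ == "__main__":
    sample = {"text": [0.2, 0.7], "image": [0.9, 0.1, 0.4]}
    print(modality_dropout(sample, p=0.5, rng=random.Random(0)))
```

With `p=0.0` every modality is kept, and with `p=1.0` every modality is blanked, which gives two easy sanity checks for the sketch.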

2016

Punctuation Prediction for Unsegmented Transcript Based on Word Vector
Xiaoyin Che | Cheng Wang | Haojin Yang | Christoph Meinel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we propose an approach to predict punctuation marks for unsegmented speech transcripts. The approach is purely lexical, with pre-trained Word Vectors as the only input. A Deep Neural Network (DNN) or Convolutional Neural Network (CNN) model is trained to classify whether a punctuation mark should be inserted after the third word of a 5-word sequence and, if so, which kind of punctuation mark it should be. TED talks from the IWSLT dataset are used in both the training and evaluation phases. The proposed approach shows its effectiveness by achieving better results than the state-of-the-art lexical solution that works with the same type of data, especially when predicting punctuation position only.
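The 5-word framing described above can be sketched as a sliding window in which each word of the transcript takes the third slot once. This is an illustrative sketch only: the function name `five_word_windows` and the `<pad>` token used at the transcript boundaries are assumptions, since the abstract does not say how the first and last words are handled, and the word-vector lookup and classifier are left out.

```python
from typing import List, Tuple


def five_word_windows(tokens: List[str], pad: str = "<pad>") -> List[Tuple[str, ...]]:
    """Return one 5-token window per token, with that token in the third slot.

    Boundary padding with `pad` is an assumption made for this sketch.
    """
    # Two pad tokens on each side let the first and last words sit in slot 3.
    padded = [pad, pad] + tokens + [pad, pad]
    return [tuple(padded[i:i + 5]) for i in range(len(tokens))]


if __name__ == "__main__":
    for window in five_word_windows("so what do we do now".split()):
        # A classifier would decide which mark, if any, follows window[2].
        print(window)
```

Each window would then be mapped to its five word vectors and passed to the classifier, which predicts the punctuation class for the position after the window's third word.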

2015

Co-training for Semi-supervised Sentiment Classification Based on Dual-view Bags-of-words Representation
Rui Xia | Cheng Wang | Xin-Yu Dai | Tao Li
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)