This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Short video platforms are evolving rapidly, making the identification of inappropriate content increasingly critical.Existing approaches typically train separate and small classification models for each type of issue, which requires extensive human-labeled data and lacks cross-issue generalization.We propose a reasoning-enhanced multimodal large language model (MLLM) pretraining paradigm for unified inappropriate content detection. To address the distribution gap between short video content and the original pretraining data of MLLMs, as well as the complex issue definitions, we introduce three targeted pretraining tasks:(1) Caption, to enhance the MLLM’s perception of video details;(2) Visual Question Answering (VQA), to deepen the MLLM’s understanding of issue definitions and annotation guidelines;(3) Chain-of-Thought (CoT), to enhance the MLLM’s reasoning capability.Experimental results show that our pretraining approach significantly improves the MLLM’s performance in both zero-shot and supervised fine-tuning (SFT) settings.In addition, our pretrained model demonstrates strong generalization capabilities to emergent, previously unseen issues.
In light of the prominence of Pre-trained Language Models (PLMs) across numerous downstream tasks, shedding light on what they learn is an important endeavor. Whereas previous work focuses on assessing in-domain knowledge, we evaluate the generalization ability in biased scenarios through component combinations where it could be easy for the PLMs to learn shortcuts from the training corpus. This would lead to poor performance on the testing corpus, which is combinationally reconstructed from the training components. The results show that PLMs are able to overcome such distribution shifts for specific tasks and with sufficient data. We further find that overfitting can lead the models to depend more on biases for prediction, thus hurting the combinational generalization ability of PLMs.
Due to recent pretrained multilingual representation models, it has become feasible to exploit labeled data from one language to train a cross-lingual model that can then be applied to multiple new languages. In practice, however, we still face the problem of scarce labeled data, leading to subpar results. In this paper, we propose a novel data augmentation strategy for better cross-lingual natural language inference by enriching the data to reflect more diversity in a semantically faithful way. To this end, we propose two methods of training a generative model to induce synthesized examples, and then leverage the resulting data using an adversarial training regimen for more robustness. In a series of detailed experiments, we show that this fruitful combination leads to substantial gains in cross-lingual inference.
Sentiment analysis is an area of substantial relevance both in industry and in academia, including for instance in social studies. Although supervised learning algorithms have advanced considerably in recent years, in many settings it remains more practical to apply an unsupervised technique. The latter are oftentimes based on sentiment lexicons. However, existing sentiment lexicons reflect an abstract notion of polarity and do not do justice to the substantial differences of word polarities between different domains. In this work, we draw on a collection of domain-specific data to induce a set of 24 domain-specific sentiment lexicons. We rely on initial linear models to induce initial word intensity scores, and then train new deep models based on word vector representations to overcome the scarcity of the original seed data. Our analysis shows substantial differences between domains, which make domain-specific sentiment lexicons a promising form of lexical resource in downstream tasks, and the predicted lexicons indeed perform effectively on tasks such as review classification and cross-lingual word sentiment prediction.
Based on massive amounts of data, recent pretrained contextual representation models have made significant strides in advancing a number of different English NLP tasks. However, for other languages, relevant training data may be lacking, while state-of-the-art deep learning methods are known to be data-hungry. In this paper, we present an elegantly simple robust self-learning framework to include unlabeled non-English samples in the fine-tuning process of pretrained multilingual representation models. We leverage a multilingual model’s own predictions on unlabeled non-English data in order to obtain additional information that can be used during further fine-tuning. Compared with original multilingual models and other cross-lingual classification models, we observe significant gains in effectiveness on document and sentiment classification for a range of diverse languages.
Deep convolutional neural networks excel at sentiment polarity classification, but tend to require substantial amounts of training data, which moreover differs quite significantly between domains. In this work, we present an approach to feed generic cues into the training process of such networks, leading to better generalization abilities given limited training data. We propose to induce sentiment embeddings via supervision on extrinsic data, which are then fed into the model via a dedicated memory-based component. We observe significant gains in effectiveness on a range of different datasets in seven different languages.