This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we generate only three BibTeX files per volume, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Conversational humor is key to capturing dialogue semantics and to dialogue comprehension. It is usually conveyed through multiple modalities, such as linguistic rhetoric (textual modality), exaggerated facial expressions or movements (visual modality), and quirky intonation (acoustic modality). However, existing multimodal corpora for conversational humor are coarse-grained, and their modality coverage is insufficient to support the conversational humor recognition task. This paper designs an annotation scheme for multimodal humor datasets and constructs a corpus for conversational humor recognition based on a Chinese sitcom, named MUCH. The MUCH corpus consists of 34,804 utterances in total, 7,079 of which are humorous. We employed both unimodal and multimodal methods to test the MUCH corpus. Experimental results showed that the multimodal approach achieved an F1-score of 75.94% and surpassed most unimodal methods, demonstrating that the MUCH corpus is effective for multimodal humor recognition tasks.
The goal of Emotion-Cause Pair Extraction (ECPE) is to identify the causes of emotion changes, that is, what triggers a given emotion. This paper proposes ECSP, a three-step learning approach for the Textual Emotion-Cause Pair Extraction in Conversations task at SemEval-2024 Task 3. First, we preprocess the original dataset to construct negative samples. Second, we use a pre-trained model to build token sequence representations with contextual information and use them to predict emotions. Third, we treat textual emotion-cause pair extraction as a machine reading comprehension task and fine-tune two pre-trained models, RoBERTa and SpanBERT. Our system ranked 3rd in the official rankings under strict matching, with a strict F1-score of 15.18%, which suggests that it performs robustly.
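The reading-comprehension framing in the third step can be sketched as follows. Each (utterance, emotion) pair becomes a question, and the cause is decoded as the highest-scoring answer span. This is a minimal sketch under stated assumptions, not the ECSP implementation. The question template, the choice to keep only utterances up to the target as context, and the names `build_mrc_example` and `best_span` are all hypothetical. In a real system, the start and end logits would come from a fine-tuned model such as RoBERTa or SpanBERT.

```python
def build_mrc_example(conversation, target_idx, emotion):
    """Turn one (utterance, emotion) pair into a reading-comprehension input.

    Hypothetical choices: the question template, and using only the
    utterances up to and including the target as the context passage.
    """
    question = f"What caused the {emotion} expressed in utterance {target_idx + 1}?"
    context = " ".join(
        f"[{i + 1}] {u}" for i, u in enumerate(conversation[: target_idx + 1])
    )
    return question, context


def best_span(start_logits, end_logits, max_len=30):
    """Return the (start, end) token pair with the highest start + end score,
    subject to start <= end and a span length of at most max_len tokens."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best
```

For example, `best_span([0.1, 2.0, 0.5, 0.0], [0.0, 0.3, 1.5, 0.2])` selects tokens 1 to 2. Restricting `max_len` trades recall on long causes for fewer implausible spans.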
The computational identification of human values is a novel and challenging research problem that could offer valuable insights into the nature of human behavior and cognition. This paper presents the methodology adopted by the Arthur-Caplan research team for SemEval-2023 Task 4, which concerns detecting the human values behind arguments. The proposed system integrates fine-tuned BERT, ERNIE 2.0, RoBERTa, and XLNet models. Experimental results show that our system achieved a macro F1 score of 0.512 on the test set, outperforming the baseline methods by 9.2%.
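Macro F1, the metric reported above, computes an F1 score for each value label separately and then averages them with equal weight. Rare labels therefore count as much as frequent ones. The sketch below assumes a multi-label setting in which each argument has a 0/1 vector over value categories. It is written for illustration only and is not the task's official scorer. The function name `macro_f1` is hypothetical.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over labels for multi-label data.

    gold, pred: lists of equal-length 0/1 vectors, one vector per example.
    A label with no true or predicted positives contributes an F1 of 0.
    """
    n_labels = len(gold[0])
    f1_scores = []
    for j in range(n_labels):
        tp = sum(1 for g, p in zip(gold, pred) if g[j] == 1 and p[j] == 1)
        fp = sum(1 for g, p in zip(gold, pred) if g[j] == 0 and p[j] == 1)
        fn = sum(1 for g, p in zip(gold, pred) if g[j] == 1 and p[j] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Unweighted mean: every label counts equally, regardless of frequency.
    return sum(f1_scores) / n_labels
```

With `gold = [[1, 0], [0, 1], [1, 1]]` and `pred = [[1, 0], [1, 1], [0, 1]]`, the first label scores 0.5 and the second 1.0, giving a macro F1 of 0.75.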
Under the umbrella of anonymous social networks, many women have suffered from abuse, discrimination, and other sexist expressions online. However, existing methods based on keyword filtering and matching perform poorly on online sexism detection because they cannot identify implicit stereotypes and discrimination. Therefore, this paper proposes a System of Ensembling Fine-tuning Models (SEFM) for SemEval-2023 Task 10: Explainable Detection of Online Sexism. First, we use four task-adaptive pre-trained language models to flag all texts. Second, we alleviate data imbalance from two perspectives: over-sampling the labelled data and adjusting the loss function. Third, we add indicators and feedback modules to enhance overall performance. Our system attained macro F1 scores of 0.8538, 0.6619, and 0.4641 on Subtasks A, B, and C, respectively, performing well across all three subtasks and particularly well on Subtask B. Comparison experiments and ablation studies demonstrate the effectiveness of our system.
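The abstract does not specify which over-sampling scheme or loss adjustment SEFM uses. The sketch below substitutes two common choices for the two imbalance remedies. The first is random duplication of minority-class examples. The second is inverse-frequency class weights (the "balanced" heuristic, n / (k * count)) applied to cross-entropy. This is a minimal, hypothetical sketch. The function names are illustrative, and `probs` is assumed to be a dict mapping each label to a model probability.

```python
import math
import random
from collections import Counter


def oversample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = list(examples), list(labels)
    for y, xs in by_class.items():
        for _ in range(target - counts[y]):
            out_x.append(rng.choice(xs))
            out_y.append(y)
    return out_x, out_y


def class_weights(labels):
    """Inverse-frequency weights n / (k * count): rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one example, scaled by the weight of its gold class."""
    return -weights[label] * math.log(probs[label])
```

With three "not sexist" examples and one "sexist" example, the weights are 2/3 and 2.0. An error on the rare class therefore costs three times as much.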
Chinese Frame Semantic Parsing (CFSP) is a semantic parsing task based on Chinese FrameNet (CFN). This paper presents a solution for CCL2023-Eval Task 3. We first try various pre-trained models for the different subtasks. Then, we explore multiple approaches to each subtask from the perspectives of feature engineering, model structure, and other tricks. Finally, we discuss prospects for the task and propose potential alternative solutions. We conducted extensive comparative experiments to validate the effectiveness of our system.
Humor plays an important role in daily life, as it is an essential and fascinating element of communication between people. Recognizing punchlines in dialogue, i.e., conversational humor recognition, has therefore attracted much interest from the computational linguistics community. However, most existing work attempts to understand conversational humor by analyzing the contextual information of the dialogue, but neglects the characteristics of the interlocutor, such as age, gender, and occupation. For instance, the same utterance may be humorous when spoken by a serious person but a plain expression when spoken by a naive person. To this end, this paper proposes a Character Fusion Conversational Humor Recognition model (CFCHR) that exploits character information to recognize conversational humor. CFCHR uses a multi-task learning framework that unifies two highly pertinent tasks: character extraction and punchline identification. Based on deep neural networks, we train both tasks jointly with shared weights to extract common, task-invariant features, while each task still learns its own task-specific features. Experiments were conducted on a Chinese sitcom corpus consisting of 12,677 utterances from 22 characters. The experimental results demonstrate that CFCHR achieves a 33.08% improvement in F1-score over strong baselines, confirming the effectiveness of character information for identifying punchlines.
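The weight-sharing scheme described above is commonly called hard parameter sharing. A single shared encoder feeds two task-specific heads, and the two task losses are combined into one objective. The sketch below is hypothetical and illustrates only that structure; it is not CFCHR's architecture. It uses a one-layer tanh encoder, softmax heads, and a mixing weight `lam`, all of which are assumptions. Only the forward pass and the joint loss are shown, with no training loop.

```python
import math
import random


def linear(x, w, b):
    """Dense layer: w holds one row of input weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class SharedMultiTaskModel:
    """Hard parameter sharing: one shared encoder, two task-specific heads."""

    def __init__(self, dim_in, dim_hidden, n_characters, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.shared_w, self.shared_b = init(dim_hidden, dim_in), [0.0] * dim_hidden
        # Head 1: which character is speaking (character extraction).
        self.char_w, self.char_b = init(n_characters, dim_hidden), [0.0] * n_characters
        # Head 2: punchline vs. not punchline.
        self.punch_w, self.punch_b = init(2, dim_hidden), [0.0, 0.0]

    def forward(self, x):
        # Shared, task-invariant features used by both heads.
        h = [math.tanh(v) for v in linear(x, self.shared_w, self.shared_b)]
        p_char = softmax(linear(h, self.char_w, self.char_b))
        p_punch = softmax(linear(h, self.punch_w, self.punch_b))
        return p_char, p_punch


def joint_loss(model, x, char_label, punch_label, lam=0.5):
    """Weighted sum of both cross-entropy losses; during training, gradients
    from both tasks would update the shared encoder weights."""
    p_char, p_punch = model.forward(x)
    return lam * -math.log(p_char[char_label]) + (1 - lam) * -math.log(p_punch[punch_label])
```

Setting `lam` closer to 0 emphasizes punchline identification. Keeping it positive lets the character signal shape the shared features.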
Early rumor detection is a key and challenging task for preventing rumors from spreading widely. Sociological research shows that the behavior of social bots in the early stage of propagation has become a main reason why rumors spread widely. However, current models do not explicitly distinguish genuine users from social bots, and they fail to identify rumors in a timely manner. Therefore, this paper targets early rumor detection by accounting for social bots' behavior, and presents a Social Bot-Aware Graph Neural Network, named SBAG. SBAG first pre-trains a multi-layer perceptron to capture social bot features, and then constructs multiple graph neural networks that embed these features to model the early propagation of posts, which are further used to detect rumors. Extensive experiments on three benchmark datasets show that SBAG achieves significant improvements over the baselines and identifies rumors within 3 hours while maintaining more than 90% accuracy.
We introduce CHIME, a cross-passage hierarchical memory network for question answering (QA) via text generation. It extends XLNet by introducing an auxiliary memory module consisting of two components: a context memory that collects cross-passage evidence, and an answer memory that works as a buffer, continually refining the generated answers. Empirically, we show the efficacy of the proposed architecture in multi-passage generative QA: it outperforms state-of-the-art baselines, producing more syntactically well-formed answers and addressing the questions of the AmazonQA review dataset with higher precision. An additional qualitative analysis illustrates the interpretability provided by the memory module.
Rumours can spread quickly through social media, and malicious ones can have significant economic and social impact. Motivated by this, our paper focuses on the task of rumour detection; in particular, we are interested in understanding how early we can detect them. Although there are numerous studies on rumour detection, few are concerned with the timing of the detection. A successfully detected malicious rumour can still cause significant damage if it is not detected in a timely manner, so timing is crucial. To address this, we present a novel methodology for early rumour detection. Our model treats social media posts (e.g., tweets) as a data stream and integrates reinforcement learning to learn the minimum number of posts required before classifying an event as a rumour. Experiments on Twitter and Weibo demonstrate that our model identifies rumours earlier than state-of-the-art systems while maintaining comparable accuracy.
Attention-based neural models have been employed to detect the different aspects and sentiment polarities of the same target in targeted aspect-based sentiment analysis (TABSA). However, existing methods do not specifically pre-train reasonable embeddings for targets and aspects in TABSA. As a result, targets or aspects may have the same vector representations in different contexts, losing context-dependent information. To address this problem, we propose a novel method to refine the embeddings of targets and aspects. This pivotal embedding refinement uses a sparse coefficient vector to adjust the embeddings of targets and aspects based on their context. Hence, the embeddings of targets and aspects can be refined using highly correlated context words instead of context-independent or randomly initialized vectors. Experimental results on two benchmark datasets show that our approach achieves state-of-the-art performance on the TABSA task.
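The abstract does not give the exact refinement rule. The following is one plausible reading of "a sparse coefficient vector over context words", offered as a hypothetical sketch rather than the authors' method. Coefficients come from cosine similarity to the target, only the `k` most similar context words receive non-zero weight, and the refined vector interpolates between the original embedding and the weighted context average with a mixing factor `alpha`. The similarity measure, `k`, `alpha`, and the function names are all assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def refine_embedding(target, context, k=2, alpha=0.5):
    """Refine `target` with a sparse coefficient vector over `context` vectors.

    Only the k context words most similar to the target get a non-zero
    coefficient (sparsity); the kept coefficients are normalised to sum to 1.
    """
    sims = [cosine(target, w) for w in context]
    top = sorted(range(len(context)), key=lambda i: sims[i], reverse=True)[:k]
    coeffs = [0.0] * len(context)
    total = sum(max(sims[i], 0.0) for i in top)
    for i in top:
        coeffs[i] = max(sims[i], 0.0) / total if total else 0.0
    # Weighted combination of the selected context word vectors.
    update = [sum(c * w[d] for c, w in zip(coeffs, context)) for d in range(len(target))]
    refined = [(1 - alpha) * t + alpha * u for t, u in zip(target, update)]
    return refined, coeffs
```

For a target `[1.0, 0.0]` with context vectors `[1, 0]`, `[0, 1]`, and `[1, 1]`, the orthogonal word `[0, 1]` receives a zero coefficient. The refined vector moves slightly toward `[1, 1]`.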
This paper presents the UIR-Miner system for the emotion and sentiment analysis evaluation on Twitter at SemEval-2018. Our system consists of four main modules: a preprocessing module; a stacking module for predicting emotion and sentiment intensity; an LSTM network module for multi-label classification; and a hierarchical attention network module for emotion and sentiment classification. According to the SemEval-2018 metrics, our system achieves final scores of 0.636, 0.531, 0.731, 0.708, and 0.408 on the five subtasks, respectively.
We present ACE, a system for Automatic Colloquialism and Errors detection in written Chinese. ACE combines an N-gram model with a rule-based model. Although it currently focuses on detecting colloquial Cantonese (a dialect of Chinese), it can be extended to detect other dialects. We chose Cantonese because it has many interesting properties, such as a unique grammar system and a huge number of colloquial terms, that make the detection task extremely challenging. We conducted experiments using real and synthetic data. The results indicate that ACE is highly reliable and effective.
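The combination of an N-gram model with rules can be sketched as follows. A rule fires on characters typical of written Cantonese. Otherwise, a sentence is flagged when a character-bigram model trained on colloquial text assigns it a higher likelihood than one trained on standard written Chinese. This is a hypothetical sketch, not ACE itself. The marker set, add-one smoothing, the log-likelihood-ratio threshold, and all names are assumptions for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BigramModel:
    """Character-bigram model with add-one smoothing."""

    def __init__(self, corpus):
        self.counts = Counter(bg for sent in corpus for bg in bigrams(sent))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 reserves mass for unseen bigrams

    def logprob(self, bg):
        return math.log((self.counts[bg] + 1) / (self.total + self.vocab))


# Hypothetical rule component: characters typical of written Cantonese.
CANTONESE_MARKERS = set("嘅咗唔冇")


def is_colloquial(sentence, colloquial_lm, standard_lm, threshold=0.0):
    """Flag a sentence if a rule fires, or if the colloquial model explains
    it better than the standard model (log-likelihood ratio > threshold)."""
    if any(ch in CANTONESE_MARKERS for ch in sentence):
        return True
    score = sum(colloquial_lm.logprob(b) - standard_lm.logprob(b) for b in bigrams(sentence))
    return score > threshold
```

The rule handles unambiguous markers cheaply. The bigram ratio covers colloquial phrasing that contains no marker character.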
The lack of an open discourse corpus for Chinese limits many natural language processing tasks. In this work, we present the first open discourse treebank for Chinese, the Discourse Treebank for Chinese (DTBC). At the current stage, we have annotated explicit intra-sentence discourse connectives, together with their arguments and senses, for all 890 documents of the Chinese Treebank 5. We started by analysing the characteristics of discourse annotation for Chinese, then adapted the annotation scheme of the Penn Discourse Treebank 2 (PDTB2) to Chinese while maintaining compatibility as far as possible. Drawing on previous studies of Chinese linguistics, we adjusted three essential aspects: the sense hierarchy, argument scope, and the semantics of arguments. An agreement study showed that our annotation scheme achieves highly reliable results.
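The abstract does not name the agreement statistic. Cohen's kappa is a common choice when two annotators label the same items, such as connective senses, and it is sketched here as an assumption. Kappa corrects raw agreement for the agreement expected by chance, given each annotator's label distribution. The sense labels in the example are PDTB-style top-level classes chosen for illustration.

```python
from collections import Counter


def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(ann_a)
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    # Chance agreement: probability that both annotators pick the same label independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0
```

For example, two annotators who agree on 3 of 4 senses (`["Temporal", "Contingency", "Comparison", "Contingency"]` vs. `["Temporal", "Contingency", "Contingency", "Contingency"]`) have an observed agreement of 0.75. Their chance agreement is 7/16, giving a kappa of 5/9 (about 0.56).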