Rakesh Verma
The rapid development of large language models (LLMs) has significantly improved the generation of fluent and convincing text, raising concerns about their potential misuse on social media platforms. We present a comprehensive methodology for creating nine Twitter datasets to examine the generative capabilities of four prominent LLMs: Llama 3, Mistral, Qwen2, and GPT-4o. These datasets encompass four censored and five uncensored model configurations, including 7B and 8B parameter base-instruction models of the three open-source LLMs. Additionally, we perform a data quality analysis to assess the characteristics of textual outputs from human, “censored,” and “uncensored” models, employing semantic meaning, lexical richness, structural patterns, content characteristics, and detector performance metrics to identify differences and similarities. Our evaluation demonstrates that “uncensored” models significantly undermine the effectiveness of automated detection methods. This study addresses a critical gap by exploring smaller open-source models and the ramifications of “uncensoring,” providing valuable insights into how domain adaptation and content moderation strategies influence both the detectability and structural characteristics of machine-generated text.
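To make the comparison concrete, here is a minimal sketch of the kind of lexical-richness and structural measurements described above; the metric choices (type-token ratio, mean sentence length) and the toy texts are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: compare simple lexical/structural statistics across text sources.
import re

def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens (a crude lexical-richness proxy)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_sentence_length(text: str) -> float:
    """Average number of tokens per sentence (a simple structural pattern)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return sum(lengths) / len(lengths) if lengths else 0.0

# Hypothetical samples standing in for the human and model corpora.
corpora = {
    "human": "Flights were a mess today. Gate changed twice, still made it.",
    "censored_model": "Air travel can present challenges such as gate changes.",
}
for source, text in corpora.items():
    print(source, round(type_token_ratio(text), 3), round(mean_sentence_length(text), 1))
```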
This paper addresses hate speech detection in Turkish and Arabic tweets, contributing to the HSD-2Lang Shared Task. We propose a specialized pooling strategy within a soft-voting ensemble framework to improve classification in Turkish and Arabic language models. Our approach also includes expanding the training sets through cross-lingual translation, introducing a broader spectrum of hate speech examples. Our method attains F1-Macro scores of 0.6964 for Turkish (Subtask A) and 0.7123 for Arabic (Subtask B). In achieving these results, we also weigh computational overhead, balancing it against the effectiveness of our pooling strategy, data augmentation, and soft-voting ensemble. This approach advances the practical application of language models for hate speech detection in low-resource languages.
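As a hedged illustration of the soft-voting step, the sketch below averages per-model class probabilities before taking the argmax; the model outputs are hypothetical placeholders, and the specialized pooling inside each model is omitted.

```python
# Sketch: soft voting over the class probabilities of several classifiers.
import numpy as np

def soft_vote(prob_lists: list[np.ndarray]) -> np.ndarray:
    """Average per-model class probabilities, then take the argmax label."""
    stacked = np.stack(prob_lists)  # shape: (n_models, n_examples, n_classes)
    return stacked.mean(axis=0).argmax(axis=1)

# Hypothetical outputs of three fine-tuned models on two tweets
# (columns: not-hate, hate).
model_probs = [
    np.array([[0.7, 0.3], [0.4, 0.6]]),
    np.array([[0.6, 0.4], [0.3, 0.7]]),
    np.array([[0.8, 0.2], [0.55, 0.45]]),
]
print(soft_vote(model_probs))  # -> [0 1]
```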
Despite significant strides in training expansive transformer models, their deployment for niche tasks remains intricate. This paper delves into deception detection, assessing domain adaptation methodologies from a cross-domain lens using transformer Large Language Models (LLMs). We roll out a new corpus with roughly 100,000 honest and misleading statements in seven domains, designed to serve as a benchmark for multidomain deception detection. As a primary contribution, we present PreXIA, a novel parameter-efficient fine-tuning adapter proposed and implemented as part of this work. The design is model-, domain-, and task-agnostic, with broad applications that are not limited to deception or classification tasks. We comprehensively analyze and rigorously evaluate LLM tuning methods and our original design on the new benchmark, highlighting their strengths, pointing out weaknesses, and suggesting potential areas for improvement. The proposed adapter consistently outperforms all competing methods on the DIFrauD benchmark used in this study and, to the best of our knowledge, improves on the state of the art in its class for the deception task. In addition, the evaluation yields unexpected findings that, at the very least, cast doubt on conclusions drawn in some recently published research claiming that reasoning ability unequivocally dominates representation quality in its contribution to a model’s performance and predictions.
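The PreXIA design itself is not reproduced in the abstract above, so the following is a generic bottleneck-adapter sketch in PyTorch, illustrating the kind of small trainable module that parameter-efficient fine-tuning inserts into a frozen transformer layer; all dimensions are assumptions.

```python
# Sketch: a generic bottleneck adapter (not PreXIA's actual design).
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, add a residual."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Only the adapter's parameters would be trained; the backbone stays frozen.
x = torch.randn(2, 16, 768)             # (batch, seq_len, hidden)
print(BottleneckAdapter(768)(x).shape)  # torch.Size([2, 16, 768])
```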
This paper outlines a methodology aimed at combating disinformation in Arabic social media, a strategy that secured first-place finishes in tasks 2A and 2B of the ArAIEval shared task at the ArabicNLP 2023 conference. Our team, DetectiveRedasers, developed a hyperparameter-optimized pipeline centered around singular BERT-based models for the Arabic language, enhanced by a soft-voting ensemble strategy. Subsequent evaluation on the test dataset reveals that ensembles, although generally resilient, do not always outperform individual models. The primary contribution of this paper is its multifaceted strategy, which led to winning solutions for both the binary (2A) and multiclass (2B) disinformation classification tasks.
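A hyperparameter-optimized pipeline of this kind is often driven by a search library such as Optuna; the sketch below is a hedged illustration, where `train_and_eval` is a hypothetical stand-in for fine-tuning one BERT-based model and returning its validation F1.

```python
# Sketch: hyperparameter search with Optuna over two common knobs.
import optuna

def train_and_eval(lr: float, batch_size: int) -> float:
    # Placeholder: fine-tune the model and return macro-F1 on dev data.
    # A dummy score is returned here so the sketch runs end to end.
    return 1.0 - abs(lr - 3e-5) * 1e4 - abs(batch_size - 16) * 0.001

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-6, 1e-4, log=True)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
    return train_and_eval(lr, batch_size)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```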
To enhance persuasion detection, we investigate the use of multilingual systems on Arabic data by conducting a total of 22 experiments using baselines, multilingual, and monolingual language transformers. Our aim is to provide a comprehensive evaluation of the various systems employed throughout this task, with the ultimate goal of comparing their performance and identifying the most effective approach. Our empirical analysis shows that the *ReDASPersuasion* system performs best when combined with either the multilingual “XLM-RoBERTa” or monolingual transformers pre-trained on Arabic dialects, such as “CAMeLBERT-DA SA,” depending on the NLP classification task.
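As a sketch of how the two families of backbones can be compared under one setup, the snippet below loads both through Hugging Face `transformers`; the CAMeLBERT-DA checkpoint id is our assumption of the public Hub name and should be verified before use.

```python
# Sketch: load a multilingual and a dialectal-Arabic backbone the same way.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoints = {
    "multilingual": "xlm-roberta-base",
    "monolingual_da": "CAMeL-Lab/bert-base-arabic-camelbert-da",  # assumed Hub id
}
for name, ckpt in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2)
    # ...an identical fine-tuning loop for each backbone would go here...
    print(name, model.config.model_type)
```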
This paper describes a multilingual persuasion detection system that incorporates persuasion technique attributes for a multi-label classification task. The proposed method has two advantages. First, it combines persuasion features with a sequence classification transformer to classify persuasion techniques. Second, it is a language-agnostic approach that supports a total of 100 languages, guaranteed by the multilingual transformer module and the Google Translate interface. We found that our persuasion system outperformed the SemEval baseline in all languages except the zero-shot prediction languages, which were not the main focus of our research. Our highest F1-Micro score, 0.45 on Italian, placed eighth on the leaderboard.
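A minimal PyTorch sketch of the first advantage follows: hand-crafted persuasion-attribute scores are concatenated with the encoder's pooled output before a multi-label classification head. Feature count, label count, and dimensions are illustrative assumptions, not the system's actual configuration.

```python
# Sketch: fuse persuasion-attribute features with a pooled sentence embedding.
import torch
import torch.nn as nn

class FeatureAugmentedClassifier(nn.Module):
    def __init__(self, hidden_size: int = 768, n_features: int = 4, n_labels: int = 20):
        super().__init__()
        self.head = nn.Linear(hidden_size + n_features, n_labels)

    def forward(self, pooled: torch.Tensor, features: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden) from a multilingual encoder;
        # features: (batch, n_features) persuasion-attribute scores.
        return self.head(torch.cat([pooled, features], dim=-1))

pooled = torch.randn(2, 768)  # stand-in for real encoder output
feats = torch.tensor([[1., 0., 0., 1.], [0., 0., 1., 0.]])
logits = FeatureAugmentedClassifier()(pooled, feats)
print(logits.shape)  # torch.Size([2, 20]), one logit per technique
```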
This article describes research on claim verification carried out using a model based on multiple GANs. The proposed model consists of three pairs of generators and discriminators, responsible for generating synthetic data for supported claims, refuted claims, and claim labels. A theoretical discussion is provided to validate the equilibrium state of the proposed model. The model is applied to the FEVER dataset, with a pre-trained language model encoding the input text. The synthetically generated data provides information that improves classification performance over state-of-the-art baselines. The F1 scores after applying the proposed method to the FEVER 1.0 and FEVER 2.0 datasets are 0.65 ± 0.018 and 0.65 ± 0.051, respectively.
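As a hedged sketch of one generator/discriminator pair (the model uses three such pairs over pre-trained text embeddings), the PyTorch snippet below shows the basic structure; all layer sizes are illustrative assumptions.

```python
# Sketch: one generator/discriminator pair operating on text embeddings.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps noise to a synthetic claim embedding."""
    def __init__(self, noise_dim: int = 64, embed_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                                 nn.Linear(256, embed_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

class Discriminator(nn.Module):
    """Scores whether an embedding is real or synthetic."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One such pair would be trained per target (supported / refuted / labels).
g, d = Generator(), Discriminator()
fake = g(torch.randn(4, 64))
print(d(fake).shape)  # torch.Size([4, 1])
```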
Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a Python package for flexibly extracting collocations and idioms, currently in English. It also includes a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of cost for research and educational uses in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms.
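ICE's own API is not detailed above, so as a hedged point of comparison, here is the classic PMI-based bigram collocation extraction available in NLTK, which illustrates the underlying task.

```python
# Sketch: PMI-ranked bigram collocations with NLTK (not ICE's API).
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

nltk.download("genesis", quiet=True)
words = nltk.corpus.genesis.words("english-web.txt")

finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(3)  # drop rare pairs that would inflate PMI
print(finder.nbest(BigramAssocMeasures.pmi, 5))
```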