Sujit Kumar
Machine-generated text (MGT) detection has gained critical importance in the era of large language models, especially for maintaining trust in multilingual and cross-domain applications. This paper presents Task 3 Subtask B: Adversarial Cross-Domain MGT Detection in the COLING 2025 DAIGenC Workshop. Task 3 emphasizes the complexity of detecting AI-generated text across eight domains, eleven generative models, and four decoding strategies, with the added challenge of adversarial manipulation. We propose a robust detection framework that combines transformer embeddings with Domain-Adversarial Neural Networks (DANN) to address domain variability and adversarial robustness. Our model demonstrates strong performance in identifying AI-generated text under adversarial conditions while highlighting scope for future improvement.
Emotion detection is essential for applications like mental health monitoring and social media analysis, yet remains underexplored for Indian languages. This paper presents our system for SemEval-2025 Task 11 (Track A), focusing on multilabel emotion detection in Hindi and Marathi, two widely spoken Indian languages. We fine-tune IndicBERT v2 on the BRIGHTER dataset, achieving F1 scores of 87.37 (Hindi) and 88.32 (Marathi), outperforming baseline models. Our results highlight the effectiveness of fine-tuning a language-specific pretrained model for emotion detection, contributing to advancements in multilingual NLP research.
The generation of headlines, a crucial aspect of abstractive summarization, aims to compress an entire article into a concise, single line of text. Despite the effectiveness of modern encoder-decoder models for text generation and summarization tasks, these models commonly face challenges in accurately generating numerical content within headlines. This study empirically explores LLMs for numeral-aware headline generation and proposes few-shot prompting with LLMs for this task. Experiments conducted on the NumHG dataset and the NumEval-2024 test set suggest that fine-tuning LLMs on the NumHG dataset enhances their performance on numeral-aware headline generation. Furthermore, few-shot prompting with LLMs surpassed the performance of fine-tuned LLMs for numeral-aware headline generation.
The prevalence of deceptive and incongruent news headlines has highlighted their substantial role in the propagation of fake news, exacerbating the spread of both misinformation and disinformation. Existing studies on incongruity detection primarily concentrate on estimating the similarity between the encoded representation of the headline and either the encoded representation or a summary representative vector of the news body. To obtain the encoded representation of the news body, researchers often consider either sequential or hierarchical encoding; to acquire a summary representative vector of the news body, they explore techniques such as summarization or dual summarization. When it comes to detecting partially incongruent news, dual summarization-based methods tend to outperform hierarchical encoding-based methods. On the other hand, for datasets focused on detecting fake news, where the hierarchical structure within a news article plays a crucial role, hierarchical encoding-based methods tend to perform better than summarization-based methods. Recognizing this contradictory performance of hierarchical encoding-based and summarization-based methods across datasets with different characteristics, we introduce a novel approach called Multiset Dual Summarization (MDS). MDS combines the strengths of hierarchical encoding and dual summarization methods. We conducted experiments on datasets with diverse characteristics, and our findings demonstrate that our proposed model outperforms established state-of-the-art baseline models.
The rise of social media has been accompanied by an exponential increase in clickbait posts that grab users’ attention. Although work has been done to detect clickbait posts, this is the first task focused on generating appropriate spoilers for these potential clickbaits. This paper presents our approach in this direction. We use different encoding techniques that capture the context of the post text and the target paragraph. For spoiler type classification, we propose a model based on hierarchical encoding with count and document length features, which uses Recurrence over Pretrained Encoding. We also propose combining multiple rankings with reciprocal rank fusion for passage spoiler retrieval and a question-answering approach for phrase spoiler retrieval. For multipart spoiler retrieval, we combine the above two spoiler retrieval methods. Experimental results over the benchmark suggest that our proposed spoiler retrieval methods are able to retrieve spoilers that are semantically very close to the ground truth spoilers.
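The abstract above names reciprocal rank fusion as the way multiple passage rankings are combined. The following is a minimal sketch of the standard technique, not the authors' implementation: the function name `reciprocal_rank_fusion`, the passage identifiers, and the smoothing constant `k=60` (a commonly used default) are illustrative assumptions, since the abstract does not say which rankers or constant were used.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage ids into a single ranking.

    Each passage receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k is an assumed smoothing constant; the
    abstract does not specify a value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by passage id so the
    # output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical rankings of the same three passages from three rankers.
rankings = [
    ["p1", "p2", "p3"],
    ["p2", "p3", "p1"],
    ["p2", "p1", "p3"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Because the fused score depends only on rank positions, rankers whose raw scores are on different scales can be combined without normalization.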
With the increasing use of incongruent news headlines to spread fake news, detecting incongruent news articles has become an important research challenge. Most of the earlier studies on incongruity detection focus on estimating the similarity between the headline and the encoding of the body or its summary. However, most of these methods fail to handle incongruent news articles created with embedded noise. Motivated by this issue, this paper proposes a Multi-head Attention Dual Summary (MADS) based method, which generates two types of summaries that capture the congruent and incongruent parts of the body separately. Experiments in various setups over three publicly available datasets show that the proposed model outperforms the state-of-the-art baseline counterparts.