2025
Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification
Kenneth Alperin | Rohan Leekha | Adaku Uchendu | Trang Nguyen | Srilakshmi Medarametla | Carlos Levya Capote | Seth Aycock | Charlie Dagli
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
The increasing use of Artificial Intelligence (AI) technologies, such as Large Language Models (LLMs), has led to nontrivial improvements in various tasks, including accurate authorship identification of documents. However, while LLMs improve such defense techniques, they also provide a vehicle for malicious actors to launch new attack vectors. To combat this security risk, we evaluate the adversarial robustness of authorship models (specifically an authorship verification model) to potent LLM-based attacks. These attacks include untargeted methods (authorship obfuscation) and targeted methods (authorship impersonation); the objective is to mask or mimic the writing style of an author, respectively, while preserving the original text's semantics. We perturb an accurate authorship verification model and achieve maximum attack success rates of 92% for obfuscation attacks and 78% for impersonation attacks.
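A minimal sketch of how an untargeted obfuscation attack of this kind might be scored against a verification model; the `verifier_score` and `paraphrase_with_llm` functions and the 0.5 decision threshold are hypothetical placeholders for illustration, not the paper's actual models or settings.

```python
# Sketch: measuring obfuscation attack success against an authorship verifier.
# `verifier_score(a, b)` is assumed to return a same-author probability;
# `paraphrase_with_llm(text)` stands in for any LLM-based rewriting step.

def attack_success_rate(author_docs, verifier_score, paraphrase_with_llm,
                        threshold=0.5):
    """Fraction of documents whose rewritten version is no longer
    attributed to the original author by the verifier."""
    successes = 0
    for reference, target in author_docs:        # both texts by the same author
        rewritten = paraphrase_with_llm(target)   # obfuscation: mask the style
        score = verifier_score(reference, rewritten)
        if score < threshold:                     # verifier no longer links them
            successes += 1
    return successes / len(author_docs)
```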
2023
Improving Long-Text Authorship Verification via Model Selection and Data Tuning
Trang Nguyen | Charlie Dagli | Kenneth Alperin | Courtland Vandam | Elliot Singer
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Authorship verification is used to link texts written by the same author without needing a model per author, making it useful for deanonymizing users who spread text with malicious intent. In this work, we evaluated our Cross-Encoder system with four Transformers using differently tuned variants of fanfiction data and found that our BigBird pipeline outperformed Longformer, RoBERTa, and ELECTRA and performed competitively against the official top-ranked system from the PAN evaluation. We also examined the effect of authors and fandoms not seen in training on model performance. Through this, we found that fandom has the greatest influence on true trials, and that a training dataset balanced in terms of class and fandom performed the most consistently.
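As a rough illustration of the cross-encoder setup described above, the sketch below scores a text pair with a sequence-pair classifier; the BigBird checkpoint name and the binary classification head are assumptions for illustration, not the trained system evaluated in the paper.

```python
# Sketch: scoring a text pair with a BigBird-style cross-encoder.
# Assumes a fine-tuned sequence-pair classifier; the checkpoint below is a
# generic base model, not the system from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "google/bigbird-roberta-base"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def same_author_score(text_a: str, text_b: str) -> float:
    """Return the model's probability that the two texts share an author."""
    inputs = tokenizer(text_a, text_b, truncation=True,
                       max_length=4096, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()
```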
2017
Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth
Jennifer Williams | Charlie Dagli
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
We present a new method to bootstrap-filter Twitter language ID labels in our dataset for automatic language identification (LID). Our method combines geo-location, original Twitter LID labels, and Amazon Mechanical Turk to resolve missing and unreliable labels. We are the first to compare LID classification performance using the MIRA algorithm and langid.py. We show classifier performance on different versions of our dataset with high accuracy using only Twitter data, without ground truth, and very few training examples. We also show how Platt Scaling can be used to calibrate MIRA classifier output values into a probability distribution over candidate classes, making the output more intuitive. Our method allows for fine-grained distinctions between similar languages and dialects and allows us to rediscover the language composition of our Twitter dataset.
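Platt Scaling, as used above, fits a logistic function to raw classifier decision scores. Below is a minimal sketch with scikit-learn; the example scores and labels are placeholders rather than the paper's MIRA outputs. For multiple candidate languages, the per-class calibrated probabilities would then be normalized to sum to one.

```python
# Sketch: Platt scaling of raw classifier scores into probabilities.
# Fits sigma(A*s + B) on held-out (score, label) pairs via logistic regression;
# the scores and labels here are placeholders, not MIRA outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Held-out decision scores for one class and their true binary labels.
scores = np.array([-2.1, -0.4, 0.3, 1.2, 2.5, 3.0]).reshape(-1, 1)
labels = np.array([0, 0, 0, 1, 1, 1])

calibrator = LogisticRegression()   # learns the A, B parameters of the sigmoid
calibrator.fit(scores, labels)

new_scores = np.array([[0.8], [-1.5]])
probs = calibrator.predict_proba(new_scores)[:, 1]
print(probs)  # calibrated probabilities of the positive class
```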