Timothy Havens


2023

GPTs Don’t Keep Secrets: Searching for Backdoor Watermark Triggers in Autoregressive Language Models
Evan Lucas | Timothy Havens
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)

This work analyzes backdoor watermarks in an autoregressive transformer fine-tuned to perform a generative sequence-to-sequence task, specifically summarization. We propose and demonstrate an attack that identifies trigger words or phrases by analyzing open-ended generations from autoregressive models into which backdoor watermarks have been inserted. We show that triggers based on random common words are easier to identify than those based on single, rare tokens. The proposed attack is easy to implement and requires only access to the model weights. Code used to create the backdoor watermarked models and analyze their outputs is shared at [github link to be inserted for camera ready version].
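
The general idea of searching for triggers through open-ended generation can be illustrated with a minimal sketch, not the paper's exact procedure: sample many unconditional generations from the suspect model and rank tokens that it emits far more often than a clean reference checkpoint does. The model paths, sampling settings, and frequency-ratio heuristic below are all illustrative assumptions, and the reference model is assumed to share the suspect model's tokenizer (e.g., the base checkpoint before fine-tuning).

```python
# Hedged sketch (not the paper's method): rank tokens that a suspect
# backdoored model over-produces in open-ended generations relative to a
# clean reference model sharing the same tokenizer. Paths are placeholders.
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

suspect_path = "path/to/backdoored-summarizer"      # hypothetical checkpoint
reference_path = "path/to/clean-base-model"         # hypothetical clean baseline

tok = AutoTokenizer.from_pretrained(suspect_path)


def token_counts(model_path, n_samples=200, max_new_tokens=64):
    """Sample open-ended generations from a BOS prompt and count emitted tokens."""
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()
    counts = Counter()
    bos = tok.bos_token_id if tok.bos_token_id is not None else tok.eos_token_id
    prompt = torch.tensor([[bos]])
    for _ in range(n_samples):
        with torch.no_grad():
            out = model.generate(
                prompt,
                do_sample=True,
                top_p=0.95,
                max_new_tokens=max_new_tokens,
                pad_token_id=tok.eos_token_id,
            )
        counts.update(out[0, 1:].tolist())   # skip the BOS prompt token
    return counts


suspect = token_counts(suspect_path)
reference = token_counts(reference_path)

# Score tokens by how much more often the suspect model emits them
# (add-one smoothing so unseen reference tokens do not divide by zero).
scores = {t: (suspect[t] + 1) / (reference[t] + 1) for t in suspect}
for t, s in sorted(scores.items(), key=lambda kv: -kv[1])[:20]:
    print(f"{s:6.1f}  {tok.decode([t])!r}")
```

Tokens or phrases at the top of this ranking are candidate triggers to inspect further; the paper's released code should be treated as the authoritative implementation.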

Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus
Kalvin Hartwig | Evan Lucas | Timothy Havens
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)

The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method for classifying two dialects (Eastern and Southwestern Ojibwe) with support vector machines using a very small corpus of text. Classification accuracy at the sentence level is 90% across five-fold cross-validation and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word-level data set are released openly on GitHub at [link to be inserted for final version, working demonstration notebook uploaded with paper].
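
A setup of the kind described above can be sketched in a few lines of scikit-learn; this is an illustrative assumption, not the authors' released pipeline. The character n-gram TF-IDF features, the linear SVM, and the placeholder corpus are all assumptions chosen to show the shape of sentence-level five-fold cross-validation followed by word-level prediction.

```python
# Hedged sketch of an SVM dialect classifier with five-fold cross-validation.
# Feature choice (character n-gram TF-IDF) and data are placeholders, not the
# authors' actual configuration or corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus: replace with real Eastern / Southwestern Ojibwe sentences.
sentences = [f"eastern placeholder sentence {i}" for i in range(5)] + \
            [f"southwestern placeholder sentence {i}" for i in range(5)]
labels = ["eastern"] * 5 + ["southwestern"] * 5

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)

# Sentence-level accuracy estimated with five-fold cross-validation.
scores = cross_val_score(clf, sentences, labels, cv=5, scoring="accuracy")
print(f"mean sentence-level accuracy: {scores.mean():.2f}")

# The sentence-trained model can then be applied to individual words.
clf.fit(sentences, labels)
print(clf.predict(["placeholder word"]))
```

Character-level features are a common choice for small-corpus dialect identification because they do not require a large vocabulary of whole-word evidence, but the released code and demonstration notebook define the method actually used in the paper.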