Hoa Do


2024

This paper describes our approach and results for the SemEval 2024 task of identifying the token index at which a mixed text switches from human authorship to machine-generated text. We explore two BiLSTMs: one over sentence feature vectors, which predicts the index of the sentence containing the switch, and another over character embeddings of the text. As sentence features, we compute, for each sentence, the token count, the mean and standard deviation of token length, counts of punctuation and space characters, various readability scores, and word frequency class and word part-of-speech class counts. Evaluation uses the mean absolute error (MAE) between the predicted and actual boundary word index. While our competition results were notably below the baseline, there may still be useful aspects to our approach.
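To make the surface-level sentence features and the evaluation metric concrete, the following is a minimal sketch rather than the system's actual implementation. It assumes whitespace tokenization, ASCII punctuation, and population standard deviation, none of which are specified above; the function names `sentence_features` and `mean_absolute_error` are hypothetical. Readability scores, word frequency classes, and part-of-speech counts are omitted because they depend on external resources.

```python
import statistics
import string


def sentence_features(sentence):
    """Return a few surface features for one sentence.

    Assumption: tokens are whitespace-separated; the tokenizer used in
    the described system is not specified and may differ.
    """
    tokens = sentence.split()
    lengths = [len(t) for t in tokens]
    return [
        float(len(tokens)),                                       # token count
        statistics.fmean(lengths) if lengths else 0.0,            # mean token length
        statistics.pstdev(lengths) if lengths else 0.0,           # std. dev. of token length
        float(sum(ch in string.punctuation for ch in sentence)),  # punctuation count (ASCII only)
        float(sum(ch.isspace() for ch in sentence)),              # whitespace character count
    ]


def mean_absolute_error(predicted, actual):
    """MAE between predicted and actual boundary word indices."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

In a setup like the one described, each sentence of a text would yield one such vector, and the resulting sequence of vectors would be the input to the sentence-level BiLSTM.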