Joseph Larson


2024

Team jelarson at SemEval 2024 Task 8: Predicting Boundary Line Between Human and Machine Generated Text
Joseph Larson | Francis Tyers
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we address the task of determining, for a document begun by a human and then finished by an LLM, the transition word, i.e. the point where the machine begins to write. We built our system by examining the data for textual anomalies and combining heuristic approaches with a linear regression model based on the text length of each document.

SemEval Task 8: A Comparison of Traditional and Neural Models for Detecting Machine Authored Text
Srikar Kashyap Pulipaka | Shrirang Mhalgi | Joseph Larson | Sandra Kübler
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Since Large Language Models have reached a stage where it is becoming more and more difficult to distinguish between human-written and machine-written text, there is an increasing need for automated systems that can tell them apart. As part of SemEval Task 8, Subtask A: Binary Human-Written vs. Machine-Generated Text Classification, we explore a variety of machine learning classifiers, from traditional statistical methods, such as Naïve Bayes and Decision Trees, to fine-tuned transformer models, such as RoBERTa and ALBERT. Our findings show that using a fine-tuned RoBERTa model with optimized hyperparameters yields the best accuracy. However, the improvement does not translate to the test set because of the differences in distribution between the development and test sets.