Kevin Li


2024

Team MLab at SemEval-2024 Task 8: Analyzing Encoder Embeddings for Detecting LLM-generated Text
Kevin Li | Kenan Hasanaliyev | Sally Zhu | George Altshuler | Alden Eberts | Eric Chen | Kate Wang | Emily Xia | Eli Browne | Ian Chen
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper explores solutions to the challenges posed by the widespread use of LLMs, particularly in the context of distinguishing human-written from machine-generated text. Focusing on Subtask B of SemEval-2024 Task 8, we compare the performance of RoBERTa and DeBERTa models. Subtask B required identifying not only whether text was written by a human or a machine but also which specific LLM generated it; on this task, our DeBERTa model outperformed the RoBERTa baseline by over 10% in leaderboard accuracy. The results highlight the rapidly growing capabilities of LLMs and the importance of keeping up with the latest advancements. Additionally, our paper presents PCA and t-SNE visualizations that showcase the DeBERTa model’s ability to cluster the outputs of different LLMs effectively. These findings contribute to understanding and improving AI methods for detecting machine-generated text, allowing us to build more robust and traceable AI systems in the language ecosystem.
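As a rough illustration of the visualization step the abstract describes, the sketch below mean-pools encoder hidden states into one embedding per text and projects the embeddings with PCA and t-SNE. The checkpoint name, pooling strategy, and placeholder inputs are all assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
import torch
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from transformers import AutoModel, AutoTokenizer

# Checkpoint is an assumption; the paper fine-tunes a DeBERTa encoder.
name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

texts = [  # placeholder inputs; real data would be labeled LLM outputs
    "Example output attributed to model A.",
    "Example output attributed to model B.",
    "Another sample from model A.",
    "Another sample from model B.",
]

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state             # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)          # ignore padding tokens
    emb = ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # mean pooling

pca_2d = PCA(n_components=2).fit_transform(emb)
# t-SNE perplexity must be smaller than the number of samples; tune on real data.
tsne_2d = TSNE(n_components=2, perplexity=2).fit_transform(emb)
```

Plotting `pca_2d` or `tsne_2d` colored by source model is what reveals whether the encoder separates the different LLMs' outputs into distinct clusters.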

2023

Francis Bacon at SemEval-2023 Task 4: Ensembling BERT and GloVe for Value Identification in Arguments
Kenan Hasanaliyev | Kevin Li | Saanvi Chawla | Michael Nath | Rohan Sanda | Justin Wu | William Huang | Daniel Yang | Shane Mion | Kiran Bhat
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In this paper, we discuss our efforts on SemEval-2023 Task 4, a task to classify the human value categories that an argument draws on. Arguments consist of a premise, a conclusion, and the premise’s stance on the conclusion. Our team experimented with GloVe embeddings and fine-tuning BERT. We found that an ensemble of BERT and GloVe with Ridge Regression worked best.
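A minimal sketch of one way such an ensemble could be assembled: concatenate a BERT [CLS] embedding with averaged pretrained GloVe vectors and fit Ridge regression over the value labels. The checkpoint, pooling choice, GloVe lookup, and 0.5 decision threshold are illustrative assumptions, not the team's exact setup.

```python
import numpy as np
import torch
from sklearn.linear_model import Ridge
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def glove_avg(text, glove, dim=300):
    """Average pretrained GloVe vectors; `glove` is a word -> np.array dict."""
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def features(texts, glove):
    """Concatenate BERT [CLS] embeddings with averaged GloVe vectors."""
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        cls = bert(**batch).last_hidden_state[:, 0].numpy()  # [CLS] per text
    g = np.stack([glove_avg(t, glove) for t in texts])
    return np.hstack([cls, g])

# With train_texts and a 0/1 label matrix Y_train (one column per value category):
#   X_train = features(train_texts, glove)
#   ridge = Ridge(alpha=1.0).fit(X_train, Y_train)   # multi-output regression
#   preds = (ridge.predict(features(test_texts, glove)) > 0.5).astype(int)
```

Ridge handles the multi-label setting directly because scikit-learn's `Ridge` supports multi-output targets; thresholding the continuous predictions turns them into per-category decisions.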