SubmissionNumber#=%=#107
FinalPaperTitle#=%=#TM-TREK at SemEval-2024 Task 8: Towards LLM-Based Automatic Boundary Detection for Human-Machine Mixed Text
ShortPaperTitle#=%=#
NumberOfPages#=%=#6
CopyrightSigned#=%=#Xiaoyan Qu
JobTitle#==#
Organization#==#
Abstract#==#With the increasing prevalence of text generated by large language models (LLMs), there is a growing need to distinguish between LLM-generated and human-written texts in order to prevent the misuse of LLMs, such as the dissemination of misleading information and academic dishonesty. Previous research has primarily focused on classifying text as either entirely human-written or entirely LLM-generated, neglecting the detection of mixed texts that contain both types of content. This paper explores LLMs' ability to identify the boundary in mixed texts that combine human-written and machine-generated content. We approach this task by transforming it into a token classification problem and regarding the point where the token label changes as the boundary. Notably, our ensemble of LLMs achieved first place in the 'Human-Machine Mixed Text Detection' subtask of SemEval-2024 Task 8. Additionally, we investigate factors that influence the capability of LLMs to detect boundaries within mixed texts, including the addition of extra layers on top of LLMs, the combination with a segmentation loss, and the impact of pretraining. Our findings aim to provide valuable insights for future research in this area.
Author{1}{Firstname}#=%=#Xiaoyan
Author{1}{Lastname}#=%=#Qu
Author{1}{Username}#=%=#ququxy
Author{1}{Email}#=%=#xiaoyan11.qu@samsung.com
Author{1}{Affiliation}#=%=#Samsung R&D Institute China-Beijing
Author{2}{Firstname}#=%=#Xiangfeng
Author{2}{Lastname}#=%=#Meng
Author{2}{Username}#=%=#ericmxf
Author{2}{Email}#=%=#xf.meng@samsung.com
Author{2}{Affiliation}#=%=#Samsung R&D Institute China-Beijing
