Can LLMs Be Efficient Predictors of Conversational Derailment?

Kaustubh Olpadkar, Vikram Sunil Bajaj, Leslie Barrett

Abstract
Conversational derailment, when online discussions stray from their intended topics due to toxic or inappropriate remarks, is a common issue on online platforms. These derailments can harm users and the broader online community. While previous work has focused on post hoc identification of toxic content, recent efforts emphasize proactively predicting derailment before it occurs, enabling early moderation. Forecasting derailment is difficult, however, because toxicity emerges in a context-dependent way and alerts must be timely. We prompt pre-trained large language models (LLMs) to predict conversational derailment without task-specific fine-tuning. We compare a range of prompting strategies, including chain-of-thought (CoT) reasoning and few-shot exemplars, across small- and large-scale models, and evaluate their performance and inference-cost trade-offs on derailment benchmarks. Our experiments show that the best prompting configuration attains state-of-the-art performance and forecasts derailments earlier than existing approaches. These results demonstrate that LLMs, even without fine-tuning, can serve as effective tools for proactive conversational moderation.
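As a rough illustration of the kind of zero-shot chain-of-thought prompting the abstract describes, the sketch below queries an LLM after each turn of a conversation and parses a yes/no derailment forecast. It uses the openai Python client; the model name, prompt wording, speaker formatting, and label parsing are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (assumptions: openai>=1.0 client, OPENAI_API_KEY set,
# hypothetical prompt wording and model name) of zero-shot CoT prompting
# for conversational derailment forecasting.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a conversation moderator. Given the messages so far, decide "
    "whether the conversation will derail into a personal attack. "
    "Reason step by step, then answer on the last line with exactly "
    "'Label: yes' or 'Label: no'."
)

def predict_derailment(messages: list[str], model: str = "gpt-4o-mini") -> bool:
    """Return True if the model forecasts derailment for the context so far."""
    # Alternate speakers for illustration; real data would carry author IDs.
    context = "\n".join(f"Speaker {i % 2 + 1}: {m}" for i, m in enumerate(messages))
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic decoding for evaluation
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": context},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    # Keep only the text after the final "label:" and check the verdict.
    return answer.rsplit("label:", 1)[-1].strip().startswith("yes")

# Early-warning loop: re-query after each new message so a moderator can be
# alerted before the derailment actually occurs.
convo = ["I think the article is biased.", "You clearly didn't read it."]
print(predict_derailment(convo))
```

Few-shot variants of this sketch would prepend labeled example conversations to the user message; the performance/cost trade-off then hinges on the extra prompt tokens per query.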
Anthology ID:
2025.findings-emnlp.816
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15104–15112
URL:
https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.816/
DOI:
10.18653/v1/2025.findings-emnlp.816
Cite (ACL):
Kaustubh Olpadkar, Vikram Sunil Bajaj, and Leslie Barrett. 2025. Can LLMs Be Efficient Predictors of Conversational Derailment? In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15104–15112, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Can LLMs Be Efficient Predictors of Conversational Derailment? (Olpadkar et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.816.pdf
Checklist:
2025.findings-emnlp.816.checklist.pdf