Semi-automatic Sequential Sentence Classification in the Discourse Analysis Tool Suite

Tim Fischer, Chris Biemann


Abstract
This paper explores an AI-assisted approach to sequential sentence annotation designed to enhance qualitative data analysis (QDA) workflows within the open-source Discourse Analysis Tool Suite (DATS) developed at our university. We introduce a three-phase Annotation Assistant that leverages the capabilities of large language models (LLMs) to assist researchers during annotation. Based on the number of annotations, the assistant employs zero-shot prompting, few-shot prompting, or fine-tuned models to provide the best suggestions. To evaluate this approach, we construct a benchmark with five diverse datasets. We assess the performance of three prominent open-source LLMs (Llama 3.1, Gemma 2, and Mistral NeMo) and a sequence tagging model based on SentenceTransformers. Our findings demonstrate the effectiveness of our approach, with performance improving as the number of annotated examples increases. Consequently, we implemented the Annotation Assistant within DATS and report the implementation details. With this, we hope to contribute to a novel AI-assisted workflow and further democratize access to AI for qualitative data analysis.
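
The phase selection described in the abstract can be illustrated with a minimal sketch. The abstract does not state the annotation counts at which the assistant switches phases, so the two thresholds below, along with every name in the block (`Phase`, `select_phase`, `FEW_SHOT_MIN`, `FINE_TUNE_MIN`), are hypothetical placeholders and not part of DATS.

```python
from enum import Enum


class Phase(Enum):
    ZERO_SHOT = "zero-shot prompting"
    FEW_SHOT = "few-shot prompting"
    FINE_TUNED = "fine-tuned model"


# Hypothetical cut-offs: the abstract only says the choice depends on the
# number of annotations, not where the boundaries lie.
FEW_SHOT_MIN = 1
FINE_TUNE_MIN = 100


def select_phase(num_annotations: int) -> Phase:
    """Pick a suggestion strategy from the number of existing annotations."""
    if num_annotations < 0:
        raise ValueError("num_annotations must be non-negative")
    if num_annotations >= FINE_TUNE_MIN:
        return Phase.FINE_TUNED
    if num_annotations >= FEW_SHOT_MIN:
        return Phase.FEW_SHOT
    return Phase.ZERO_SHOT
```

With these placeholder values, a project with no annotations falls back to zero-shot prompting, a handful of annotations enables few-shot prompting, and a larger pool enables a fine-tuned model.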
Anthology ID:
2025.naacl-demo.16
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Nouha Dziri, Sean (Xiang) Ren, Shizhe Diao
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
151–162
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-demo.16/
Cite (ACL):
Tim Fischer and Chris Biemann. 2025. Semi-automatic Sequential Sentence Classification in the Discourse Analysis Tool Suite. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), pages 151–162, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Semi-automatic Sequential Sentence Classification in the Discourse Analysis Tool Suite (Fischer & Biemann, NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-demo.16.pdf