2025
Supporting Online Discussions: Integrating AI Into the adhocracy+ Participation Platform To Enhance Deliberation
Maike Behrendt | Stefan Sylvius Wagner | Mira Warne | Jana Leonie Peters | Marc Ziegele | Stefan Harmeling
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)
Online spaces provide individuals with the opportunity to engage in discussions on important topics and make collective decisions, regardless of their geographic location or time zone. However, without adequate support and careful design, such discussions often suffer from a lack of structure and civility in the exchange of opinions. Artificial intelligence (AI) offers a promising avenue for helping both participants and organizers manage large-scale online participation processes. This paper introduces an extension of adhocracy+, a large-scale open-source participation platform. Our extension features two AI-supported debate modules designed to improve discussion quality and foster participant interaction. In a large-scale user study, we examined the effects and usability of both modules and report our findings in this paper. The extended platform is available at https://github.com/mabehrendt/discuss2.0.
2024
AQuA – Combining Experts’ and Non-Experts’ Views To Assess Deliberation Quality in Online Discussions Using LLMs
Maike Behrendt | Stefan Sylvius Wagner | Marc Ziegele | Lena Wilms | Anke Stoll | Dominique Heinbach | Stefan Harmeling
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024
Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with advances in deep learning, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices and calculate correlation coefficients between experts' annotations and the deliberativeness perceived by non-experts to weigh the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not been seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
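The additive score described in this abstract can be read, in a minimal sketch, as a weighted sum over the 20 indices. The symbols below are illustrative assumptions rather than the paper's exact notation: s_i(c) stands for the i-th adapter model's prediction on comment c, and w_i for the weight derived from the expert/non-expert correlation analysis.

AQuA(c) = \sum_{i=1}^{20} w_i \, s_i(c)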
2023
Automatic Dictionary Generation: Could Brothers Grimm Create a Dictionary with BERT?
Hendryk Weiland | Maike Behrendt | Stefan Harmeling
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)
2021
ArgueBERT: How To Improve BERT Embeddings for Measuring the Similarity of Arguments
Maike Behrendt | Stefan Harmeling
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)
How Will I Argue? A Dataset for Evaluating Recommender Systems for Argumentations
Markus Brenneis | Maike Behrendt | Stefan Harmeling
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Exchanging arguments is an important part of communication, but we are often flooded with arguments for different positions or trapped in filter bubbles. Tools that present strong arguments relevant to the individual user could help to reduce these problems. To enable the evaluation of algorithms that predict how convincing an argument is, we have collected a dataset of more than 900 arguments and the personal attitudes of 600 individuals, which we present in this paper. Based on this data, we suggest three recommender tasks, for which we provide two baseline results: a simple majority classifier and a more complex nearest-neighbor algorithm. Our results suggest that better algorithms can still be developed, and we invite the community to improve on our results.
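As a generic illustration of the simpler of the two baselines mentioned above, a majority classifier always predicts the label seen most often in training. This is a minimal sketch under the assumption that attitudes are categorical labels; it is not the paper's implementation.

from collections import Counter

def fit_majority(train_labels):
    # "Training" reduces to finding the most frequent label.
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(majority_label, n_items):
    # Predict the same majority label for every test item.
    return [majority_label] * n_items

# Toy usage with hypothetical agree/disagree attitude labels:
label = fit_majority(["agree", "agree", "disagree", "agree"])
print(predict_majority(label, 3))  # ['agree', 'agree', 'agree']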