Who Wrote When? Author Diarization in Social Media Discussions
Benedikt Boenninghoff, Henry Hosseini, Robert M. Nickel, Dorothea Kolossa
Abstract
We propose a novel framework for author diarization, i.e., attributing comments in online discussions to individual authors. Our approach merges pre-trained neural representations of writing style with author-conditional encoder-decoder diarization, enhanced by a Conditional Random Field with Viterbi decoding for alignment refinement. Additionally, we introduce two new large-scale German-language datasets, one for authorship verification and the other for author diarization. We evaluate the performance of our diarization framework on these datasets, offering insights into the strengths and limitations of this approach.
- Anthology ID: 2024.findings-emnlp.922
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 15721–15734
- URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.922/
- DOI: 10.18653/v1/2024.findings-emnlp.922
- Cite (ACL): Benedikt Boenninghoff, Henry Hosseini, Robert M. Nickel, and Dorothea Kolossa. 2024. Who Wrote When? Author Diarization in Social Media Discussions. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 15721–15734, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Who Wrote When? Author Diarization in Social Media Discussions (Boenninghoff et al., Findings 2024)
- PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.922.pdf
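The abstract names a Conditional Random Field with Viterbi decoding as the alignment-refinement step. This page does not give the paper's exact formulation, so the following is only a minimal sketch of generic Viterbi decoding for a linear-chain CRF. It assumes that per-comment author scores (`emissions`) and author-to-author transition scores (`transitions`) have already been produced upstream; the function name, inputs, and example numbers are hypothetical and are not the authors' code.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Optional[Sequence[float]] = None,
) -> Tuple[List[int], float]:
    """Return the highest-scoring label path and its score for a linear-chain CRF.

    Hypothetical inputs (assumptions, not taken from the paper):
      emissions[t][k]   score that comment t was written by author k
      transitions[i][j] score for author i at step t-1 followed by author j at step t
      start[k]          optional score for author k writing the first comment
    """
    num_steps = len(emissions)
    if num_steps == 0:
        return [], 0.0
    num_labels = len(emissions[0])
    if start is None:
        start = [0.0] * num_labels

    # score[k]: best total score of any label path ending in label k at the current step.
    score = [start[k] + emissions[0][k] for k in range(num_labels)]
    backpointers: List[List[int]] = []

    for t in range(1, num_steps):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(num_labels):
            # Best predecessor label i for current label j.
            best_i = max(range(num_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full path.
    best_last = max(range(num_labels), key=lambda k: score[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


if __name__ == "__main__":
    # Three comments, two candidate authors; the transition scores here
    # (hypothetically) favor alternating authors between consecutive comments.
    emissions = [[2.0, 0.0], [0.6, 0.5], [1.5, 0.0]]
    transitions = [[-1.0, 0.5], [0.5, -1.0]]
    print(viterbi_decode(emissions, transitions))  # ([0, 1, 0], 5.0)
```

In this toy example, a per-comment argmax over `emissions` alone would assign all three comments to author 0, whereas joint decoding with the transition scores yields `[0, 1, 0]`. This illustrates the general idea of using sequence-level structure to refine independent per-comment decisions.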