M³Seg: A Maximum-Minimum Mutual Information Paradigm for Unsupervised Topic Segmentation in ASR Transcripts

Ke Wang; Xiutian Zhao; Yanghui Li; Wei Peng

doi:10.18653/v1/2023.emnlp-main.492

Abstract

Topic segmentation aims to detect topic boundaries and split automatic speech recognition transcriptions (e.g., meeting transcripts) into segments that are bounded by thematic meanings. In this work, we propose M³Seg, a novel Maximum-Minimum Mutual information paradigm for linear topic segmentation without using any parallel data. Specifically, by employing sentence representations provided by pre-trained language models, M³Seg first learns a region-based segment encoder by maximizing the mutual information between the global segment representation and the local contextual sentence representation. Second, an edge-based boundary detection module segments the whole transcript by topic by minimizing the mutual information between different segments. Experimental results on two public datasets demonstrate the effectiveness of M³Seg, which outperforms the state-of-the-art methods by a significant margin (18%–37% improvement).
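
The edge-based idea in the abstract, placing boundaries where neighbouring segments share the least information, can be illustrated with a small sketch. This is not the M³Seg implementation: it assumes sentence embeddings are already available as plain lists of floats, and it swaps in cosine similarity between averaged window embeddings as a crude stand-in for mutual information. The names `boundary_scores`, `pick_boundaries`, and the `window` parameter are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def boundary_scores(embeddings, window=2):
    """Score every gap i (between sentence i-1 and sentence i).

    ASSUMPTION: cosine similarity of the averaged left and right windows is
    used as a rough proxy for how much information the two sides share.
    A lower score means the gap is a more likely topic boundary.
    """
    scores = {}
    for i in range(1, len(embeddings)):
        left = embeddings[max(0, i - window):i]
        right = embeddings[i:i + window]
        scores[i] = cosine(mean_vector(left), mean_vector(right))
    return scores


def pick_boundaries(embeddings, num_boundaries, window=2):
    """Return the num_boundaries lowest-scoring gaps, in document order."""
    scores = boundary_scores(embeddings, window)
    lowest = sorted(scores, key=scores.get)[:num_boundaries]
    return sorted(lowest)


# Toy 2-D "embeddings": three sentences on one topic, then three on another.
toy = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
       [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]]
boundaries = pick_boundaries(toy, num_boundaries=1)
print(boundaries)  # index of the sentence that starts the second segment
```

On the toy input the single selected boundary falls before the fourth sentence (index 3), where the two groups of vectors diverge. In practice the embeddings would come from a pre-trained language model, and the number of boundaries would have to be chosen or estimated separately.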

Anthology ID: 2023.emnlp-main.492
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 7928–7934
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2023.emnlp-main.492/
DOI: 10.18653/v1/2023.emnlp-main.492
Cite (ACL): Ke Wang, Xiutian Zhao, Yanghui Li, and Wei Peng. 2023. M3Seg: A Maximum-Minimum Mutual Information Paradigm for Unsupervised Topic Segmentation in ASR Transcripts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7928–7934, Singapore. Association for Computational Linguistics.
Cite (Informal): M3Seg: A Maximum-Minimum Mutual Information Paradigm for Unsupervised Topic Segmentation in ASR Transcripts (Wang et al., EMNLP 2023)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2023.emnlp-main.492.pdf
Video: https://preview.aclanthology.org/add-emnlp-2024-awards/2023.emnlp-main.492.mp4
