Generalizing across Languages and Domains for Discourse Relation Classification

Peter Bourgonje, Vera Demberg


Abstract
The availability of corpora annotated for discourse relations is limited and discourse relation classification performance varies greatly depending on both language and domain. This is a problem for downstream applications that are intended for a language (i.e., not English) or a domain (i.e., not financial news) with comparatively low coverage for discourse annotations. In this paper, we experiment with a state-of-the-art model for discourse relation classification, originally developed for English, extend it to a multi-lingual setting (testing on Italian, Portuguese and Turkish), and employ a simple, yet effective method to mark out-of-domain training instances. By doing so, we aim to contribute to better generalization and more robust discourse relation classification performance across both language and domain.
Anthology ID:
2024.sigdial-1.47
Volume:
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
September
Year:
2024
Address:
Kyoto, Japan
Editors:
Tatsuya Kawahara, Vera Demberg, Stefan Ultes, Koji Inoue, Shikib Mehri, David Howcroft, Kazunori Komatani
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
554–565
URL:
https://aclanthology.org/2024.sigdial-1.47
DOI:
10.18653/v1/2024.sigdial-1.47
Cite (ACL):
Peter Bourgonje and Vera Demberg. 2024. Generalizing across Languages and Domains for Discourse Relation Classification. In Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 554–565, Kyoto, Japan. Association for Computational Linguistics.
Cite (Informal):
Generalizing across Languages and Domains for Discourse Relation Classification (Bourgonje & Demberg, SIGDIAL 2024)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2024.sigdial-1.47.pdf