Zero-shot Learning for Multilingual Discourse Relation Classification
Eleni Metheniti, Philippe Muller, Chloé Braud, Margarita Hernández Casas
Abstract
Classifying discourse relations is known to be a hard task, relying on complex cues. Moreover, discourse-annotated data is scarce, especially for languages other than English: many corpora of limited size exist for several languages, but the field is split between different theoretical frameworks that strongly affect both the nature of the textual spans to be linked and the label set used. In addition, each annotation project introduces its own modifications with respect to the theoretical background and to other projects. These discrepancies hinder the development of systems that exploit all the available data to tackle data sparsity. Work on transfer between languages is very limited, and almost nonexistent between frameworks, although it could improve our understanding of some theoretical aspects and benefit many applications. In this paper, we present the first experiments on zero-shot learning for discourse relation classification and investigate several ways of combining source data, based on languages, frameworks, or similarity measures. We demonstrate how difficult transfer is for the task at hand, and show that the most impactful factor is label set divergence, where the notion of an underlying framework may conceal crucial disagreements.
- Anthology ID: 2024.lrec-main.1553
- Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month: May
- Year: 2024
- Address: Torino, Italia
- Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues: LREC | COLING
- Publisher: ELRA and ICCL
- Pages: 17858–17876
- URL: https://aclanthology.org/2024.lrec-main.1553
- Cite (ACL): Eleni Metheniti, Philippe Muller, Chloé Braud, and Margarita Hernández Casas. 2024. Zero-shot Learning for Multilingual Discourse Relation Classification. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17858–17876, Torino, Italia. ELRA and ICCL.
- Cite (Informal): Zero-shot Learning for Multilingual Discourse Relation Classification (Metheniti et al., LREC-COLING 2024)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/2024.lrec-main.1553.pdf