Abstract
Cross-Lingual Summarization (XLS) aims to summarize a document in the source language into a condensed version in the target language, effectively removing language barriers for non-native readers. Previous approaches, however, have the same limitation that only a single reference (gold summary) is exploited during model training, making the base model exposed to an underrepresented hypothesis space since the actual number of possible hypotheses is exponentially large. To alleviate this problem, we present a study adopting pseudo-labels in regularizing standard cross-lingual summarization training. We investigate several components leading to the gains in regularization training with verified experiments involving 8 diverse languages from different families. Conclusively, we show that pseudo-labeling is a simple and effective approach that significantly improves over standard gold reference training in XLS.
- Anthology ID:
- 2024.findings-naacl.289
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2024
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4644–4677
- URL:
- https://aclanthology.org/2024.findings-naacl.289
- Cite (ACL):
- Thang Le. 2024. Cross-Lingual Summarization with Pseudo-Label Regularization. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4644–4677, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Cross-Lingual Summarization with Pseudo-Label Regularization (Le, Findings 2024)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2024.findings-naacl.289.pdf