Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning

Xiliang Zhu, Shayna Gardiner, David Rossouw, Tere Roldán, Simon Corston-Oliver


Abstract
Automatic Speech Recognition (ASR) systems typically produce unpunctuated transcripts that have poor readability. In addition, building a punctuation restoration system is challenging for low-resource languages, especially for domain-specific applications. In this paper, we propose a Spanish punctuation restoration system designed for a real-time customer support transcription service. To address the data sparsity of Spanish transcripts in the customer support domain, we introduce two transfer-learning-based strategies: 1) domain adaptation using out-of-domain Spanish text data; 2) cross-lingual transfer learning leveraging in-domain English transcript data. Our experimental results show that these strategies improve the accuracy of the Spanish punctuation restoration system.
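As an illustration of the task itself, punctuation restoration is commonly framed as token labeling: each word in the unpunctuated transcript receives a label naming the punctuation mark that should follow it. The sketch below only shows how such training pairs could be derived from punctuated text and how predicted labels could be turned back into text. It is not the system described in the paper; the label set, the function names, and the decision to drop Spanish inverted marks (¿, ¡) are assumptions made for this example.

```python
import re

# Hypothetical label set (an assumption; the abstract does not list the labels used).
PUNCT_LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}
LABEL_TO_PUNCT = {label: mark for mark, label in PUNCT_LABELS.items()}


def to_training_pairs(text):
    """Turn punctuated text into (lowercased word, label) pairs."""
    pairs = []
    for raw in text.split():
        # Assumption: inverted marks are dropped; only trailing punctuation is labeled.
        token = raw.lstrip("¿¡")
        label = "O"  # "O" means no punctuation follows this word
        if token and token[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[token[-1]]
            token = token[:-1]
        # Remove any remaining non-word characters and lowercase, mimicking ASR output.
        token = re.sub(r"[^\w]", "", token).lower()
        if token:
            pairs.append((token, label))
    return pairs


def restore(words, labels):
    """Re-attach the punctuation named by each label to its word."""
    return " ".join(w + LABEL_TO_PUNCT.get(l, "") for w, l in zip(words, labels))


pairs = to_training_pairs("Hola, ¿cómo puedo ayudarle?")
words = [w for w, _ in pairs]
labels = [l for _, l in pairs]
restored = restore(words, labels)
```

In a real system the labels passed to `restore` would come from a trained model rather than from the reference text; the round trip here only demonstrates the data format.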
Anthology ID:
2022.deeplo-1.9
Volume:
Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing
Month:
July
Year:
2022
Address:
Hybrid
Venue:
DeepLo
Publisher:
Association for Computational Linguistics
Pages:
80–89
URL:
https://aclanthology.org/2022.deeplo-1.9
DOI:
10.18653/v1/2022.deeplo-1.9
Cite (ACL):
Xiliang Zhu, Shayna Gardiner, David Rossouw, Tere Roldán, and Simon Corston-Oliver. 2022. Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning. In Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing, pages 80–89, Hybrid. Association for Computational Linguistics.
Cite (Informal):
Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning (Zhu et al., DeepLo 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.deeplo-1.9.pdf
Video:
https://preview.aclanthology.org/auto-file-uploads/2022.deeplo-1.9.mp4