Tere Roldán
2022
Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning
Xiliang Zhu | Shayna Gardiner | David Rossouw | Tere Roldán | Simon Corston-Oliver
Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing
Automatic Speech Recognition (ASR) systems typically produce unpunctuated transcripts that have poor readability. In addition, building a punctuation restoration system is challenging for low-resource languages, especially for domain-specific applications. In this paper, we propose a Spanish punctuation restoration system designed for a real-time customer support transcription service. To address the data sparsity of Spanish transcripts in the customer support domain, we introduce two transfer-learning-based strategies: 1) domain adaptation using out-of-domain Spanish text data; 2) cross-lingual transfer learning leveraging in-domain English transcript data. Our experimental results show that these strategies improve the accuracy of the Spanish punctuation restoration system.