Xiliang Zhu


2022

Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning
Xiliang Zhu | Shayna Gardiner | David Rossouw | Tere Roldán | Simon Corston-Oliver
Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing

Automatic Speech Recognition (ASR) systems typically produce unpunctuated transcripts that have poor readability. In addition, building a punctuation restoration system is challenging for low-resource languages, especially for domain-specific applications. In this paper, we propose a Spanish punctuation restoration system designed for a real-time customer support transcription service. To address the data sparsity of Spanish transcripts in the customer support domain, we introduce two transfer-learning-based strategies: 1) domain adaptation using out-of-domain Spanish text data; 2) cross-lingual transfer learning leveraging in-domain English transcript data. Our experimental results show that these strategies improve the accuracy of the Spanish punctuation restoration system.