Context-Aware Lexical Stress Prediction and Phonemization for Ukrainian TTS Systems

Anastasiia Senyk, Mykhailo Lukianchuk, Valentyna Robeiko, Yurii Paniv


Abstract
Text preprocessing is a fundamental component of high-quality speech synthesis. This work presents a novel rule-based phonemizer combined with a sentence-level lexical stress prediction model to improve phonetic accuracy and prosody prediction in text-to-speech pipelines. We also introduce a new benchmark dataset with annotated stress patterns, designed for evaluating lexical stress prediction systems at the sentence level. Experimental results demonstrate that the proposed phonemizer achieves a 1.23% word error rate on a manually constructed pronunciation dataset, while the lexical stress prediction pipeline achieves results close to those of dictionary-based methods and outperforms existing neural network solutions.
Anthology ID:
2025.unlp-1.11
Volume:
Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria (online)
Editor:
Mariana Romanyshyn
Venues:
UNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
96–104
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.11/
Cite (ACL):
Anastasiia Senyk, Mykhailo Lukianchuk, Valentyna Robeiko, and Yurii Paniv. 2025. Context-Aware Lexical Stress Prediction and Phonemization for Ukrainian TTS Systems. In Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025), pages 96–104, Vienna, Austria (online). Association for Computational Linguistics.
Cite (Informal):
Context-Aware Lexical Stress Prediction and Phonemization for Ukrainian TTS Systems (Senyk et al., UNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.11.pdf