Ayush Anand


2024

IWSLT 2024 Indic Track system description paper: Speech-to-Text Translation from English to multiple Low-Resource Indian Languages
Deepanjali Singh | Ayush Anand | Abhyuday Chaturvedi | Niyati Baliyan
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

Our Speech-to-Text (ST) translation system targets low-resource Indian languages (Hindi, Bengali, and Tamil) by combining transcription and translation models for accurate and efficient translation. The key components of the system are as follows. The Audio Processor and Transcription Module uses ResembleAI for noise reduction and OpenAI’s Whisper model for transcription. The Input Module validates and preprocesses audio files. The Translation Modules integrate the Helsinki-NLP model for English-to-Hindi translation and Facebook’s MBart model for English-to-Tamil and English-to-Bengali translation, fine-tuned for better quality. The Output Module corrects syntax and removes hallucinations before delivering the final translated text. Performance was evaluated with SacreBLEU, with the following scores: English-to-Hindi: 24.21 (baseline: 5.23); English-to-Bengali: 16.18 (baseline: 5.86); English-to-Tamil: 10.79 (baseline: 1.9). The solution streamlines the workflow from input validation to output delivery and improves substantially on the baseline SacreBLEU scores for all three language pairs. Through the creation of dedicated datasets and the development of robust models, we aim to facilitate communication and accessibility across diverse linguistic communities, ultimately promoting inclusivity and empowerment.
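
The flow described in the abstract (validate input, denoise, transcribe, translate with a per-language model, post-process) can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the class name, the stage signatures, and the language codes are assumptions, and each stage is passed in as a plain callable so that real models (for example Whisper for transcription, or a Helsinki-NLP or MBart model for translation) could be plugged in without changing the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stage signatures; the abstract does not specify the real interfaces.
Denoiser = Callable[[bytes], bytes]      # e.g. a noise-reduction step
Transcriber = Callable[[bytes], str]     # e.g. an English speech recognizer
Translator = Callable[[str], str]        # English text -> target-language text
PostProcessor = Callable[[str], str]     # syntax cleanup / hallucination filtering


@dataclass
class SpeechTranslationPipeline:
    """Illustrative pipeline: validate -> denoise -> transcribe -> translate -> post-process."""

    denoise: Denoiser
    transcribe: Transcriber
    translators: Dict[str, Translator]   # one translator per target-language code
    postprocess: PostProcessor

    def validate(self, audio: bytes, target_lang: str) -> None:
        # Stand-in for the Input Module: reject empty audio and unknown languages.
        if not audio:
            raise ValueError("empty audio input")
        if target_lang not in self.translators:
            raise ValueError(f"unsupported target language: {target_lang!r}")

    def run(self, audio: bytes, target_lang: str) -> str:
        self.validate(audio, target_lang)
        clean_audio = self.denoise(audio)
        english_text = self.transcribe(clean_audio)
        # Route to the translator registered for the requested language.
        translated = self.translators[target_lang](english_text)
        return self.postprocess(translated)
```

Keeping the translators in a dictionary mirrors the abstract's split between one model for Hindi and another for Tamil and Bengali: adding a language means registering one more callable rather than changing `run`.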