Yongqiang Cheng


2025

Speech-Controlled Smart Speaker for Accurate, Real-Time Health and Care Record Management
Jonathan E. Carrick | Nina Dethlefs | Lisa Greaves | Venkata M. V. Gunturi | Rameez Raja Kureshi | Yongqiang Cheng
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology

To help alleviate the pressures felt by care workers, we have begun new research into improving the efficiency of care plan management by advancing recent developments in automatic speech recognition. Our novel approach adapts off-the-shelf tools in a purpose-built application for the speech domain, addressing challenges of accent adaptation, real-time processing and speech hallucinations. We augment the speech-recognition scope of OpenAI’s Whisper model through fine-tuning, reducing word error rates (WERs) from 16.8 to 1.0 on a range of British dialects. Adapting the model to real-time recognition introduces speech hallucinations as a side effect; by enforcing a signal-to-noise ratio threshold and audio stream checks, we achieve a WER of 5.1, compared to 14.9 with Whisper’s original model. These ongoing research efforts tackle the challenges that must be solved to build the speech-control basis for a custom smart speaker system that is both accurate and timely.
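
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance between a reference transcript and the recogniser's output, divided by the number of reference words. It is not the authors' evaluation code. The function name `word_error_rate`, the lowercasing and whitespace tokenisation, and the scaling to a percentage are all assumptions made for illustration; the abstract does not state its text normalisation or whether its figures are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate as a percentage (assumed convention).

    WER = (substitutions + deletions + insertions) / reference word count,
    where the numerator is the word-level Levenshtein distance.
    """
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, invented for this example.
    reference = "the patient took her medication"
    hypothesis = "the patient took medication"
    print(word_error_rate(reference, hypothesis))  # one deletion over five words
```

Because insertions count as errors, a hallucinated phrase emitted during silence raises the score even when every spoken word is recognised correctly, which is why WER is a natural measure for the hallucination mitigation the abstract describes.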