emg2speech: synthesizing speech from electromyography using self-supervised speech models

Harshavardhana T Gowda, Daniel C Comstock, Lee M. Miller


Abstract
We present a neuromuscular speech interface that translates electromyographic (EMG) signals recorded from orofacial muscles during speech articulation directly into audio. We find that self-supervised speech (S3) representations are strongly linearly related to the electrical power of muscle activity: a simple linear mapping predicts EMG power from S3 representations with a correlation of *r* = 0.85. In addition, EMG power vectors associated with distinct articulatory gestures form structured, separable clusters. Together, these observations suggest that S3 models implicitly encode articulatory mechanisms, as reflected in EMG activity. Leveraging this structure, we map EMG signals into the S3 representation space and synthesize speech, enabling end-to-end EMG-to-speech generation without explicit articulatory modeling or vocoder training. We demonstrate this system with a participant with amyotrophic lateral sclerosis (ALS), converting orofacial EMG recorded while she *silently* articulated speech into audio.
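As a rough illustration of the linear-mapping claim in the abstract, the sketch below fits an ordinary least-squares map from a feature matrix to per-channel power values and reports the Pearson correlation on held-out frames. It is a minimal sketch on synthetic data, not the authors' pipeline: the array names (`s3`, `emg_power`), the dimensions, the noise level, and the train/test split are all assumptions made for illustration, and real S3 features and EMG power would have to be extracted and time-aligned first.

```python
import numpy as np

def fit_linear_map(features, targets):
    """Least-squares fit of targets ~ [features, 1] @ weights (bias in last row)."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def predict(features, weights):
    """Apply the fitted linear map, including the bias term."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    return design @ weights

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Synthetic stand-ins (assumed shapes): frames x S3 features, frames x EMG channels.
rng = np.random.default_rng(0)
n_frames, n_feat, n_chan = 2000, 16, 4
s3 = rng.standard_normal((n_frames, n_feat))
true_map = rng.standard_normal((n_feat, n_chan))
emg_power = s3 @ true_map + 0.5 * rng.standard_normal((n_frames, n_chan))

# Fit on the first 1500 frames, evaluate on the remaining 500.
train, test = slice(0, 1500), slice(1500, None)
weights = fit_linear_map(s3[train], emg_power[train])
pred = predict(s3[test], weights)

# One correlation value per EMG channel on the held-out frames.
r_per_channel = [pearson_r(pred[:, c], emg_power[test][:, c]) for c in range(n_chan)]
print(r_per_channel)
```

Evaluating on held-out frames rather than the training frames keeps the reported correlation from being inflated by overfitting, which matters more as the feature dimension grows relative to the number of frames.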
Anthology ID:
2026.acl-long.750
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
16490–16507
URL:
https://preview.aclanthology.org/bulk-corrections-2026-07-02/2026.acl-long.750/
DOI:
10.18653/v1/2026.acl-long.750
Cite (ACL):
Harshavardhana T Gowda, Daniel C Comstock, and Lee M. Miller. 2026. emg2speech: synthesizing speech from electromyography using self-supervised speech models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16490–16507, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
emg2speech: synthesizing speech from electromyography using self-supervised speech models (Gowda et al., ACL 2026)
PDF:
https://preview.aclanthology.org/bulk-corrections-2026-07-02/2026.acl-long.750.pdf
Checklist:
 2026.acl-long.750.checklist.pdf