The Development of Spectral and Temporal Encodings in Speech Sounds

Frank Lihui Tan, Youngah Do


Abstract
This study uses a modeling approach to explore the development of spectral and positional encodings in speech sounds. Humans rely on their auditory system to differentiate between individual sounds in words by analyzing both the spectral properties of phonemes and their relative positions. Previous neuroscientific research has identified specific neural populations in the auditory cortex that are involved in spectral processing, while behavioral studies have confirmed humans’ ability to perceive the relative positions of phonemes in speech sequences. To investigate these encodings, we train a Long Short-Term Memory (LSTM) autoencoder with a cross-attention mechanism on Mel-spectrograms derived from raw speech data. By conducting ABX tests on the model’s representations at various learning stages, we observe the emergence of spectral and positional encodings. The results show that the model excels at distinguishing spectral features, in line with neuroscientific findings, and that it also develops an independent positional encoding, as shown by accurate temporal distinctions. Furthermore, we illustrate the developmental trajectory of spectral and positional encodings during the learning process and point to the need for further investigation of their neural correlates.
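For readers unfamiliar with the ABX tests mentioned in the abstract, the following is a minimal sketch of how an ABX discrimination score can be computed over learned representations. It is not the authors' implementation: the function names (`cosine_distance`, `abx_accuracy`), the use of fixed-length vectors, the cosine distance, and the toy data are all assumptions made for illustration. The paper may use a different distance, for example one defined over frame sequences.

```python
import math
from itertools import permutations


def cosine_distance(u, v):
    """Cosine distance between two equal-length, non-zero vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_accuracy(cat_a, cat_b, distance=cosine_distance):
    """Fraction of (A, B, X) triples in which X is closer to A than to B.

    A and X are distinct tokens from cat_a; B is a token from cat_b.
    Ties count as half correct, so chance level is 0.5.
    """
    correct = 0.0
    total = 0
    for a, x in permutations(cat_a, 2):  # ordered pairs of distinct tokens
        for b in cat_b:
            d_ax = distance(a, x)
            d_bx = distance(b, x)
            if d_ax < d_bx:
                correct += 1.0
            elif d_ax == d_bx:
                correct += 0.5
            total += 1
    return correct / total


# Hypothetical 2-D "representations" of two well-separated categories.
cat_a = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
cat_b = [[0.1, 1.0], [0.0, 0.9]]
score = abx_accuracy(cat_a, cat_b)
print(f"ABX accuracy: {score:.2f}")
```

Under these assumptions, running the same scoring function on representations extracted at successive training checkpoints would give one accuracy value per checkpoint, which is one way to trace how well a contrast is encoded as learning proceeds.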
Anthology ID:
2026.scil-main.11
Volume:
Proceedings of the Society for Computation in Linguistics 2026
Month:
July
Year:
2026
Address:
San Diego, CA
Editors:
Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
Venues:
SCiL | WS
Publisher:
Association for Computational Linguistics
Pages:
113–126
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.11/
Cite (ACL):
Frank Lihui Tan and Youngah Do. 2026. The Development of Spectral and Temporal Encodings in Speech Sounds. In Proceedings of the Society for Computation in Linguistics 2026, pages 113–126, San Diego, CA. Association for Computational Linguistics.
Cite (Informal):
The Development of Spectral and Temporal Encodings in Speech Sounds (Tan & Do, SCiL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.11.pdf