[b] = [d] - [t] + [p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic

Kwanghee Choi, Eunjung Yeo, Cheol Jun Cho, David Harwath, David R. Mortensen


Abstract
Self-supervised speech models (S3Ms) are known to encode rich phonetic information, yet how this information is structured remains underexplored. We conduct a comprehensive study across 96 languages to analyze the underlying structure of S3M representations, with particular attention to phonological vectors. We first show that there exist linear directions within the model’s representation space that correspond to phonological features. We further demonstrate that the scale of these phonological vectors correlates with the degree of acoustic realization of their corresponding phonological features in a continuous manner. For example, the difference between [d] and [t] yields a voicing vector: adding this vector to [p] produces [b], while scaling it results in a continuum of voicing. Together, these findings indicate that S3Ms encode speech using phonologically interpretable and compositional vectors, demonstrating phonological vector arithmetic. All code and interactive demos are available at https://github.com/juice500ml/phonetic-arithmetic.
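The arithmetic in the abstract's example can be illustrated with a minimal sketch. This is not the authors' code and uses no real S3M representations: the phone vectors are synthetic, built on the assumption that each phone is a sum of a place direction and an optional voicing direction plus small noise. The names `make_phone`, `nearest_phone`, `place`, and `voiced_dir` are hypothetical and exist only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

# Hypothetical feature directions (toy stand-ins, NOT real S3M representations).
place = {name: rng.normal(size=DIM) for name in ("labial", "alveolar", "velar")}
voiced_dir = rng.normal(size=DIM)


def make_phone(place_name, voiced):
    """Compose a toy phone vector: place direction + optional voicing + small noise."""
    noise = 0.05 * rng.normal(size=DIM)
    return place[place_name] + (voiced_dir if voiced else 0.0) + noise


phones = {
    "p": make_phone("labial", False),
    "b": make_phone("labial", True),
    "t": make_phone("alveolar", False),
    "d": make_phone("alveolar", True),
    "k": make_phone("velar", False),
    "g": make_phone("velar", True),
}


def nearest_phone(vec, inventory):
    """Return the label of the inventory vector closest to vec (Euclidean distance)."""
    return min(inventory, key=lambda label: np.linalg.norm(inventory[label] - vec))


# Voicing vector estimated from one minimal pair: [d] - [t].
voicing = phones["d"] - phones["t"]

# [p] + ([d] - [t]) should land nearest to [b] in this toy setup.
predicted = nearest_phone(phones["p"] + voicing, phones)
print(predicted)

# Scaling the vector moves [p] continuously along the voicing direction.
unit = voicing / np.linalg.norm(voicing)
scales = [0.0, 0.5, 1.0, 1.5]
projections = [float((phones["p"] + s * voicing) @ unit) for s in scales]
print(projections)
```

Because the toy vectors are additive by construction, the analogy holds almost exactly here. The paper's claim is the empirical one: that representations learned by S3Ms behave in a comparable way.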
Anthology ID:
2026.findings-acl.537
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11048–11069
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.537/
Cite (ACL):
Kwanghee Choi, Eunjung Yeo, Cheol Jun Cho, David Harwath, and David R. Mortensen. 2026. [b] = [d] - [t] + [p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic. In Findings of the Association for Computational Linguistics: ACL 2026, pages 11048–11069, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
[b] = [d] - [t] + [p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic (Choi et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.537.pdf
Checklist:
 2026.findings-acl.537.checklist.pdf