Spoken Document Retrieval for an Unwritten Language: A Case Study on Gormati

Sanjay Booshanam, Kelly Chen, Ondrej Klejch, Thomas Reitmaier, Dani Kalarikalayil Raju, Electra Wallington, Nina Markl, Jennifer Pearson, Matt Jones, Simon Robinson, Peter Bell


Abstract
Speakers of unwritten languages have the potential to benefit from speech-based automatic information retrieval systems. This paper proposes a speech embedding technique that facilitates such a system and that can be used in a zero-shot manner on the target language. After conducting development experiments on several written Indic languages, we evaluate our method on a corpus of Gormati – an unwritten language – that was previously collected in partnership with an agrarian Banjara community in Maharashtra State, India, specifically for the purposes of information retrieval. Our system achieves a Top 5 retrieval rate of 87.9% on this data, giving hope that it may be usable by speakers of unwritten languages worldwide.
Anthology ID:
2025.findings-emnlp.1224
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22497–22509
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1224/
DOI:
10.18653/v1/2025.findings-emnlp.1224
Cite (ACL):
Sanjay Booshanam, Kelly Chen, Ondrej Klejch, Thomas Reitmaier, Dani Kalarikalayil Raju, Electra Wallington, Nina Markl, Jennifer Pearson, Matt Jones, Simon Robinson, and Peter Bell. 2025. Spoken Document Retrieval for an Unwritten Language: A Case Study on Gormati. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 22497–22509, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Spoken Document Retrieval for an Unwritten Language: A Case Study on Gormati (Booshanam et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1224.pdf
Checklist:
 2025.findings-emnlp.1224.checklist.pdf